## How much CPU would take 10 Envelopes per voice? It seems a lot...

matt42
KVRian

996 posts since 9 Jan, 2006
S0lo wrote:problem is the max and min implementation itself uses conditionals.
oh well, I just need non-conditional min/max functions.

S0lo wrote:Any way, you can do this:

x = min(x, 1);
Indeed, that was daft of me
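For what it's worth, a clamp built from std::min/std::max is usually already branch-free on x86: the compiler typically lowers it to minsd/maxsd instructions rather than conditional jumps. A minimal sketch:

```cpp
#include <algorithm>

// Branchless clamp to [0, 1]: std::min/std::max on doubles typically
// compile to minsd/maxsd on x86, i.e. no conditional jump is emitted.
inline double clamp01(double x) {
    return std::min(std::max(x, 0.0), 1.0);
}
```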
S0lo
KVRian

551 posts since 31 Dec, 2008
Miles1981 wrote:
S0lo wrote:Both codes do the same thing: sorting. But Code1 (insertion sort) uses two conditionals (an if and a while), while Code2 (quicksort) has four conditionals (2 ifs and 2 whiles). Yet Code2 is well known to perform much faster (O(n log n)) than Code1 (quadratic) on average.

check here: https://en.wikipedia.org/wiki/Quicksort
and here: https://en.wikipedia.org/wiki/Insertion_sort

You are comparing apples and oranges.

apples can be compared to oranges if there is a comparative goal, like "Health".

Miles1981 wrote:Of course if the complexity is different, this changes everything! We are talking about the same complexity in number of instructions.

And I'm actually not talking about that.
matt42
KVRian

996 posts since 9 Jan, 2006
Miles1981 wrote:
S0lo wrote:Both codes do the same thing: sorting. But Code1 (insertion sort) uses two conditionals (an if and a while), while Code2 (quicksort) has four conditionals (2 ifs and 2 whiles). Yet Code2 is well known to perform much faster (O(n log n)) than Code1 (quadratic) on average.

check here: https://en.wikipedia.org/wiki/Quicksort
and here: https://en.wikipedia.org/wiki/Insertion_sort

You are comparing apples and oranges. Of course if the complexity is different, this changes everything! We are talking about the same complexity in number of instructions.
I think the point here is that the goal is to find the most efficient algorithm and by extension reduce the complexity. Once we find that then we needn't concern ourselves with details such as conditionals == bad, or whatever.

In the general case, I agree, conditionals will tend to be expensive and most likely should be avoided.
camsr
KVRAF

6805 posts since 16 Feb, 2005
Also important to note: comparisons are cheap, branches are expensive. If you can do some logic around comparison results without branching, that could be the best option.
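One way to sketch "logic around comparison results without branching": treat the 0/1 result of the comparison as arithmetic instead of steering a jump. The helper name `selectGE` is made up for illustration:

```cpp
// Branchless select: the comparison yields 0 or 1, which is used as a
// multiplier rather than as a branch condition. Compilers often emit a
// conditional move (cmov) or plain math for code shaped like this.
inline double selectGE(double x, double threshold, double a, double b) {
    const double mask = static_cast<double>(x >= threshold); // 1.0 or 0.0
    return mask * a + (1.0 - mask) * b;  // a if x >= threshold, else b
}
```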
S0lo
KVRian

551 posts since 31 Dec, 2008
camsr wrote:Also important to note that, comparisons are cheap, branches are expensive.. if you can do some logic around comparison results without branching it could be the best option.

Yup, that's why the most probable case should be put in the TRUE part, because the FALSE path is what causes the branch.
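If the compiler supports it, the expected direction can also be stated explicitly. A sketch using the GCC/Clang `__builtin_expect` builtin (the denormal fix-up is a made-up rare case for illustration):

```cpp
// __builtin_expect tells the compiler which outcome is likely, so it can
// lay out the hot path as straight-line fall-through code (GCC/Clang builtin).
inline double processSample(double x, bool denormalFix) {
    if (__builtin_expect(denormalFix, 0)) {  // hint: almost never taken
        x += 1e-18;                          // hypothetical rare fix-up
    }
    return x * 0.5;
}
```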
MadBrain
KVRian

943 posts since 1 Dec, 2004
Nowhk wrote:
MadBrain wrote:- Process voice by voice instead of sample by sample! CPUs like this (even though you have to do an extra memset at the start).

camsr wrote:Yes. There's no faster memory than the cacheline, so your efforts should revolve around using it as much as possible.

I guess MadBrain's nice suggestion above is to "improve" that cacheline usage. But why should it? I don't see why the CPU is "faster" at processing a sample block within a voice (and mixing at the end) instead of processing voices within a sample. It has to do the same number of iterations...

- It takes pMidiHandler->Process(); and pVoiceManager->Process(); out of the per-sample loop, which reduces the amount of code the instruction cache "sees" in the sample-per-sample section if those functions are large, making it more likely that it fits in L1 instruction cache or micro-op cache. This also applies if you reduce how often your modulations are calculated.

- It makes it more likely that your if()s will keep always going the same way in a row (or go in a short pattern that the branch predictor can follow), since you're processing the same voice over and over. For instance for your 160 envelopes, if you mix up the processing for all voices, it's likely that your different voices will be in different states, so the branch predictor will go all over the place. If you process per voice then the branch predictor will only see the 10 same envelopes over and over so the chances that it will predict accurately are higher.

- In the process of generating a sample for a voice, the CPU has to load and store a bunch of state variables from your voice object. If you process per voice, all those loads and stores are going to fall on the same addresses over and over, which makes it quite more likely that it's all in L1 cache, and it's also a better pattern for the prefetcher. If you're reading data from some wavetable, then all your loads are going to cluster together and advance more or less linearly, instead of falling all over the place depending on oscillator phases.

- You'll also probably have more luck getting the compiler to keep some of your values in registers instead of loading/storing them over and over, since you can read them into function local values (though on x86 cutting down on loads/stores is hard!).

- The CPU will not have to calculate the address of each of your voice objects every sample. This is only an addition plus an optimized multiplication per size of your voice object size, but all the loads/stores to your voice object variables depend on it... I don't know how much effect this has on bleeding edge CPUs but I could definitely see the effect on a pre-Ryzen multicore AMD.
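A minimal sketch of the voice-major structure these points describe (all names are hypothetical, with a simple phase ramp standing in for a real oscillator):

```cpp
#include <cstring>
#include <vector>

// Hypothetical minimal voice state.
struct Voice {
    double phase   = 0.0;
    double freq    = 0.01;
    bool   playing = true;
};

// Voice-major processing: the outer loop walks voices, the inner loop walks
// samples. One voice's state stays in locals (likely registers) for a whole
// block, its loads/stores hit the same addresses, and its branches repeat.
void processBlock(std::vector<Voice>& voices, double* out, int nFrames) {
    std::memset(out, 0, nFrames * sizeof(double));  // clear once, then accumulate
    for (Voice& v : voices) {
        if (!v.playing) continue;
        double phase      = v.phase;   // pull state into locals once
        const double freq = v.freq;
        for (int i = 0; i < nFrames; ++i) {
            out[i] += phase;           // placeholder "oscillator" output
            phase  += freq;
        }
        v.phase = phase;               // write state back once per block
    }
}
```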

Nowhk wrote:Also: why do you use memset? What's the benefit here? From ProcessDoubleReplacing, outputLeft and outputRight (i.e. double **inputs, double **outputs) are already allocated (I guess).

Many thanks guys, I'm learning aalllottttttt!!!!!

Memset just fills the buffer with 0.0. It's equivalent to doing this:
Code: Select all
for(int i=0; i<nbSamples; i++)  outputLeft[i] = 0.0;
for(int i=0; i<nbSamples; i++)  outputRight[i] = 0.0;

The advantage is that the standard library memset() generally has a fast assembly-optimized version.
camsr
KVRAF

6805 posts since 16 Feb, 2005
What madbrain is saying is, when you process voice-by-voice the CPU has an easier time caching because it uses the same data sequentially. This improves throughput for the actual instructions since there is less waiting on data that may not be cached (in L1).
Nowhk
KVRian

749 posts since 2 Oct, 2013
Thanks for the replies dudes
Well, in order... (sorry if I reply with questions, but it's the only way I know to check whether I've understood)

MadBrain wrote:Memset just fills the buffer with 0.0. It's equivalent to doing this:
Code: Select all
for(int i=0; i<nbSamples; i++)  outputLeft[i] = 0.0;
for(int i=0; i<nbSamples; i++)  outputRight[i] = 0.0;

The advantage is that the standard library memset() generally has a fast assembly-optimized version.

I see. Nice. So instead of doing it for each sample, you "pre-set" them with 0.0. Got it.

MadBrain wrote:- It takes pMidiHandler->Process(); and pVoiceManager->Process(); out of the per-sample loop, which reduces the amount of code the instruction cache "sees" in the sample-per-sample section if those functions are large, making it more likely that it fits in L1 instruction cache or micro-op cache. This also applies if you reduce how often your modulations are calculated.

So you mean something like "the fewer variables/functions not needed for every sample in the critical section (per-sample loop), the more space you have to store "main/audio" data within cache and registers (reducing cache misses/faults)", right?

MadBrain wrote:- It makes it more likely that your if()s will keep always going the same way in a row (or go in a short pattern that the branch predictor can follow), since you're processing the same voice over and over. For instance for your 160 envelopes, if you mix up the processing for all voices, it's likely that your different voices will be in different states, so the branch predictor will go all over the place. If you process per voice then the branch predictor will only see the 10 same envelopes over and over so the chances that it will predict accurately are higher.

Basically, you are isolating the code without branch/if statements (which, for reasons I don't get yet, will slow the code; but I think it's a sort of holy grail, watching S0lo and PurpleSunray fight ) within the per-sample loop (i.e. critical section). Have I got your words?

MadBrain wrote:- In the process of generating a sample for a voice, the CPU has to load and store a bunch of state variables from your voice object. If you process per voice, all those loads and stores are going to fall on the same addresses over and over, which makes it quite more likely that it's all in L1 cache, and it's also a better pattern for the prefetcher. If you're reading data from some wavetable, then all your loads are going to cluster together and advance more or less linearly, instead of falling all over the place depending on oscillator phases.

What does "all those loads and stores are going to fall on the same addresses over and over" mean? When I switch to another voice, I don't care if the new Voice object loads/stores its data at the previous voice's addresses. Why should it care?

MadBrain wrote:- You'll also probably have more luck getting the compiler to keep some of your values in registers instead of loading/storing them over and over, since you can read them into function local values (though on x86 cutting down on loads/stores is hard!).

Are you saying it's more probable that the CPU places a local var directly in a register if the loop iteration is 100 samples long (i.e. a block size) instead of 16 (the voice iteration)? If so, that really makes sense...

MadBrain wrote:- The CPU will not have to calculate the address of each of your voice objects every sample. This is only an addition plus an optimized multiplication per size of your voice object size, but all the loads/stores to your voice object variables depend on it... I don't know how much effect this has on bleeding edge CPUs but I could definitely see the effect on a pre-Ryzen multicore AMD.

Another great reason

Tonight I'll try to "re-design" my whole PDR code with these (very kind) suggestions
PurpleSunray
KVRian

780 posts since 13 Mar, 2012
Nowhk wrote:Are you saying it's more probable that the CPU places a local var directly in a register if the loop iteration is 100 samples long (i.e. a block size) instead of 16 (the voice iteration)? If so, that really makes sense...

Not really (your biggest issue will be that the compiler manages the registers, but the loop count is not necessarily known at compile time).

And it's also based on the idea that the CPU will execute instructions sequentially, otherwise you can't share a register state across two instructions. But that is not necessarily the case on modern compilers & CPUs.

https://en.wikipedia.org/wiki/Instruction_pipelining

If you understand how pipelining works, it suddenly gets very clear why stuff like branching matters: an i7 has 14 pipeline stages, which means that if the pipeline is fully loaded and requires a complete flush (aka you suckered the branch prediction), you must wait on / give up 14 instructions.
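The classic demonstration of this is that the same loop runs much faster on sorted data, because the branch settles into long predictable runs instead of flipping near-randomly. A sketch of the loop in question (timing harness omitted):

```cpp
#include <vector>

// Summing elements above a threshold. On sorted input the branch is taken
// in one long run, so prediction is near-perfect; on shuffled input the
// outcome is close to random, and each misprediction costs a pipeline
// flush (on the order of the pipeline depth, ~14+ stages).
long long sumAbove(const std::vector<int>& v, int threshold) {
    long long sum = 0;
    for (int x : v) {
        if (x >= threshold)   // predictable on sorted data, random otherwise
            sum += x;
    }
    return sum;
}
```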
Nowhk
KVRian

749 posts since 2 Oct, 2013
PurpleSunray wrote:Not really (your biggest issue will be that the compiler manages the registers, but the loop count is not necessarily known at compile time).

Sorry typo (I meant compiler not CPU).

But what do you mean by "is not known at compile time"? It "knows" my code

Can you give me a basic example of a v1 piece of code whose compilation won't use registers for vars (because it "can't know") and a v2 (which does the same thing) that will use them? (or cache).

PurpleSunray
KVRian

780 posts since 13 Mar, 2012

Loop count is the buffer size of the audio driver, for example.

But as said, keeping values in registers is a little optimization compared to coding a loop that perfectly fits into the SIMD pipeline of an i7 and/or Zen (it might even be counterproductive if that "register sync" causes pipeline stalls)
MadBrain
KVRian

943 posts since 1 Dec, 2004
Nowhk wrote:So you mean something like "the fewer variables/functions not needed for every sample in the critical section (per-sample loop), the more space you have to store "main/audio" data within cache and registers (reducing cache misses/faults)", right?

Modern CPUs have something like 32k of L1 data cache, made out of cache lines of 64 bytes. This means that if you load some variable, the whole adjacent 64 bytes will be loaded into L1 cache as well, and if your loop deals with more than 32k of data it will be overwriting parts of L1 so those parts will be loaded from L2 every iteration, and data from L2 takes 10+ cycles to load in instead of 3, and you can't do 2 loads per cycle on L2.
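To put rough numbers on the figures above (32 KB L1D, 64-byte lines; the VoiceState layout is hypothetical):

```cpp
#include <cstddef>

// Hypothetical per-voice state: 4 scalars + 10 envelope values = 112 bytes.
struct VoiceState {
    double phase, freq, amp, pan;
    double env[10];
};

constexpr std::size_t kL1DataCache = 32 * 1024;  // typical L1D size (assumed)
constexpr std::size_t kCacheLine   = 64;         // typical line size (assumed)

// Cache lines touched per voice, rounding up to whole lines.
constexpr std::size_t kLinesPerVoice =
    (sizeof(VoiceState) + kCacheLine - 1) / kCacheLine;

// How many voices' worth of state fits in L1 at once (data only,
// ignoring everything else the loop touches).
constexpr std::size_t kVoicesInL1 = kL1DataCache / sizeof(VoiceState);
```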

Nowhk wrote:Basically, you are isolating the code without branch/if statements (which, for reasons I don't get yet, will slow the code; but I think it's a sort of holy grail, watching S0lo and PurpleSunray fight ) within the per-sample loop (i.e. critical section). Have I got your words?

Yes, the problem is that it takes many cycles for the CPU to make an instruction ready to execute because it has to (1) read the instruction from instruction cache and check that the address from the virtual page mapping matches (2) align the 3~5 instructions from the 16 byte instruction stream block that it has read into the decoders (3) undo the instruction encoding and split the instructions into micro-ops (for instance math+memory instructions are split into one micro-op for the load and one for the math part) (4) rename the registers used in the instructions (5) queue the instructions to execution units.

For the instruction right after a conditional jump, the CPU could start this whole process right after doing it for the conditional jump if it KNEW that the next instruction is going to run. But that's going to take dozens of cycles, since you'd need to wait for the conditional jump to actually get loaded-decoded-renamed-queued, and often even more cycles for it to actually run, as it has to wait for the instructions that produce the value it uses.

So CPUs do the next best thing: they GUESS if the jump is going to be taken or not and start loading the instructions from the guess right away. If the guess is right, then the jump is almost free. If the guess is wrong, then the CPU has to undo all the partially executed instructions it started doing after the conditional jump, which can easily be dozens of instructions!

This is why if() is very cheap if the CPU can do an accurate or mostly accurate guess, and quite expensive if it guesses wrong often! Generally the predictors deal well with if()'s that almost always go the same way as the previous time, or have short repeated patterns. This also applies to switch(), for(), calling function pointers and virtual functions, etc...

Nowhk wrote:What does "all those loads and stores are going to fall on the same addresses over and over" mean? When I switch to another voice, I don't care if the new Voice object loads/stores its data at the previous voice's addresses. Why should it care?

It cares that the data it loads/stores is in L1 cache (fastest but only 32k) or L2 cache (slower but larger) etc instead of memory. If the second time your loop runs, it's reloading the same memory addresses (because it's the same voice), then your chances that it's in the fastest L1 cache are like 99%... If you're iterating over a whole bunch of voices it might still all fit in L1 cache, but your chances go down...

Nowhk wrote:Are you saying it's more probable that the CPU places a local var directly in a register if the loop iteration is 100 samples long (i.e. a block size) instead of 16 (the voice iteration)? If so, that really makes sense...

From what I've seen, 99% of the time the compiler will only registerify variables that are function-local, and not variables that are inside an object. This means that in some cases you can do the following trick:

Code: Select all
uint32_t phase = voice.m_phase;
uint32_t freq = voice.m_freq;
int16_t *wave = voice.m_wavePtr;
while(samplesLeft > 0) {
  *output++ += wave[phase >> 20];
  phase += freq;
  samplesLeft--;
}
voice.m_phase = phase;

The compiler will likely keep phase, freq, wave, output and samplesLeft in registers, whereas if you ran sample-per-sample instead of voice-per-voice you'd likely have to reload phase, freq, and wave, and store phase on every iteration, instead of just once for the whole block.
PurpleSunray
KVRian

780 posts since 13 Mar, 2012
MadBrain wrote:The compiler will likely keep phase, freq, wave, i, output and samplesLeft in registers,

Not if it compiles for a 32-bit x86 target. samplesLeft might stay in ECX. Then you have EAX and EDX left, and you need EAX as an accumulator already. So you pretty much have one register you could use as a cache, but you want to cache 6 values.
Nowhk
KVRian

749 posts since 2 Oct, 2013
Hi all,

yesterday I tried to implement all of your suggestions in code (before totally understanding the "theory", which I'll return to later). Here's the final result (mainly following MadBrain's suggestions, since his code is smart), processing only the envelopes (for 16 voices running simultaneously):

Code: Select all
void MainIPlug::ProcessDoubleReplacing(double **inputs, double **outputs, int nFrames) {
   double *outputLeft = outputs[0];
   double *outputRight = outputs[1];
   memset(outputLeft, 0, nFrames * sizeof(double));
   memset(outputRight, 0, nFrames * sizeof(double));

   // buffer
   int samplesLeft = nFrames;
   while (samplesLeft > 0) {
      // events
      pMidiHandler->Process();
      pVoiceManager->Process();
      int blockSize = samplesLeft;
      blockSize = pMidiHandler->GetSamplesTillNextEvent(blockSize);

      // voices
      for (int voiceIndex = 0; voiceIndex < PLUG_VOICES_BUFFER_SIZE; voiceIndex++) {
         Voice &voice = pVoiceManager->mVoices[voiceIndex];
         if (!voice.mIsPlaying) {
            continue;
         }

         int nbSamples = blockSize;
         while (nbSamples > 0) {
            for (int envelopeIndex = 0; envelopeIndex < ENVELOPES_CONTAINER_NUM_ENVELOPE_MANAGER; envelopeIndex++) {
               Envelope &envelope = pEnvelopesContainer->pEnvelopeManager[envelopeIndex]->mEnvelope;

               // new voice restart envelope
               if (voice.mSample == 0.0) {
                  envelope.Reset(voice);
               }
               envelope.Process(voice);
            }
            nbSamples--;
         }
      }

      pMidiHandler->Flush(blockSize);
      pVoiceManager->Flush(blockSize);
      samplesLeft -= blockSize;
      outputLeft += blockSize;
      outputRight += blockSize;
   }
}

As you can see, an adaptation of MadBrain's code within my plug.
Below, the Envelope's functions:

Code: Select all
void Envelope::Reset(Voice &voice) {
   mVoiceParameters[voice.mIndex].mControlRateIndex = 0;
   mVoiceParameters[voice.mIndex].mBlockStep = gBlockSize;
   mVoiceParameters[voice.mIndex].mStep = 0.0;
   mVoiceParameters[voice.mIndex].mValue = mAmps[0];
}

void Envelope::Process(Voice &voice) {
   VoiceParameters &voiceParameters = mVoiceParameters[voice.mIndex];

   // control rate
   if (voiceParameters.mControlRateIndex-- == 0) {
      voiceParameters.mControlRateIndex = PLUG_CONTROL_RATE - 1;

      if (mIsEnabled) {
         // refresh at block size
         if (voiceParameters.mBlockStep >= gBlockSize) {
            // loop
            unsigned int sectionIndex = RefreshSectionIndex(voiceParameters.mStep);
            double sectionStep = RefreshSectionStep(sectionIndex, voiceParameters.mStep);
            unsigned int blockIndex = RefreshBlockIndex(sectionStep);
            double sectionLength = RefreshSectionLength(sectionIndex);

            int numBlocks = (int)(sectionLength / gBlockSize);
            double numBlocksFraction = 1.0 / numBlocks;
            double pos0 = blockIndex * numBlocksFraction;
            double pos1 = (blockIndex + 1) * numBlocksFraction;

            double a = 1.0 - (1.0 / mTensions[sectionIndex]);
            double p0 = pos0 / (pos0 + a * (pos0 - 1.0));
            double p1 = pos1 / (pos1 + a * (pos1 - 1.0));

            double sectionStartAmp = mAmps[sectionIndex];
            double sectionEndAmp = mAmps[sectionIndex + 1];
            double sectionDeltaAmp = sectionEndAmp - sectionStartAmp;

            voiceParameters.mBlockStartAmp = sectionStartAmp + p0 * sectionDeltaAmp;
            double blockEndAmp = sectionStartAmp + p1 * sectionDeltaAmp;

            voiceParameters.mBlockFraction = (blockEndAmp - voiceParameters.mBlockStartAmp) * (1.0 / gBlockSize);
            voiceParameters.mBlockStep = fmod(voiceParameters.mBlockStep, gBlockSize);
         }

         // update value
         voiceParameters.mValue = (voiceParameters.mBlockStartAmp + (voiceParameters.mBlockStep * voiceParameters.mBlockFraction));
         ScaleValue(voiceParameters.mValue);
      }
      else {
         // update value
         voiceParameters.mValue = 0.0;
      }
   }

   // next phase
   voiceParameters.mBlockStep += mRate;
   voiceParameters.mStep += mRate;
}

Unfortunately, it changes nothing. The CPU used is exactly the same, even using the control rate. Keeping envelope points automatable, I'm at about 10% CPU (even if the first code I wrote in this topic has been reduced from 4% to 2%, nice one ).

At this point, is it really the "if"s that cause this amount of CPU usage? OK, they are 16 voices and 10 envelopes, but I still feel that 10% is too much
camsr
KVRAF

6805 posts since 16 Feb, 2005
Can someone explain what an if statement branching to a continue statement is doing?