KVR Audio

Post by **camsr** » Fri Jul 11, 2014 8:25 am

So I was experimenting with using only one loop to do all the channels' processing, as opposed to two loops one after the other (stereo). I thought I could take the processReplacing buffer pointer parameter for the first input buffer and simply iterate the one loop past the end of the first buffer into the next, via sampleFrames * 2. This worked in Reaper, but it did not work in FL Studio. The host determines how these buffers are laid out, so I came up with this to handle the case where the buffers are not contiguous:

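```c
for (int c = 0; c < sampleFrames * NUMBEROFCHANNELS; c++)
{
    // c / sampleFrames picks the channel buffer,
    // c % sampleFrames picks the frame inside that buffer
    float *samplepointer = inputs[c / sampleFrames] + (c % sampleFrames);

    // processing
}
```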

In this way, as long as each buffer is exactly the length of sampleFrames, the number of channels can change between blocks, and a simple algorithm can keep cooking no matter the memory layout.

It's only slightly useful, but I thought I would share it.
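
A minimal, self-contained sketch of the single-loop idea above, assuming VST2-style `float **` channel arrays in which every channel holds exactly `sampleFrames` samples. The function name `apply_gain_flat` and the gain stage are hypothetical stand-ins for "//processing". Because every sample is reached through `inputs[channel]`, the channel buffers may sit anywhere in memory.

```c
#include <assert.h>

/* Hypothetical example: scale every sample of every channel with ONE loop.
   inputs/outputs are arrays of per-channel pointers (as in VST2
   processReplacing); the channel buffers need not be adjacent. */
void apply_gain_flat(float **inputs, float **outputs,
                     int numChannels, int sampleFrames, float gain)
{
    assert(numChannels >= 0 && sampleFrames >= 0);

    for (int c = 0; c < sampleFrames * numChannels; c++)
    {
        int channel = c / sampleFrames;  /* which buffer */
        int frame   = c % sampleFrames;  /* position inside that buffer */

        const float *in  = inputs[channel] + frame;
        float       *out = outputs[channel] + frame;

        *out = *in * gain;               /* stands in for "//processing" */
    }
}
```

Calling it with two separately declared arrays (for example `float *ins[2] = {left, right};`) works the same as with one packed block, which is the property described above.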

Post by **Miles1981** » Fri Jul 11, 2014 8:47 am

Only works if the buffers are all contiguous (hint: you cannot assume they are, even if they are in version x.y.z of some DAW), and only for loops that don't have memory.

Post by **camsr** » Fri Jul 11, 2014 9:30 am

The array inputs[] is just a list of pointers to the actual buffers, which are not guaranteed to be contiguous. That pointer array is what holds it together. All the buffers have to be the same size.
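
To make the distinction concrete: walking `inputs[0]` past `sampleFrames` only works when the host happens to place the channel buffers back to back, while indexing through `inputs[]` works either way. The hypothetical helper below just reports whether one block's buffers are contiguous; it is a diagnostic sketch, not something a plugin should depend on, since the layout may differ between hosts and host versions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical diagnostic: true if channel ch+1 starts exactly where
   channel ch ends, for every channel in this block. */
bool buffers_are_contiguous(float **bufs, int numChannels, int sampleFrames)
{
    assert(numChannels >= 0 && sampleFrames >= 0);

    for (int ch = 0; ch + 1 < numChannels; ch++)
    {
        /* bufs[ch] + sampleFrames is the one-past-the-end pointer of
           channel ch; comparing it with == to another buffer's start is
           well defined in C. */
        if (bufs[ch] + sampleFrames != bufs[ch + 1])
            return false;
    }
    return true;
}
```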

Post by **Miles1981** » Fri Jul 11, 2014 9:47 am

OK, sorry...
The issue is that you have one integer division and one modulo, just to avoid one loop. I would do this instead:

```c
for (int i = 0; i < channels; ++i)
{
  float *input = inputs[i];     // one buffer per channel
  for (int j = 0; j < nFrames; ++j)
  {
    // process input[j]
  }
}
```

As processing the input is usually another call, it doesn't add any complexity, and you avoid those two "long" operations. I can see why your option may seem appealing, but I don't think it is worth it.

edit: I also forgot one thing: with two loops you know exactly how the buffer is processed; with one, you have to think twice.
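
For comparison, a runnable version of the two-loop layout above, assuming the same VST2-style pointer arrays. The outer loop fetches each channel pointer once and the inner loop walks its frames, so there is no per-sample division or modulo. `apply_gain_nested` and the gain stage are hypothetical stand-ins for the real processing.

```c
#include <assert.h>

/* Hypothetical example: a gain stage written as channel-outer,
   frame-inner loops. No '/' or '%' per sample. */
void apply_gain_nested(float **inputs, float **outputs,
                       int numChannels, int nFrames, float gain)
{
    assert(numChannels >= 0 && nFrames >= 0);

    for (int i = 0; i < numChannels; ++i)
    {
        const float *input  = inputs[i];   /* fetched once per channel */
        float       *output = outputs[i];

        for (int j = 0; j < nFrames; ++j)
            output[j] = input[j] * gain;   /* frame index is j, not i */
    }
}
```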

Post by **camsr** » Fri Jul 11, 2014 10:07 am

It's just interesting to use arithmetic to control stuff, even if it's bad practice. It's probably not any faster, but when I tested it, it wasn't any slower. The compiler probably did something smart to it. I'm not using it.

One thing that strikes me as odd about the VST spec is that the plugin defines the number of channels to be used. Most plugins can easily re-use the same code on all channels for their purpose: a master volume control, for instance, or even a clipper or a limiter. The loop can use memory, but I think it has to be addressed with arithmetic; I haven't given it much thought. If the list is sorted to only contain the required channels, then as long as the number of channels is passed in, it could process those channels arbitrarily.
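
A sketch of what "memory addressed with arithmetic" could look like in the single flat loop, assuming a simple one-pole lowpass as the stateful process. The struct `OnePole`, the bound `MAX_CHANNELS` and the function name are hypothetical. Since the flat loop visits each channel's samples in order, keeping one state slot per channel and indexing it with `c / sampleFrames` is enough for a filter with memory.

```c
#include <assert.h>

#define MAX_CHANNELS 8  /* hypothetical upper bound for this sketch */

/* Hypothetical one-pole lowpass: y[n] = y[n-1] + a * (x[n] - y[n-1]).
   Each channel keeps its own y[n-1] in state[channel]. */
typedef struct {
    float a;                    /* smoothing coefficient, 0 < a <= 1 */
    float state[MAX_CHANNELS];  /* last output per channel */
} OnePole;

void onepole_process_flat(OnePole *f, float **inputs, float **outputs,
                          int numChannels, int sampleFrames)
{
    assert(numChannels >= 0 && numChannels <= MAX_CHANNELS);
    assert(sampleFrames >= 0);

    for (int c = 0; c < sampleFrames * numChannels; c++)
    {
        int channel = c / sampleFrames;
        int frame   = c % sampleFrames;

        float *y = &f->state[channel];  /* memory addressed by channel */
        *y += f->a * (inputs[channel][frame] - *y);
        outputs[channel][frame] = *y;
    }
}
```

The number of channels is just a parameter here, so the same code serves mono, stereo or any count up to the assumed bound.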

Post by **Miles1981** » Fri Jul 11, 2014 10:47 am

I think a more readable way is to have a loop that passes each array to a function (which loops over all samples of the given array).

Having a division and a modulo can only be slower than the two for loops, and it is also less maintainable, as it is less obvious.
Of course, this is my opinion, and it is similar to the way I implemented my library.
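
A sketch of the "loop that passes each array to a function" structure, assuming the same one-pole lowpass as the stateful process and in-place buffers. The names `onepole_channel` and `onepole_process_channels` are hypothetical; this is one way to arrange it, not the library mentioned above.

```c
#include <assert.h>

/* Hypothetical per-channel worker: filters one whole buffer in place.
   'state' holds this channel's filter memory (last output). */
static void onepole_channel(float *buf, int nFrames, float a, float *state)
{
    float y = *state;
    for (int j = 0; j < nFrames; ++j)
    {
        y += a * (buf[j] - y);
        buf[j] = y;
    }
    *state = y;
}

/* The outer loop only hands each channel's array to the worker. */
void onepole_process_channels(float **bufs, int numChannels, int nFrames,
                              float a, float *states)
{
    assert(numChannels >= 0 && nFrames >= 0);
    for (int i = 0; i < numChannels; ++i)
        onepole_channel(bufs[i], nFrames, a, &states[i]);
}
```

The worker never needs to know how many channels exist or where the other buffers live, which is the readability argument made above.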

Post by **mystran** » Fri Jul 11, 2014 6:22 pm

camsr wrote:It's just interesting to use arithmetic to control stuff, even if it's bad practice. It's probably not any faster, but when I tested it, it wasn't any slower. The compiler probably did something smart to it. I'm not using it

It's almost certainly a lot slower in terms of CPU cycles. According to Agner Fog's instruction tables, latencies for 32-bit integer division (to pick some random examples) are 14-23 cycles on Core 2 Wolfdale and 19-27 on Ivy Bridge i7 (Sandy Bridge is faster at 11-18), and the numbers for most AMD CPUs appear to be even more depressing. But you get the idea: integer division is slow, while the extra loop will typically cost you almost nothing.

Whether you can actually measure this difference in such a simple piece of code is another thing, since your processing time is probably dominated by something else.

Post by **camsr** » Fri Jul 11, 2014 8:06 pm

It's surely open to interpretation. It may be possible to replace the division, but the div's behavior is very dependable in combination with modulo.
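
One way the division could be replaced (a common strength-reduction trick, not something proposed in the thread) is to keep running `channel` and `frame` counters and wrap `frame` with a compare. The function name is hypothetical, and the `if` below is effectively the inner-loop exit test of the two-loop version written out by hand.

```c
#include <assert.h>

/* Hypothetical variant of the single flat loop with the per-sample
   '/' and '%' replaced by two counters: 'frame' counts up and, when it
   reaches sampleFrames, wraps to 0 while 'channel' advances. */
void apply_gain_flat_counters(float **inputs, float **outputs,
                              int numChannels, int sampleFrames, float gain)
{
    assert(numChannels >= 0 && sampleFrames >= 0);

    int channel = 0;
    int frame = 0;
    for (int c = 0; c < sampleFrames * numChannels; c++)
    {
        outputs[channel][frame] = inputs[channel][frame] * gain;

        if (++frame == sampleFrames)   /* compare instead of divide */
        {
            frame = 0;
            ++channel;
        }
    }
}
```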

Post by **mystran** » Fri Jul 11, 2014 8:31 pm

camsr wrote:It's surely open to interpretation. It may be possible to replace the division, but the div's behavior is very dependable in combination with modulo.

Integer division and modulo are calculated by the same opcode, DIV for unsigned and IDIV for signed: the (double-width) dividend is specified in EDX:EAX (and you take a few extra cycles of penalty if you're dividing signed values and the compiler needs to sign-extend here), and the quotient is returned in EAX and the remainder in EDX.

Anyway, if your division and modulo use the same operands and you're lucky enough that your compiler notices this, you might only take the penalty once.
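
In C, the standard `div()` function from `<stdlib.h>` returns quotient and remainder together in a `div_t`, which makes the "same operands" case explicit in the source. Whether this, or a plain `/` and `%` pair, ends up as a single IDIV is up to the compiler and optimization level, so it is worth checking the generated assembly rather than assuming it. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: map a flat sample index to (channel, frame).
   div() yields c / sampleFrames and c % sampleFrames from one call. */
void flat_index_to_channel_frame(int c, int sampleFrames,
                                 int *channel, int *frame)
{
    assert(sampleFrames > 0 && c >= 0);
    div_t qr = div(c, sampleFrames);
    *channel = qr.quot;
    *frame   = qr.rem;
}
```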

Post by **Zaphod (giancarlo)** » Sat Jul 12, 2014 7:27 am

mystran wrote:
camsr wrote:It's just interesting to use arithmetic to control stuff, even if it's bad practice. It's probably not any faster, but when I tested it, it wasn't any slower. The compiler probably did something smart to it. I'm not using it
It's almost certainly a lot slower in terms of CPU cycles. According to Agner, latencies for integer division 32-bit (to pick some random examples): Core 2 Wolfdale 14-23 cycles, Ivy i7 19-27 (sandy is faster at 11-18) .. and the number of most AMD CPUs appear to be even more depressing.. but you get the idea: integer division is slow, while the extra loop will typically cost you almost nothing.

Whether you can actually measure this difference in such a simple piece of code is another thing, since your processing time is probably dominated by something else.

+1
slightly slower, negligible in the whole picture, ugly code
New issues, no gain

Post by **camsr** » Sun Jul 13, 2014 11:01 pm

In the 2 channel case, it would be possible to replace the idiv with a cmp.

Or does modulo require a div?
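
For the two-channel case, the channel index can only be 0 or 1, so a comparison can stand in for the division: `c >= sampleFrames` is 1 exactly for the second buffer, and the frame follows by subtraction. A hypothetical sketch, valid only for `0 <= c < 2 * sampleFrames`; whether the compiler emits a branch or a branch-free compare-and-set sequence is not guaranteed.

```c
#include <assert.h>

/* Hypothetical stereo-only replacement for c / sampleFrames and
   c % sampleFrames, valid for 0 <= c < 2 * sampleFrames. */
void stereo_index_to_channel_frame(int c, int sampleFrames,
                                   int *channel, int *frame)
{
    assert(sampleFrames > 0 && c >= 0 && c < 2 * sampleFrames);
    int ch = (c >= sampleFrames);      /* 0 = left, 1 = right */
    *channel = ch;
    *frame   = c - ch * sampleFrames;  /* remove the right channel's offset */
}
```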

Post by **mystran** » Mon Jul 14, 2014 10:58 am

camsr wrote:In the 2 channel case, it would be possible to replace the idiv with a cmp.
Or does modulo require a div?

See my previous reply above.

Post by **camsr** » Mon Jul 14, 2014 9:32 pm

mystran wrote:
camsr wrote:In the 2 channel case, it would be possible to replace the idiv with a cmp.
Or does modulo require a div?
See my previous reply above.

Yes, it's the same opcode.
But the idea is still sound.

Consider how many branch mispredictions could be avoided.

Post by **mystran** » Tue Jul 15, 2014 7:28 am

camsr wrote:
mystran wrote:
camsr wrote:In the 2 channel case, it would be possible to replace the idiv with a cmp.
Or does modulo require a div?
See my previous reply above.
Yes it's the same opcode.
But the idea is still sound

Consider how many branch mispredictions could be avoided.

Typically you would expect to avoid approximately one misprediction per channel (namely when the inner loops exit), with worst case probably around twice that.

Post by **Miles1981** » Tue Jul 15, 2014 8:51 am

Which is far less time-consuming than the div/modulo thingy.
And also readability beats purity...

A way of re-using a loop without nesting