Code:
for (c = 0; c < sampleFrames * NUMBEROFCHANNELS; c++)
{
    // c / sampleFrames picks the channel, c % sampleFrames the frame within it
    samplepointer = inputs[c / sampleFrames] + (c % sampleFrames);
    // processing
}
It's only slightly useful, but I thought I would share it.
Code:
for (int i = 0; i < channels; ++i)
{
    input = inputs[i];
    for (int j = 0; j < nFrames; ++j)
    {
        // process input[j]
    }
}
camsr wrote: It's just interesting to use arithmetic to control stuff, even if it's bad practice. It's probably not any faster, but when I tested it, it wasn't any slower. The compiler probably did something smart to it. I'm not using it
It's almost certainly a lot slower in terms of CPU cycles. According to Agner Fog's instruction tables, 32-bit integer division latencies are (to pick some random examples): Core 2 Wolfdale 14-23 cycles, Ivy Bridge i7 19-27 (Sandy Bridge is faster at 11-18), and the numbers for most AMD CPUs appear to be even more depressing. But you get the idea: integer division is slow, while the extra loop will typically cost you almost nothing.
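For comparison, here is a minimal sketch of the two traversals side by side. The function names and the gain multiply are made up for illustration (a stand-in for the "processing" comment in the snippets); the point is only that both visit exactly the same samples, one with a divide and a modulo per sample and one with none.

```cpp
// Flat loop as in the first post: one divide and one modulo per sample.
// applyGainFlat is a hypothetical name; the gain multiply stands in for real processing.
void applyGainFlat(float** inputs, int numChannels, int sampleFrames, float gain)
{
    for (int c = 0; c < sampleFrames * numChannels; ++c)
    {
        float* samplepointer = inputs[c / sampleFrames] + (c % sampleFrames);
        *samplepointer *= gain;
    }
}

// Nested loops: same samples in the same order, no division anywhere.
// applyGainNested is likewise a hypothetical name.
void applyGainNested(float** inputs, int numChannels, int sampleFrames, float gain)
{
    for (int i = 0; i < numChannels; ++i)
    {
        float* input = inputs[i];
        for (int j = 0; j < sampleFrames; ++j)
        {
            input[j] *= gain;
        }
    }
}
```

Both functions leave the buffers in the same state, so any difference between them is purely in the loop overhead.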
camsr wrote: It's surely open to interpretation. It may be possible to replace the division, but the div's behavior is very dependable in combination with modulo.
Integer division and modulo are calculated by the same opcode, DIV for unsigned and IDIV for signed. The dividend is specified (double-wide) in EDX:EAX (and you take a few extra cycles of penalty if you're dividing signed and the compiler needs to sign-extend here); the quotient is returned in EAX and the remainder in EDX.
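Since a single DIV/IDIV produces both results, the quotient and remainder can be requested together. A minimal sketch using the standard `std::div` from `<cstdlib>`; `ChannelFrame` and `splitIndex` are hypothetical names, not something from the original snippet:

```cpp
#include <cstdlib>

// Hypothetical helper type: which channel, and which frame inside that channel.
struct ChannelFrame
{
    int channel;
    int frame;
};

// Splits the flat index c into quotient (channel) and remainder (frame).
// std::div returns both in one std::div_t, mirroring what the opcode does.
ChannelFrame splitIndex(int c, int sampleFrames)
{
    const std::div_t qr = std::div(c, sampleFrames);
    return { qr.quot, qr.rem };
}
```

With `sampleFrames = 4`, `splitIndex(5, 4)` gives channel 1, frame 1. Writing `c / sampleFrames` and `c % sampleFrames` next to each other usually ends up as one division as well, but that is up to the compiler.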
mystran wrote: It's almost certainly a lot slower in terms of CPU cycles. According to Agner Fog's instruction tables, 32-bit integer division latencies are (to pick some random examples): Core 2 Wolfdale 14-23 cycles, Ivy Bridge i7 19-27 (Sandy Bridge is faster at 11-18), and the numbers for most AMD CPUs appear to be even more depressing. But you get the idea: integer division is slow, while the extra loop will typically cost you almost nothing.
+1. Whether you can actually measure this difference in such a simple piece of code is another matter, since your processing time is probably dominated by something else.
camsr wrote: In the 2 channel case, it would be possible to replace the idiv with a cmp. Or does modulo require a div?
See my previous reply above.
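For the two-channel question, here is a rough sketch of what swapping the divide for a compare could look like. `StereoIndex` and `splitStereoIndex` are made-up names, and it assumes `0 <= c < 2 * sampleFrames`; which instructions the compiler actually emits for the compare is not guaranteed.

```cpp
// Hypothetical helper type for the two-channel case.
struct StereoIndex
{
    int channel;
    int frame;
};

// Valid only for 0 <= c < 2 * sampleFrames.
StereoIndex splitStereoIndex(int c, int sampleFrames)
{
    // The compare replaces c / sampleFrames: the result is 0 or 1.
    const int channel = (c >= sampleFrames) ? 1 : 0;
    // The subtraction replaces c % sampleFrames.
    const int frame = c - channel * sampleFrames;
    return { channel, frame };
}
```

Within that range it gives the same channel and frame as the divide and modulo, but it does not generalize beyond two channels without more compares.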
mystran wrote: See my previous reply above.
Yes, it's the same opcode.
camsr wrote: Yes, it's the same opcode. But the idea is still sound. Consider how many branch mispredictions could be avoided.
Typically you would expect to avoid approximately one misprediction per channel (namely when the inner loop exits), with the worst case probably around twice that.
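To put rough numbers on that trade, here is a small counting sketch. `LoopCost` and `countCosts` are hypothetical names; it just counts how many divisions the flat loop performs against how many times the inner loop of the nested version exits, the exit being the branch most likely to be mispredicted.

```cpp
// Hypothetical tally of the two costs being compared.
struct LoopCost
{
    long divisions;       // divide/modulo operations in the flat loop
    long innerLoopExits;  // inner-loop exits in the nested version
};

LoopCost countCosts(int numChannels, int sampleFrames)
{
    LoopCost cost = { 0, 0 };

    // Flat loop: one division (quotient and remainder) per sample.
    for (int c = 0; c < sampleFrames * numChannels; ++c)
    {
        ++cost.divisions;
    }

    // Nested loops: the inner loop exits once per channel.
    for (int i = 0; i < numChannels; ++i)
    {
        for (int j = 0; j < sampleFrames; ++j)
        {
            // processing would go here
        }
        ++cost.innerLoopExits;
    }
    return cost;
}
```

For 2 channels and 512 frames this counts 1024 divisions against 2 inner-loop exits, so even if every exit were mispredicted, that is a couple of events per block against one division per sample.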