KVR Audio

camsr · Post by **camsr** » Tue Jul 15, 2014 12:46 pm

So maybe this is better?

unsigned div, mod = 0;

for (c = 0; c < sampleFrames * 2; c++) // 2 is for stereo
{
    div = (c >= sampleFrames);      // channel index, 0 or 1: only works for 2 channels
    mod = c - (div * sampleFrames); // sample index within that channel
    samplepointer = inputs[div] + mod;

    //processing

}

Miles1981 · Post by **Miles1981** » Tue Jul 15, 2014 1:07 pm

Yes, but you lose readability and scalability, and you have an integer multiplication. And you might also have an additional memory delay, as the compiler can't know that div is almost always constant.
So it is still a nice thinking exercise, but the best solution (=robust, scalable, readable, efficient...) is still the double for loop.
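For comparison, the double for loop mentioned above might look like the following sketch. The function name `process_nested`, the `numChannels` and `gain` parameters, and the gain multiplication standing in for "processing" are assumptions made for illustration; the thread itself only names `inputs` and `sampleFrames`.

```c
#include <stddef.h>

/* Hypothetical example: apply a gain to every sample of every channel.
   Assumes inputs[ch] points to a buffer of sampleFrames floats. */
void process_nested(float **inputs, size_t numChannels,
                    size_t sampleFrames, float gain)
{
    for (size_t ch = 0; ch < numChannels; ch++) {
        /* The channel pointer is loaded once per channel,
           not recomputed for every sample. */
        float *samplepointer = inputs[ch];

        for (size_t i = 0; i < sampleFrames; i++) {
            samplepointer[i] *= gain; /* processing */
        }
    }
}
```

Unlike the single-loop version, this works unchanged for any channel count and needs no per-sample channel arithmetic.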

camsr · Post by **camsr** » Tue Jul 15, 2014 1:35 pm

Miles1981 wrote:And you might also have an additional memory delay as the compiler can't know that div is almost always constant.
So it is still a nice thinking exercise, but the best solution (=robust, scalable, readable, efficient...) is still the double for loop.

This is what I am not sure about. I know that Indirect jumps are always a miss, but array index arithmetic? Why is increment any better?
It should be noted that the target platform is modern, and slightly older, x86-64 CPUs. I use a Core 2 Quad, so I am not thinking below that for selfish reasons.

Miles1981 · Post by **Miles1981** » Tue Jul 15, 2014 1:56 pm

I have to say, I use index arithmetic. But the two-loop way is more predictable. The compiler knows it can attempt things like vectorization because they follow each other. It might be able to do so with your loop, but it has to think more, meaning it may miss other opportunities.
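As a rough illustration of a loop shape that compilers tend to handle well, here is a per-channel kernel with a plain counted loop. The name `apply_gain` and the separate `in`/`out` buffers are assumptions, not something from the thread, and whether a given compiler actually vectorizes it depends on the compiler and its flags.

```c
#include <stddef.h>

/* Hypothetical per-channel kernel.
   `restrict` (C99) promises the compiler that in and out do not overlap,
   which removes one common obstacle to auto-vectorization. */
void apply_gain(const float *restrict in, float *restrict out,
                size_t sampleFrames, float gain)
{
    /* Simple counted loop over consecutive elements: each iteration is
       independent of the others. */
    for (size_t i = 0; i < sampleFrames; i++) {
        out[i] = in[i] * gain;
    }
}
```

An outer loop would call this once per channel, so the inner loop never has to work out which channel it is in.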

camsr · Post by **camsr** » Tue Jul 15, 2014 2:35 pm

I don't think GCC is going to optimize it properly either. I think I will try my hand at some ASM for the first time. I don't even know where to start learning to use ASM in source for GCC, it's so much different than MSVC. Anyone know some book or tutorials or some well commented examples?

Miles1981 · Post by **Miles1981** » Tue Jul 15, 2014 2:42 pm

Basic rule of asm today: don't, the compiler is smarter than you are!
If you really want to tackle this, I guess any up-to-date book would be enough (like the Irvine one).

camsr · Post by **camsr** » Tue Jul 15, 2014 2:57 pm

...not when the compiler optimizations break up code and controlling optimizations on the function level reeks!

mystran · Post by **mystran** » Wed Jul 16, 2014 12:42 pm

camsr wrote:
Miles1981 wrote:And you might also have an additional memory delay as the compiler can't know that div is almost always constant.
So it is still a nice thinking exercise, but the best solution (=robust, scalable, readable, efficient...) is still the double for loop.
This is what I am not sure about. I know that Indirect jumps are always a miss, but array index arithmetic? Why is increment any better?

Indirect jumps are actually predicted by modern processors if you take them repeatedly. Some processors can even predict some patterns, but jumping to the same address multiple times works fairly reliably as long as the jump is still in the branch cache (and goes to the same address as the last time). [edit: function return addresses additionally have a separate mechanism, so returns will generally predict perfectly as long as they pair with calls correctly]

Array index arithmetic on the other hand can actually introduce a pipeline stall (assuming out-of-order execution can't find anything else to do), because the CPU can't necessarily schedule the memory access before your arithmetic is retired. Typically that's just a few cycles though, unless you're doing things like converting floats to integers to array indexes.
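To make the difference concrete, here is a small sketch contrasting an index derived from a float with plain sequential access. The names `lookup_float_index`, `sum_sequential` and `TABLE_SIZE` are hypothetical, and `phase` is assumed to lie in [0, 1).

```c
#include <stddef.h>

#define TABLE_SIZE 8 /* hypothetical table length, a power of two */

/* Index computed from a float: the load of table[idx] cannot start
   until the multiply and the float-to-integer conversion have
   produced idx. Assumes 0 <= phase < 1. */
float lookup_float_index(const float *table, float phase)
{
    size_t idx = (size_t)(phase * TABLE_SIZE) & (TABLE_SIZE - 1);
    return table[idx];
}

/* Sequential access: each address depends only on the loop counter,
   so the CPU can issue the loads well ahead of the additions. */
float sum_sequential(const float *table)
{
    float sum = 0.0f;
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        sum += table[i];
    }
    return sum;
}
```

Whether the first form actually costs anything measurable depends on what other independent work the out-of-order core can find in the meantime.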

camsr wrote:It should be noted that the target platform is modern, and slightly older, x86-64 CPUs. I use a Core 2 Quad, so I am not thinking below that for selfish reasons.

Well, given the target platform, your basic rule of thumb should be: branches (direct or indirect) that go the same way multiple times in a row are basically free and anything else you try to do (with the possible exception of unrolling some loops with short body.. but note that your compiler might do this for you anyway) is going to just slow things down.

Still, by far the best optimization approach these days is to try to write your code as clearly as possible. The more you try to do tricky old-school hacks, the higher the chance that your compiler has to give up on some optimizations, because it fails to figure out whether or not those optimizations are safe. Some unsafe strength reductions (e.g. converting float division by a constant to multiplication by the reciprocal) can be worthwhile, but beyond that it's usually futile, unless you are willing to invest a lot of time in properly profiling it on multiple systems.
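The reciprocal trick mentioned above could be sketched like this. The function names and the constant 3 are made up for illustration. The transformation is "unsafe" because 1/3 is not exactly representable, so the two results can differ in the last bit; that is why a compiler will not normally do it on its own unless asked (for example with GCC's -ffast-math).

```c
/* Exact form: one float division per call. */
float scale_div(float x)
{
    return x / 3.0f;
}

/* Strength-reduced form: multiply by a precomputed reciprocal.
   Usually cheaper than a division, but the result may differ from
   scale_div() by a rounding error in the last bit. */
float scale_mul(float x)
{
    static const float inv3 = 1.0f / 3.0f; /* rounded reciprocal */
    return x * inv3;
}
```

For power-of-two divisors such as 4.0f the reciprocal is exact, so that case gives identical results either way.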

A way of re-using a loop without nesting