Any tips for optimizing this code?

DSP, Plugin and Host development discussion.

Post

@Nowhk
The local copy of your envelope data is again in RAM (on the stack). So the memcpy copies from RAM to RAM, which is pretty useless.
You gain performance from a "local copy" if the value fits into a CPU register. Then the compiler might decide to load it from RAM once and keep it in a register across loop iterations.

Post

PurpleSunray wrote: Wed Oct 17, 2018 8:33 pm @Nowhk
The local copy of your envelope data is again in RAM (on the stack). So the memcpy copies from RAM to RAM, which is pretty useless.
You gain performance from a "local copy" if the value fits into a CPU register. Then the compiler might decide to load it from RAM once and keep it in a register across loop iterations.
So how can I ensure that the data I need (for that loop) stays in CPU registers during the loop?

Post

PurpleSunray wrote: Wed Oct 17, 2018 8:20 pm
2DaT wrote: Wed Oct 17, 2018 4:15 pm
PurpleSunray wrote: Wed Oct 17, 2018 1:50 pm Why would you do all of this integer stuff? You can increment in a SIMD-friendly way.

Code: Select all

__m128d i1 = _mm_set_pd(1.0, 0.0);
__m128d i2 = _mm_set_pd(3.0, 2.0);
__m128d increment = _mm_set1_pd(4.0);
i1 = _mm_add_pd(i1, increment);
i2 = _mm_add_pd(i2, increment);
It has the drawback of a loop-carried dependency chain, but that's not a big deal for this loop.
Because it's 1 add_epi32 vs 2 add_pd. There is no point calculating doubles if you only need ints, is there?
Your code also contains these instructions:
_mm_set1_epi32, which is a cross-domain mov plus a shuffle.
_mm_cvtepi32_pd * 2, each roughly equivalent to an add but sometimes worse.
_mm_srli_si128, a shift or shuffle.
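For reference, here is a rough sketch of what that integer-counter approach typically looks like (my reconstruction for illustration, not the original code), so the extra instructions are visible:

Code: Select all

#include <emmintrin.h> // SSE2

// Hypothetical integer-counter variant: keep the sample index in an epi32 vector
// and convert to doubles when needed. 'out' and 'n' are placeholders; n is
// assumed to be a multiple of 4.
void fill_indices(double *out, int n)
{
    __m128i index = _mm_set_epi32(3, 2, 1, 0);   // four int32 lane indices
    const __m128i four = _mm_set1_epi32(4);      // the cross-domain mov + shuffle
    for (int i = 0; i < n; i += 4)
    {
        __m128d lo = _mm_cvtepi32_pd(index);                     // lanes 0,1 as doubles
        __m128d hi = _mm_cvtepi32_pd(_mm_srli_si128(index, 8));  // byte shift, then lanes 2,3
        _mm_storeu_pd(out + i, lo);
        _mm_storeu_pd(out + i + 2, hi);
        index = _mm_add_epi32(index, four);                      // the single integer add
    }
}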

Post

@2DaT OK
@Nowhk: you can't.
The compiler decides. All you can do is make it easy for it, and you do that by knowing your CPU (your CPU will most likely have 16 XMM registers to play with).
You did that already by storing some values in a local double instead of reading them from the struct. That way the compiler can generate code that reads the value from RAM once and keeps it in an XMM register. When you read from the struct directly, the compiler can't keep it in a register, because you asked to read it from the struct ;)
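A minimal sketch of that pattern (a hypothetical helper, not your actual code):

Code: Select all

// Copy the struct member into a local once, so the compiler is free to keep it
// in an XMM register for the whole loop instead of reloading it every iteration
// (it may not be able to prove that the store to mStep never touches mRate).
void advanceSteps(Envelope &envelope, EnvelopeVoiceData &voiceData, int blockSize)
{
    const double rate = envelope.mRate;   // one load from RAM
    for (int i = 0; i < blockSize; ++i)
        voiceData.mStep += rate;          // uses the register copy, not envelope.mRate
}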

Post

PurpleSunray wrote: Wed Oct 17, 2018 9:11 pm You did that already by storing some values in a local double instead of reading them from the struct. That way the compiler can generate code that reads the value from RAM once and keeps it in an XMM register. When you read from the struct directly, the compiler can't keep it in a register, because you asked to read it from the struct ;)
Exactly :) I place most data in local variables, so the compiler will use registers instead of accessing RAM.
That's why I memcpy "later": I'm filling a local array to avoid the CPU accessing RAM. But once I've filled it (at the end of the for loop) I need to copy it back to the array in RAM, otherwise I lose the calculations :)

Why did you say the local copy of my envelope data is again in RAM (on the stack)?
I'm missing that point, since (as you already said) I'm "storing some values in a local double instead of reading them from the struct" :D


Post

While I can't comment on the local-variable/SIMD situation (it seems like you guys have that covered), I did some code review of my own on the original and found some redundancies. Also, performance-wise, I don't recommend splitting code out into a separate function unless two or more functions use that code.

Code: Select all

  // buffer
  for (int remainingSamples = nFrames; remainingSamples > 0; remainingSamples -= PLUG_MAX_PROCESS_BLOCK)
  {
    // voices (buffer is 32: 16 simultaneous + 16 free slots to cut off the previous ones)
    for (int voiceIndex = 0; voiceIndex != PLUG_VOICES_BUFFER_SIZE; ++voiceIndex)
    {
      if (pVoiceManager->mVoices[voiceIndex].mIsPlaying)
      {
        for (int remainingVoiceSamples = (remainingSamples > PLUG_MAX_PROCESS_BLOCK) ? PLUG_MAX_PROCESS_BLOCK : remainingSamples; remainingVoiceSamples != 0; --remainingVoiceSamples)
        {
          for (int envelopeIndex = 0; envelopeIndex != mNumEnvelopes; ++envelopeIndex)
          {
            Envelope &envelope = *pEnvelope[envelopeIndex];
            EnvelopeVoiceData &envelopeVoiceData = envelope.mEnvelopeVoicesData[voiceIndex];

            if (envelope.mIsEnabled)
            {
              // process block
              if (envelopeVoiceData.mBlockStep >= gBlockSize)
              {
                // calculate new envelope values for this block; it's processed every 100 samples, not such a heavy operation, so I think the core of my code can be ignored here
              }

              // update output value
              const double value = envelopeVoiceData.mBlockStartAmp + (envelopeVoiceData.mBlockStep * envelopeVoiceData.mBlockDeltaAmp);
              envelope.mValue[voiceIndex] = ((1 + envelope.mIsBipolar) / 2.0 * value + (1 - envelope.mIsBipolar) / 2.0) * envelope.mAmount;

              // next phase
              envelopeVoiceData.mBlockStep += envelope.mRate;
              envelopeVoiceData.mStep += envelope.mRate;
            }
          }

          ++pVoiceManager->mVoices[voiceIndex].mSampleIndex;
        }
      }
    }
  }

blockSize isn't needed, and remainingSamples isn't unsigned, so you can safely subtract the macro from it and the loop will still terminate. For loops often compile to better code than while loops in my experience, depending on the compiler. Also, just basic stuff: it may help the compiler optimize when the code's intent is clearer. Or not; compilers are super crazy nowadays. Speaking of which, I recommend compiling with nuwen MinGW if you can, not MSVC. Codelite is an excellent IDE that picks up MinGW easily. Much faster code output, often up to 20% in my benchmarks (with -march=native). It won't auto-vectorize until -O3, but sometimes that's not faster.
Hope it helps.

Post

Nowhk wrote: Thu Oct 18, 2018 7:36 am
PurpleSunray wrote: Wed Oct 17, 2018 9:11 pm You did that already by storing some values in a local double instead of reading them from the struct. That way the compiler can generate code that reads the value from RAM once and keeps it in an XMM register. When you read from the struct directly, the compiler can't keep it in a register, because you asked to read it from the struct ;)
Exactly :) I place most data in local variables, so the compiler will use registers instead of accessing RAM.
That's why I memcpy "later": I'm filling a local array to avoid the CPU accessing RAM. But once I've filled it (at the end of the for loop) I need to copy it back to the array in RAM, otherwise I lose the calculations :)

Why did you say the local copy of my envelope data is again in RAM (on the stack)?
I'm missing that point, since (as you already said) I'm "storing some values in a local double instead of reading them from the struct" :D
Your envelope data is not stored in a CPU register only.
An SSE2 CPU has 16 XMM registers, but your envelope data is an array of 64 doubles.
That doesn't fit, so it is allocated on the stack.

An example with some dummy code to show the difference.

Code: Select all

for (int i = 0; i < n; i++) {
    envelope.mValue[i] = envelope.data0 + envelope.data1;
}
this might become something like this:

Code: Select all

for (int i = 0; i < n; i++) {
    mov XMM0, envelope.data0;     // Load envelope.data0 from RAM to register
    mov XMM1, envelope.data1;     // Load envelope.data1 from RAM to register
    add XMM0, XMM1, XMM2;         // Add XMM0 to XMM1 and store on XMM2
    mov envelope.mValue[i], XMM2; // Store the result to RAM
}
If you change it to:

Code: Select all

double d0 = envelope.data0;
double d1 = envelope.data1;
for (int i = 0; i < n; i++) {
    envelope.mValue[i] = d0 + d1;
}
this might become something like this:

Code: Select all

mov XMM0, envelope.data0;     // Load envelope.data0 from RAM to register
mov XMM1, envelope.data1;     // Load envelope.data1 from RAM to register
for (int i = 0; i < n; i++) {
    add XMM0, XMM1, XMM2;         // Add XMM0 to XMM1 and store on XMM2
    mov envelope.mValue[i], XMM2; // Store the result to RAM
}
Two load operations have been removed from the loop, because the data0 and data1 values can be kept in registers.
Now suppose you do this:

Code: Select all

double d0 = envelope.data0;
double d1 = envelope.data1;
double values[64];
for (int i = 0; i < n; i++) {
    values[i] = d0 + d1;
}
double values[64] will not go into a register, because it doesn't fit. Instead it is allocated on the stack:

Code: Select all

mov XMM0, envelope.data0;     // Load envelope.data0 from RAM to register
mov XMM1, envelope.data1;     // Load envelope.data1 from RAM to register
sub esp, 512;                 // grow the stack by sizeof(double[64]) and remember the address (that is values[64])
for (int i = 0; i < n; i++) {
    add XMM0, XMM1, XMM2;    // Add XMM0 to XMM1 and store on XMM2
    mov values[i], XMM2;     // Store the result to RAM (on stack, as allocated above).
}
If you compare the last two examples, you will see that the loop body is the same. It doesn't matter that values[64] is "local".

In general, as soon as you need a pointer to a value, it cannot reside in a register only.
CPU registers do not have memory addresses. So the fact that you are passing a double* into the memcpy already tells you that the data cannot live in a CPU register only; it needs a representation in RAM, because you need its address.
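A tiny illustration of that last point (dummy function; data0 as in the examples above):

Code: Select all

#include <cstring>

void example(const Envelope &envelope, double *dst)
{
    double v = envelope.data0;       // on its own, v could live purely in an XMM register
    std::memcpy(dst, &v, sizeof v);  // &v demands an address, so v must get a stack slot
}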

Post

And I still don't get this:

Code: Select all

for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex++) {
	// update output value
	values[sampleIndex] = value;
	value += deltaValue;

	// next phase
	blockStep += rate;
}
Why all these increments?
Why not this:

Code: Select all

for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex++) {
	values[sampleIndex] = value + deltaValue * sampleIndex;
}
blockStep += rate * blockSize;
?

This "value += deltaValue;" makes it hard to vectorize. A dumb compiler won't do, while the second verson is easy to vectorize, even for a dumb compiler.

Long story:
The FPU registers can be packed with multiple values, so you can do an operation on 2 doubles (SSE) or 4 doubles (AVX) in parallel. But "value += deltaValue" means "calculate one after the other": you need the result of the previous operation to do the next one. It is hard to calculate two values in parallel if one depends on the previous.
On the other hand, calculating "value + deltaValue * sampleIndex" in parallel is incredibly easy. Just load a register with [sampleIndex][sampleIndex + 1] and you calculate two values for the cost of one (more or less.. :P )
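Roughly what that looks like if you write the vectorized version out by hand with SSE2 intrinsics (a sketch only; blockSize assumed even, names as in the snippets above):

Code: Select all

#include <emmintrin.h>

void fillRamp(double *values, double value, double deltaValue, int blockSize)
{
    __m128d index = _mm_set_pd(1.0, 0.0);     // [sampleIndex + 1, sampleIndex]
    const __m128d two   = _mm_set1_pd(2.0);
    const __m128d base  = _mm_set1_pd(value);
    const __m128d delta = _mm_set1_pd(deltaValue);
    for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex += 2)
    {
        // value + deltaValue * sampleIndex, two samples at once
        __m128d v = _mm_add_pd(base, _mm_mul_pd(delta, index));
        _mm_storeu_pd(&values[sampleIndex], v);
        index = _mm_add_pd(index, two);       // both lane indices advance together
    }
}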

Try changing it; MSVC seems to be a dumb compiler here. I got a boost of about 50% by replacing += with *.

Post

Extraneous variables are not a concern when using SSA optimization (in GCC this is enabled to varying degrees at all -O levels).
However, the typing of the variables might matter. For example, if two different conditional comparisons use an operand based on a single value, but that value is typecast to a second type (which effectively changes the underlying "machine mode") for one of the two comparisons, the compiler might generate a second value when only one is necessary.
I say might because I don't know.

Code: Select all

for (int i = 0; ; i++)  // 'i' should be assigned a runtime input, NOT a compile-time constant.
{
    if (i == 10)                   { printf("10\n"); }
    if ((char)i == 'C')            { printf("C\n"); }
    if ((long long int)i == 255ll) { printf("break"); break; }
}
So this code has 3 valid comparisons of literal constants against an incrementer that is typecast in 2 of its 3 uses. Notice that one cast is to a narrower type and the other to a wider type, but the static limit of the loop also fits in the range of the smallest type used with the incrementer.

Anyway, back on topic: I have observed that loop counters used for pointer dereferencing work best with a type that is the same size as the platform's pointer size. Use the 'size_t' type to index your arrays and pointers, but be aware that too many of them cost more memory.
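A trivial example of that indexing pattern (dummy code, not from this thread):

Code: Select all

#include <stddef.h>

void copy_buffer(double *dst, const double *src, size_t count)
{
    // size_t matches the platform pointer width, so the index feeds the address
    // calculation directly, without a widening/sign-extension step.
    for (size_t i = 0; i < count; ++i)
        dst[i] = src[i];
}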

Post

camsr wrote: Thu Oct 18, 2018 1:55 pm Extraneous variables are not a concern when using SSA optimization (in GCC this is enabled to varying degrees at all -O levels).
It depends on how they're used. In this case it did matter, so I restructured the code.

Post

PurpleSunray wrote: Thu Oct 18, 2018 9:57 am double values[64] will not go into a register, because it doesn't fit. Instead it is allocated on the stack
Got it. I've always thought that it would be "inserted" in chunks, so if it doesn't all fit, it would place "part of it, process it, then swap in another part". It seems I was wrong.

That said, in your opinion is it better to access the array on the stack "directly" or through a pointer?
I mean, this way:

Code: Select all

double mValue[PLUG_VOICES_BUFFER_SIZE][PLUG_MAX_PROCESS_BLOCK];
...
double *pValue = envelope.mValue[voiceIndex];
*pValue++ = value;
or this:

Code: Select all

double mValue[PLUG_VOICES_BUFFER_SIZE][PLUG_MAX_PROCESS_BLOCK];
...
envelope.mValue[voiceIndex][sampleIndex] = value;
Which would be faster, in your opinion?
PurpleSunray wrote: Thu Oct 18, 2018 11:14 am Try changing it; MSVC seems to be a dumb compiler here. I got a boost of about 50% by replacing += with *.
You're right again: this really speeds up the processing! Thanks!

Post

It shouldn't make much of a difference, but if you are down to counting CPU cycles:

Code: Select all

for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex++) {
	values[voiceIndex][sampleIndex] = value;
}
To write to values, the compiler needs to load the address of values into a register, then add voiceIndex * PLUG_MAX_PROCESS_BLOCK * sizeof(double), then add sampleIndex * sizeof(double), and only then does it have the right address to write to.
The compiler might hoist the first add out of the loop, since it is the same on every run, so you get:

Code: Select all

double* values = envelope.mValue[voiceIndex];
for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex++) {
	values[sampleIndex] = value;
}
Now it is down to loading an address and adding sampleIndex.
Still not optimal, because sampleIndex is still there.
What we actually do is increment sampleIndex, then add sampleIndex to the values address.
So that's again 2 adds. You can reduce it to 1 add in total by removing sampleIndex and incrementing the pointer directly (this is not always possible though: if you need sampleIndex inside the loop you cannot remove it).

Code: Select all

double* start = envelope.mValue[voiceIndex];
double* end = start + blockSize;
for (double* ptr = start ; ptr < end; ptr++) {
	*ptr = value;
}

Post

And to confuse you even further:
Nowhk wrote: Fri Oct 19, 2018 7:10 am
PurpleSunray wrote: Thu Oct 18, 2018 9:57 am double values[64] will not go into a register, because it doesn't fit. Instead it is allocated on the stack
Got it. I've always thought that it would be "inserted" in chunks, so if it doesn't all fit, it would place "part of it, process it, then swap in another part". It seems I was wrong.
Nope, you were right.
It actually does work in chunks; they are just much smaller than you were thinking: 2 doubles if you turned on SSE, or 4 if you turned on AVX :lol: :P
A single instruction can load or store (or add or mul) 2 doubles at once. That chunk processing is the speed gain you see after you removed the += (vectorized code vs one-by-one code).

Post

PurpleSunray wrote: Fri Oct 19, 2018 7:32 am So that's again 2 adds. You can reduce it to 1 add in total by removing sampleIndex and incrementing the pointer directly (this is not always possible though: if you need sampleIndex inside the loop you cannot remove it).
Which I can't, since I use sampleIndex internally :) Something else I've noticed that makes a difference is changing this:

Code: Select all

double mValue[PLUG_VOICES_BUFFER_SIZE][PLUG_MAX_PROCESS_BLOCK];
to this:

Code: Select all

double mValue[PLUG_VOICES_BUFFER_SIZE * PLUG_MAX_PROCESS_BLOCK];
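With the flat layout the indexing then becomes a single computed offset, something like this (just a sketch):

Code: Select all

// hypothetical indexing into the flattened buffer: one computed offset
envelope.mValue[voiceIndex * PLUG_MAX_PROCESS_BLOCK + sampleIndex] = value;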
