xoxos wrote: best thread title ever.
lol did i actually type it like that or did you change it?
how to generate random noise without making function calls?
-
- KVRian
- Topic Starter
- 940 posts since 11 Mar, 2001 from nyc
Aleksey Vaneev wrote: It's important to understand that (to my knowledge) no compiler will be able to 'optimize' accesses to such function arguments as pointer to structure or reference to structure:
Code: Select all
inline void process( CChannel& Data, int l, double* p )
{
    while( l > 0 )
    {
        Data.y += ( *p - Data.y ) * Data.c;
        *p = Data.y;
        p++;
        l--;
    }
}
The problem is that the compiler does not know for sure when it should or should not write the intermediate values back into the structure (for all it knows, p could point into Data).
To get optimized code, you should write it this way (this code can be unrolled by a compiler without a problem):
Code: Select all
inline void process( CChannel& Data, int l, double* p )
{
    const double c = Data.c;
    double y = Data.y;
    while( l > 0 )
    {
        y += ( *p - y ) * c;
        *p = y;
        p++;
        l--;
    }
    Data.y = y;
}
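A minimal sketch of the two variants side by side, in case anyone wants to try them or look at the asm themselves. The thread never shows CChannel, so the struct below is a hypothetical stand-in with only y and c, and the names process_direct, process_preloaded and variants_agree are made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CChannel (the real one is not shown in the thread).
struct CChannel {
    double y; // filter state (last output)
    double c; // smoothing coefficient
};

// Variant 1: state is read and written through the reference on every sample.
inline void process_direct(CChannel& Data, int l, double* p) {
    while (l > 0) {
        Data.y += (*p - Data.y) * Data.c;
        *p = Data.y;
        p++;
        l--;
    }
}

// Variant 2: state is copied into locals and written back once after the loop.
inline void process_preloaded(CChannel& Data, int l, double* p) {
    const double c = Data.c;
    double y = Data.y;
    while (l > 0) {
        y += (*p - y) * c;
        *p = y;
        p++;
        l--;
    }
    Data.y = y;
}

// Runs both variants on the same input (buffer does not alias the struct)
// and reports whether buffers and final state match.
inline bool variants_agree() {
    std::vector<double> a = {1.0, 0.0, 0.5, -0.25, 2.0};
    std::vector<double> b = a;
    assert(a.size() == b.size());
    CChannel ca = {0.0, 0.25};
    CChannel cb = ca;
    process_direct(ca, static_cast<int>(a.size()), a.data());
    process_preloaded(cb, static_cast<int>(b.size()), b.data());
    return a == b && ca.y == cb.y;
}
```

Both versions compute the same thing as long as p does not point into Data; the second one just makes that assumption visible to the compiler. Calling variants_agree() from a test or from main is enough to check it.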
- KVRAF
- 6478 posts since 16 Dec, 2002
Chris Walton wrote: (2) How in the world does block processing hinder object oriented design?
I meant that if you start avoiding function calls for arbitrary reasons you will end up with obfuscated code that no one will ever be able to understand, let alone fix, and it won't be any faster in the end either.
I've heard people still unroll function call chains and end up with a large collection of highly specialised non-reusable code. great for those much needed optimisations yielding 0.001% more performance that only work on certain CPU models.
I personally leave that to the obsessive compulsive types.
- KVRAF
- 4030 posts since 7 Sep, 2002
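laserbeak wrote: wow how do you guys learn about all of this?
Experience is the answer. Really, in the two code examples I've posted the second one produces best performance, and I've described the reason.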
To be more specific, in the second example compiler is able to process the whole loop in registers, without even touching memory (errhm, I mean, without touching Data structure). In the first example it may need to access Data structure to put intermediate result (while access to "Data.c" variable may still get optimized by preloading into register). Also, having access to a structure in such loop is at least 1 lost register in the context of the loop.
I've made a simple test with MinGW with -O3 --unroll-loops -no-inline settings.
Here's the typical unrolled output of the first variant (unrolled 4 times, typical part in ***):
Code: Select all
L7:
fldl (%ecx)
subl $4, %eax
fldl (%edx)
fsub %st(1), %st
fmull 8(%ecx)
faddp %st, %st(1)
fstl (%ecx)
fstpl (%edx)
***
fldl (%ecx)
fldl 8(%edx)
fsub %st(1), %st
fmull 8(%ecx)
faddp %st, %st(1)
fstl (%ecx)
fstpl 8(%edx)
***
Second variant (unrolled 8 times, typical part in ***):
Code: Select all
L7:
fldl (%edx)
subl $8, %eax
fsub %st(1), %st
fmul %st(2), %st
faddp %st, %st(1)
***
fldl 8(%edx)
fsub %st(1), %st
fxch %st(1)
fstl (%edx)
fxch %st(1)
fmul %st(2), %st
faddp %st, %st(1)
***
As you see, much less number of memory accesses in the second variant.
Last edited by Aleksey Vaneev on Sat Nov 10, 2007 6:28 pm, edited 5 times in total.
- KVRAF
- 2187 posts since 25 Jan, 2007 from the back room, away from his wife's sight (or so he thinks)
Chris Walton wrote: (2) How in the world does block processing hinder object oriented design?
Kingston wrote: I meant that if you start avoiding function calls for arbitrary reasons you will end up with obfuscated code that no one will ever be able understand let alone fix, and it won't be any faster in the end either.
I've heard people still unroll function call chains and end up with a large collection of highly specialised non-reusable code. great for those much needed optimisations yielding 0.001% more performance that only work on certain CPU models.
I personally leave that to the obsessive compulsive types.
Ah, I see, and I fully agree
Cakewalk by Bandlab / FL Studio
Squire Stratocaster / Chapman ML3 Modern V2 / Fender Precision Bass
Formerly known as arke, VladimirDimitrievich, bslf, and ctmg. Yep, those bans were deserved.
-
- KVRian
- 1002 posts since 1 Dec, 2004
The penalty for a function call inside the processing loop isn't too bad. An optimizing compiler may emit a real call anyway if the loop is long (in particular for repeated calls to the same function, so that the loop sits inside the cache instead of being too large), even if you add "inline".
-
- KVRian
- Topic Starter
- 940 posts since 11 Mar, 2001 from nyc
Aleksey Vaneev wrote:
laserbeak wrote: wow how do you guys learn about all of this?
Experience is the answer. Really, in the two code examples I've posted the second one produces best performance, and I've described the reason.
To be more specific, in the second example compiler is able to process the whole loop in registers, without even touching memory (errhm, I mean, without touching Data structure). In the first example it may need to access Data structure to put intermediate result (while access to "Data.c" variable may still get optimized by preloading into register). Also, having access to a structure in such loop is at least 1 lost register in the context of the loop.
I've made a simple test with MinGW with -O3 --unroll-loops -no-inline settings.
Here's the typical unrolled output of the first variant (unrolled 4 times, typical part in ***):
Code: Select all
L7:
fldl (%ecx)
subl $4, %eax
fldl (%edx)
fsub %st(1), %st
fmull 8(%ecx)
faddp %st, %st(1)
fstl (%ecx)
fstpl (%edx)
***
fldl (%ecx)
fldl 8(%edx)
fsub %st(1), %st
fmull 8(%ecx)
faddp %st, %st(1)
fstl (%ecx)
fstpl 8(%edx)
***
Second variant (unrolled 8 times, typical part in ***):
Code: Select all
L7:
fldl (%edx)
subl $8, %eax
fsub %st(1), %st
fmul %st(2), %st
faddp %st, %st(1)
***
fldl 8(%edx)
fsub %st(1), %st
fxch %st(1)
fstl (%edx)
fxch %st(1)
fmul %st(2), %st
faddp %st, %st(1)
***
As you see, much less number of memory accesses in the second variant.
cool thanks i can see that it's smaller, but would you be what kingston calls "obsessive compulsive"?
-
- KVRist
- 161 posts since 9 Apr, 2002
Aleksey Vaneev wrote: It's important to understand that (to my knowledge) no compiler will be able to 'optimize' accesses to such function arguments as pointer to structure or reference to structure:
This also applies to accessing class member variables in class methods (but that should be obvious, as calling a member function is basically a "normal" function call with an implicit this pointer passed) - sometimes it's better to preload member variables into local ones before doing lots of writing into them (reading from memory won't be that bad, I think). It's also worth noting that this kind of optimization should be done hmmm... carefully (check asm output) - especially if you're going to preload more variables than you have registers (in that case you may actually decrease performance).
cheers,
Bart
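To make the member-variable case concrete, here is a small sketch of a class method that preloads only the state it writes to. The class name Smoother and its members are hypothetical (following the 'm' prefix convention mentioned later in the thread), and whether this actually helps on a given compiler is something to verify in the asm output rather than assume.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical one-pole smoother; not code from the thread.
class Smoother {
public:
    explicit Smoother(double coeff) : mCoeff(coeff), mState(0.0) {}

    void process(double* buf, std::size_t n) {
        assert(buf != nullptr || n == 0);
        const double c = mCoeff; // only read in the loop
        double y = mState;       // written every sample, so keep it in a local
        for (std::size_t i = 0; i < n; ++i) {
            y += (buf[i] - y) * c;
            buf[i] = y;
        }
        mState = y; // single write-back through the implicit this pointer
    }

    double state() const { return mState; }

private:
    double mCoeff; // smoothing coefficient
    double mState; // last output sample
};
```

The member function is the same situation as the free function taking a reference: every access to mState goes through this, so a local copy gives the optimizer a variable it knows nothing else can touch.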
- KVRAF
- 4030 posts since 7 Sep, 2002
laserbeak wrote: cool thanks i can see that it's smaller, but would you be what kingston calls "obsessive compulsive"?
Not at all.
However, even if it is 'obsessive', it's also useful to know that at least MinGW won't unroll such a construct (I do not know if Intel C++ Compiler would):
Code: Select all
while( l-- > 0 )
{
    *(p++) = *p * 5.0;
}
You should write it this way:
Code: Select all
while( l > 0 )
{
    *p = *p * 5.0;
    p++;
    l--;
}
-
- Leslie Sanford
- KVRAF
- 1640 posts since 4 Dec, 2006
Aleksey Vaneev wrote: You should write it this way:
Code: Select all
while( l > 0 )
{
    *p = *p * 5.0;
    p++;
    l--;
}
Efficiency questions aside, I find this to be a better programming style. I don't like embedded expressions within a statement that have side effects. It obscures the intent of the statement and makes it tougher to debug. All in my opinion, of course.
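For reference, the side-effect-free form wrapped up as a function. The name scale_block and the gain parameter are assumptions for illustration (the thread hard-codes 5.0). One more reason to prefer it, as far as I know: before C++17 the one-liner `*(p++) = *p * 5.0;` reads p and increments p without any sequencing between the two, which is undefined behaviour; C++17 evaluates the right-hand side of an assignment first, but the separated form is unambiguous under any standard.

```cpp
#include <cassert>
#include <cstddef>

// Multiplies n samples in place by gain, with every side effect
// in its own statement.
inline void scale_block(double* p, std::size_t n, double gain) {
    assert(p != nullptr || n == 0);
    while (n > 0) {
        *p = *p * gain; // read and write the same sample
        p++;            // advance to the next sample
        n--;            // one fewer sample left
    }
}
```

Usage would be something like `scale_block(buffer, numSamples, 5.0);` inside the block processing call.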
- KVRAF
- 4030 posts since 7 Sep, 2002
FEV wrote: It's also worth noting, that this kind of optimization should be done hmmm... carefully (check asm output) - especially if you're going to preload more variables than you have registers (in that case you may actually decrease performance).
I think it is always beneficial to pre-load variables that are going to be updated, even if you think not enough registers are available. Of course, I'm talking about block processing loops. If a state variable is on the stack, the compiler can manage it in whatever way it wants. If the state variable belongs to an external structure, the compiler's hands are tied. In fact, preloading a variable does not increase overhead, because the compiler should usually issue a loading instruction anyway, while it can still remove that pre-loading instruction if it's not required. So, pre-loading is merely a guideline to the compiler's optimizer: it's like putting a 'const' keyword near a variable or function that will not be subject to change.
-
- KVRist
- 161 posts since 9 Apr, 2002
Aleksey Vaneev wrote: I think it is always beneficial to pre-load variables that are going to be updated, even if you think not enough registers are available.
I disagree...
Aleksey Vaneev wrote: Of course, I'm talking about block processing loops.
Same here
Aleksey Vaneev wrote: In fact, preloading a variable does not increase overhead, because compiler should usually issue a loading instruction anyway while it can still remove that pre-loading instruction if it's not required.
I'm only using MSVC++ and "mangle" only SSE (__m128, __m128d) data types in block processing loops, and I've often seen a mess generated by the compiler in the preloading part. If there aren't enough registers, VC++ may preload a member variable into a register and immediately save it again into memory (performance hit). Then in the processing loop it will read/write into that memory block instead of using a register. If the variable is often used in an inner loop, the compiler may actually load it again into a register, but that will happen inside the loop (and so you will also get a memory read and write per loop iteration). In such cases the execution speed might be improved if you are often writing (inside the loop) into such a "badly preloaded" variable (reading does not cost much - at least not with movaps which I'm mostly dealing with) - but that shouldn't take place, as writing to a final destination should always happen once per iteration (preferably at its end - so the compiler can do a better job optimizing the whole loop). And even if you get some improvement in loop speed, you also have to consider how much of a performance penalty you'll get from pointless preloading of variables (those that didn't fit into registers) before the loop executes.
So like I said - "preloading" is a good idea, but not in all cases. I tend to pick only a couple of variables to preload instead of all of them (the ones I'll be writing to get a priority), and I always check asm output to make sure, that the compiler does exactly what I want
And another thing worth mentioning - inlining functions that do that kind of preloading is also mostly a bad idea, but I see (by looking at the compiler flags you're using) that you already know that
Ps.: I'm not sure if my "bad case scenario" description is quite clear, so if you want I can provide an example along with an assembly output
cheers,
Bart
-
- KVRian
- Topic Starter
- 940 posts since 11 Mar, 2001 from nyc
sure. if you can spare an example.
i could use a bit of asm exposure anyway.
-
- KVRist
- 161 posts since 9 Apr, 2002
laserbeak wrote: sure. if you can spare an example.
Ok. Definitely not the best example, but I'm a little past my deadlines and am busy with some other things (I just hope you're not reading this, Peter).
Here's a processing function: example.cpp (it's a member function of some class whose member variables are prepended with 'm') and what msvc++ 8.0 produced out of it: example.asm
It's not really important what that thing does (well, you can try to guess if you have nothing better to do)
Code: Select all
movaps xmm0, XMMWORD PTR [ecx+464]
...
movaps XMMWORD PTR _highFilterPrev$[esp+64], xmm0
movaps xmm0, XMMWORD PTR [ecx+480]
movaps XMMWORD PTR _highFilterLast$[esp+64], xmm0
Still, most of the time it's good to create local copies of member variables (but only those you'll be writing to) - but not always (check asm / profile it to be sure it's a good solution for a given task)
cheers,
Bart
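Since the advice is to check the asm or profile, here is a rough sketch of a timing helper for the "profile it" half. The helper name time_ms and the best-of-N approach are my own assumptions, not something from the thread, and a wall-clock measurement like this is only a coarse sanity check next to reading the generated code.

```cpp
#include <cassert>
#include <chrono>

// Runs work() `repeats` times and returns the fastest run in milliseconds.
// Taking the minimum reduces noise from other processes and cache warm-up.
template <typename F>
double time_ms(F&& work, int repeats) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock; // monotonic, suited to intervals
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = clock::now();
        work();
        const auto t1 = clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (best < 0.0 || ms < best) {
            best = ms;
        }
    }
    return best;
}
```

You would pass it a lambda that runs the processing function over a large buffer, once for the version with local copies and once without, and compare the two numbers. Make sure the results are actually used afterwards, otherwise the optimizer may remove the work being timed.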
-
- KVRian
- Topic Starter
- 940 posts since 11 Mar, 2001 from nyc
cool i don't know a lot of those opcodes i'm very new to asm. but i do get your point and i've noticed that the asm version has a few things thrown around in a diff order. did the compiler do this or did you?
oh and my guess is, a filter module with an envelope? lol
