Wolfen666 wrote: "I'm curious about that one, could you tell us more about that? I use fp:fast all the time, since some of my code gets significant improvements on the CPU load side this way..." (in reply to: "I should also note that I have to use fp:precise for my Juce GUI code")
Are you using fp:fast for your DSP code?
Poll: what OS do most people use for music production?
- KVRian
- 573 posts since 1 Jan, 2013 from Denmark
- KVRian
- 1153 posts since 11 Aug, 2004 from Breuillet, France
- KVRAF
- 3008 posts since 17 Apr, 2010 from Croatia
I think I'm using too many doubles to see any benefit from going fp:fast... but then I clearly feel I don't yet know what I'm doing.
- KVRian
- 573 posts since 1 Jan, 2013 from Denmark
Wolfen666 wrote: "Yes, most of the time, is it wrong?"
No, not necessarily. I guess it really depends on your application; fp:fast allows for reordering / algebraic optimizations that may or may not be wanted. Some filters require a certain sequence of operations to achieve numerical stability.
Taron wrote: "I think I'm using too many doubles to see any benefit from going fp:fast... but then I clearly feel I don't yet know what I'm doing."
You should see the same linear gains from using doubles and fp:fast (I doubt even aggressive optimizers will change double operations into floats).
This should be of interest to you both:
http://stackoverflow.com/questions/6430 ... -to-aaaaaa
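To make the "certain sequence of operations" point concrete, here is a minimal sketch using Kahan (compensated) summation, a textbook case where the order of operations is the whole algorithm. The function names and the test signal are made up for illustration and are not from anyone's plugin code.

```cpp
#include <cstddef>
#include <vector>

// Plain left-to-right summation.
float naiveSum(const std::vector<float>& values) {
    float sum = 0.0f;
    for (float v : values) sum += v;
    return sum;
}

// Kahan summation: keeps a running estimate of the low-order bits
// that each addition throws away, and feeds them back in.
float kahanSum(const std::vector<float>& values) {
    float sum = 0.0f;
    float compensation = 0.0f;
    for (float v : values) {
        float y = v - compensation;
        float t = sum + y;             // low-order bits of y may be lost here
        compensation = (t - sum) - y;  // algebraically zero, numerically not
        sum = t;
    }
    return sum;
}

// Hypothetical test signal: one large value followed by many tiny ones.
std::vector<float> makeTestValues(std::size_t tinyCount) {
    std::vector<float> values(tinyCount + 1, 1.0e-8f);
    values[0] = 1.0f;
    return values;
}
```

With strict floating-point semantics, `naiveSum` never gets past 1.0 on this signal because each tiny term is below half a unit in the last place, while `kahanSum` recovers the accumulated contribution. Under fp:fast style options the compiler is permitted to treat `(t - sum) - y` as zero, in which case the compensated version may silently degrade to the naive one.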
- KVRian
- 1153 posts since 11 Aug, 2004 from Breuillet, France
- KVRAF
- 3008 posts since 17 Apr, 2010 from Croatia
Bewildering statement somehow, haha, but I'll have to investigate a lot more than just that anyway. Right now it's just important that I keep my frequencies stable on the self-oscillating units I have. Floats actually don't hold up, it seems. I thought that was fascinating.
Anyway... I'm just worried about denormalization needs. That stuff still puzzles me. I wish there was a compiler option that would automatically take care of that!?
- KVRAF
- 25852 posts since 20 Jan, 2008 from a star near where you are
Numanoid wrote: "So there is 60 Win users and 26 OSX users. That means there are more than 2 Win users for every OSX user"
With 195 votes, the result is basically the same: 70% Win users and 31% OSX.
Interesting to see that the poll is at 101% in total, it is red hot
- KVRAF
- 3008 posts since 17 Apr, 2010 from Croatia
- KVRian
- 573 posts since 1 Jan, 2013 from Denmark
Taron wrote: "Bewildering statement somehow, haha, but I'll have to investigate a lot more than just that anyway. Right now it's just important that I keep my frequencies stable on the self-oscillating units I have. Floats actually don't hold up, it seems. I thought that was fascinating. Anyway... I'm just worried about denormalization needs. That stuff still puzzles me. I wish there was a compiler option that would automatically take care of that!?"
Well, the problem is that it's handled through flags internal to the CPU. In a plugin-like context, any piece of code compiled separately has to enable/disable the flags for them to be reliable. Most people just add some inaudible noise before the processing.
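For the "enable/disable the flags" part, a rough sketch of what that can look like on x86 with SSE: a small scope guard that switches on flush-to-zero and denormals-are-zero in the MXCSR register at the top of the audio callback and restores the previous value on the way out. The class and function names are hypothetical, and this assumes an SSE-capable x86 target whose CPU supports the DAZ bit; it does nothing for x87 code.

```cpp
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr

// Hypothetical RAII helper: sets FTZ (bit 15) and DAZ (bit 6) in MXCSR
// for the current thread and restores the saved value on scope exit.
class ScopedFlushDenormals {
public:
    ScopedFlushDenormals() : savedCsr_(_mm_getcsr()) {
        _mm_setcsr(savedCsr_ | 0x8040u);
    }
    ~ScopedFlushDenormals() { _mm_setcsr(savedCsr_); }
    ScopedFlushDenormals(const ScopedFlushDenormals&) = delete;
    ScopedFlushDenormals& operator=(const ScopedFlushDenormals&) = delete;

private:
    unsigned int savedCsr_;
};

// Hypothetical audio callback: a decaying feedback path that would
// otherwise drift into denormal range once the input goes silent.
void processBlock(float* buffer, int numSamples, float& state) {
    ScopedFlushDenormals guard;  // active only for this call
    for (int i = 0; i < numSamples; ++i) {
        state = state * 0.5f + buffer[i];
        buffer[i] = state;
    }
}
```

Setting and restoring per callback is the point here: the host and other plugins on the same thread may rely on a different setting, so the guard leaves the register as it found it.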
- KVRian
- 1379 posts since 26 Apr, 2004 from UK
fp:fast is mainly about not complying with the IEEE standard, isn't it? It also helps vectorising, as the compiler is free to use operations that are not as precise as the others.
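A small sketch of why vectorising a sum needs that freedom: a SIMD reduction keeps several partial sums (one per lane) and combines them at the end, which is a different order of additions than the sequential loop. The version below writes the four lanes out by hand in scalar code; the function names and the data used to exercise them are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Strict left-to-right order, as the source code spells it out.
float sumSequential(const std::vector<float>& x) {
    float sum = 0.0f;
    for (float v : x) sum += v;
    return sum;
}

// The order a 4-wide vectorised reduction would typically use:
// four independent partial sums, combined at the end.
float sumFourLanes(const std::vector<float>& x) {
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= x.size(); i += 4) {
        for (std::size_t k = 0; k < 4; ++k) lane[k] += x[i + k];
    }
    float total = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    for (; i < x.size(); ++i) total += x[i];  // leftover elements
    return total;
}
```

For the input `{1e8f, 1, 1, 1, -1e8f, 1, 1, 1}` the sequential order loses the first three ones against the large value, while the lane order keeps them, so the two functions return different results for the same data. Neither order is "wrong"; a strict floating-point model just does not let the compiler swap one for the other.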
- KVRAF
- 3426 posts since 15 Nov, 2006 from Pacific NW
Wolfen666 wrote: "I'm curious about that one, could you tell us more about that? I use fp:fast all the time, since some of my code gets significant improvements on the CPU load side this way..." (in reply to: "I should also note that I have to use fp:precise for my Juce GUI code")
This was purely based on trial and error. When I compiled my whole project with fp:fast on Windows, the GUI code acted weird. I don't remember exactly HOW it got weird, but it was definitely buggy. This isn't any particularly advanced GUI code, just vector graphics in Juce.
On OSX, I had some issues with VintageVerb, on a few users' machines, that I could never reproduce. These were BAD issues: crashing the DAW, or making a huge outburst of noise. I refactored my code so that the audio code and the parameter calculation code for each algorithm were in different files, and only turned on the fp:fast flag for the audio code. This fixed the bugs.
During the AAX beta period for my plugins, I also had some users report crashes with ValhallaShimmer. I traced this down to some fp:fast related bug. Once I turned this option off for a specific file, the issue went away.
In general, I have found that fp:fast can cause bugs that are VERY tricky to track down, and may not show up on the vast majority of machines. I'm not saying don't use it - just use it with caution.
Sean Costello
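The post above does the split per file via compiler flags. As a different technique with a similar goal, MSVC also has a `float_control` pragma that can switch the precise model on for a region of a file that is otherwise built with fp:fast. The sketch below is only an illustration of that idea, not the method described in the post; the function names are hypothetical, and the pragma is guarded so other compilers simply skip it.

```cpp
#include <cmath>

#if defined(_MSC_VER)
#pragma float_control(precise, on, push)  // precise model from here on
#endif

// Hypothetical parameter calculation that should keep strict semantics.
double onePoleCoefficient(double cutoffHz, double sampleRateHz) {
    const double pi = 3.14159265358979323846;
    return std::exp(-2.0 * pi * cutoffHz / sampleRateHz);
}

#if defined(_MSC_VER)
#pragma float_control(pop)  // back to whatever the command line selected
#endif

// Hypothetical per-sample loop, left under the project-wide setting.
void processOnePole(float* buffer, int numSamples, float coefficient, float& state) {
    for (int i = 0; i < numSamples; ++i) {
        state = buffer[i] + coefficient * (state - buffer[i]);
        buffer[i] = state;
    }
}
```

Splitting into separate files with separate flags, as described above, has the advantage of working the same way across compilers.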
- KVRAF
- 3008 posts since 17 Apr, 2010 from Croatia
I really don't quite know why, but I've just done some back-and-forth testing and my bizarre code actually runs faster with fp:precise, why ever?! And the difference doesn't seem to be too small, actually: 32% (fast) versus 30% (precise) CPU average.
Maybe one day I'll find out what I'm doing "wrong", but until then I keep precise on just to stay on top of things.
(PS: Love your reverbs, Valhalla!)
- KVRian
- 573 posts since 1 Jan, 2013 from Denmark
Taron wrote: "I really don't quite know why, but I've just done some back and forth testing and my bizarre code actually runs faster with fp:precise, why ever?! And it doesn't seem to be too little, actually...difference between 32%(fast) and 30%(precise) cpu average. Maybe one day I'll find out what I'm doing "wrong", but until then I keep precise on just to stay on top of things. (PS: Love your reverbs, Valhalla!)"
What architecture are you targeting? SSEx or the x87 FPU? Try setting the instruction set to anything other than IA-32/x87. Only relevant if you're making 32-bit builds, of course.
- KVRAF
- 3008 posts since 17 Apr, 2010 from Croatia
I'm currently only doing x64, actually, but frankly, I have no real idea about what you're saying there, as shameful as that is.
My system runs on Xeons (x5650), not sure what fpu subset it has?
I'm in dire need of more education there, I'm afraid.
- KVRAF
- 3008 posts since 17 Apr, 2010 from Croatia
Hm...I'm discovering more things in VS options right now, haha...curious. As a test I got rid of all my doubles and went completely floats to see what effect this might have on the performance and it still appears marginal. Above all that tells me that I must have some heavier things going on independent from such fine optimizations. I keep wondering about loop content in regards to processor cache, you know.
Anyhow, I found the Streaming SIMD Extensions (/arch:SSE) and SSE2 options alongside Advanced Vector Extensions (/arch:AVX) (which apparently won't work for my VST compile, as it can't start up if that's set). Again, I couldn't find any positive changes by setting SSE or SSE2, but rather a slowdown.
Surely my testing conditions aren't precise (running MuLab with a test pattern and watching the CPU average), but my original findings remain valid (fp:precise results in faster execution than fp:fast), with the addition that SSEx also only creates slowdowns.
Best results for me:
Optimization: Full Optimization (/Ox), Enable Intrinsic Functions (/Oi) and Favor fast code (/Ot)
Code Generation: No Enhanced Instructions (/arch:IA32), fp:precise
(Are we totally hijacking this thread by now?)
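One way to see what a given build actually targets is to ask the preprocessor. As far as I know, the /arch:IA32, /arch:SSE and /arch:SSE2 switches only apply to 32-bit x86 builds in MSVC, and x64 builds use SSE2 for floating point regardless; also, the Xeon X5650 is a Westmere part without AVX, which would fit a plugin built with /arch:AVX failing to start there. The helper below is a hypothetical sketch that reports the target based on predefined compiler macros.

```cpp
// Hypothetical helper: describes the floating-point instruction set this
// translation unit was compiled for, based on predefined macros.
const char* targetFloatInstructionSet() {
#if defined(__AVX__)
    return "AVX or newer";
#elif defined(_M_X64) || defined(__x86_64__)
    return "x64 baseline (SSE2)";
#elif defined(_M_IX86_FP)          // MSVC, 32-bit x86
  #if _M_IX86_FP >= 2
    return "x86 with SSE2";
  #elif _M_IX86_FP == 1
    return "x86 with SSE";
  #else
    return "x86 with x87 FPU";
  #endif
#elif defined(__SSE2__)            // GCC/Clang, 32-bit x86
    return "x86 with SSE2";
#elif defined(__SSE__)
    return "x86 with SSE";
#elif defined(__i386__)
    return "x86 with x87 FPU";
#else
    return "unknown or non-x86 target";
#endif
}
```

Logging this string once at plugin start-up is a cheap way to confirm whether switching the /arch setting changed anything in the build being measured.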