Optimize plugin code for balanced load or least load?

DSP, Plug-in and Host development discussion.
mystran
KVRAF
5493 posts since 12 Feb, 2006 from Helsinki, Finland

Post Mon Nov 18, 2019 10:49 pm

karrikuh wrote:
Mon Nov 18, 2019 10:31 pm
2DaT wrote:
Mon Nov 18, 2019 1:46 pm
8. Avoid std library when possible. Prefer arrays to std::vector.
Why? I'm using std::vector all over the place, and to me it leads to much clearer and safer code than taking care of deleting memory myself. Also, I didn't observe any performance hit compared to a manually allocated array.
It depends on what you do with your vectors, but if you're just using them to allocate dynamic memory (ie. you can't use static arrays anyway) for access with operator[] there shouldn't really be any impact whatsoever, unless you have some debug features enabled.
If you'd like Signaldust to return, please ask Katinka Tuisku to resign.

aciddose
KVRAF
12315 posts since 7 Dec, 2004

Re: Optimize plugin code for balanced load or least load?

Post Mon Nov 18, 2019 11:43 pm

What's wrong with std::array ? You can get debug-time bounds-checking and all the useful size/length features in a static array with zero runtime overhead.

Using a vector (heap, dynamic) instead of a static array is guaranteed to be way less efficient in a variety of cases. You ought to be using aligned allocation of "whole" objects anyway, including any static arrays. Those are required for SIMD.

Some of the most important optimization problems aren't about hand-sharpening a blade, down to the bare metal level. They're knowing what types of algorithms and data structures to use and when. Using a dynamic structure like a vector with overhead where a static array would work is nuts.

Many of the problems you see in modern software are due to obsessive application of data structures or patterns where they aren't beneficial. The small overhead adds up when you use 1000s of vectors where static arrays would have worked. It might be cases like storing a 64-bit word where you only needed an 8-bit word and suddenly you're cache-smashed by huge multi-mb arrays that might have been 64k instead.

(cache-smash!)
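To put rough numbers on the 64-bit vs. 8-bit point, here is a minimal sketch. The names `kCount` and `footprint_bytes` are hypothetical, and the element count is an assumption picked only so that the 8-bit case comes out at 64 KiB; the 8x ratio holds for any count.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical element count, chosen only so the 8-bit case is 64 KiB.
constexpr std::size_t kCount = 65536;

// Bytes needed to store N elements of type T contiguously.
template <typename T, std::size_t N>
constexpr std::size_t footprint_bytes() {
    return sizeof(T) * N;
}

constexpr std::size_t kNarrow = footprint_bytes<std::uint8_t, kCount>();
constexpr std::size_t kWide   = footprint_bytes<std::uint64_t, kCount>();

// Same element count, eight times the memory that has to pass through the cache.
static_assert(kWide == 8 * kNarrow, "64-bit elements take 8x the space of 8-bit ones");
```

Whether the wider array actually falls out of cache depends on the cache sizes of the target CPU, which this sketch does not try to model.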
Free plug-ins for Windows, MacOS and Linux. Xhip Synthesizer v8.0 and Xhip Effects Bundle v6.7.

BertKoor
KVRAF
11340 posts since 8 Mar, 2005 from Utrecht, Holland

Re: Optimize plugin code for balanced load or least load?

Post Mon Nov 18, 2019 11:52 pm

Fender19 wrote:
Mon Nov 18, 2019 4:34 pm
I am testing my plugin by stacking 10 instances of it in one track in Reaper. When all 10 plugins are running Reaper reports 1.9% total CPU usage - and each plugin instance shows 0.2%. But when I remove all but one plugin it reports 0.3% usage (looks like 50% more for just one plugin by itself). That doesn't make sense to me - does it to you? Or are these numbers too "low in the weeds" to be meaningful?
Why don't you do proper unit tests for these things?
Run your process(blockOfSamples) 10000 times while logging a performance timer.
You then also have control over what it actually will process and can do comparisons.

0.2 or 0.3% is not a significant difference as you have seen.
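A minimal sketch of that kind of offline timing loop, assuming a trivial gain stage as a stand-in for the real plugin code. The `process` signature and the name `average_ns_per_block` are hypothetical and not taken from any plugin SDK.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a plugin's process(blockOfSamples): a fixed gain.
inline void process(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * 0.5f;
}

// Calls process() `iterations` times on the same input block and returns the
// average wall-clock time per call in nanoseconds.
inline double average_ns_per_block(std::size_t block_size, std::size_t iterations) {
    assert(block_size > 0 && iterations > 0);
    std::vector<float> in(block_size, 1.0f);
    std::vector<float> out(block_size, 0.0f);
    volatile float sink = 0.0f;  // read one result so the work is not trivially unused

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        process(in.data(), out.data(), block_size);
        sink = out[0];
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Calling `average_ns_per_block(64, 10000)` before and after a change gives two numbers to compare. An optimizer may still simplify a loop like this, so the results are a rough guide and should be checked against a profiler when they matter.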
We are the KVR collective. Resistance is futile. You will be assimilated. Image
My MusicCalc is back online!!

aciddose
KVRAF
12315 posts since 7 Dec, 2004

Re: Optimize plugin code for balanced load or least load?

Post Tue Nov 19, 2019 12:05 am

Those small measurements in real-time tests are usually below the accuracy threshold anyway. There is likely more variation due to cache behavior over time than in what you're actually measuring... so the 0.3% might actually be due to several small peaks on the first sample of each block or in certain other conditions, while over the whole block the per-sample cost is much lower.

The question then isn't actually "Who trashed my cache!?", but "What can I do to ensure this data remains as long as possible in a minimal number of consecutive cache lines?"

Code: Select all

struct voice_t { bool active; datadatadatadata... }
array<voice_t, N> v;
vs.

Code: Select all

struct voice_t { datadatadatadata... }
array<voice_t, N> v;
array<bool, N> v_active;
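A compilable sketch of the two layouts, with a made-up payload standing in for the elided voice data. `voice_state`, `voice_with_flag`, `voice_bank`, `count_active` and the sizes used are all assumptions for illustration.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kVoices = 64;  // hypothetical voice count

// Hypothetical per-voice payload standing in for the elided voice data.
struct voice_state { float data[15]; };

// Layout 1: the flag is stored inside every voice, so scanning the flags
// strides across the memory of every voice.
struct voice_with_flag { bool active; voice_state s; };

// Layout 2: the flags are packed together, separate from the voice data.
struct voice_bank {
    std::array<voice_state, kVoices> v;
    std::array<bool, kVoices> v_active;
};

// Counts active voices while touching only the packed flag array.
inline std::size_t count_active(const voice_bank& bank) {
    std::size_t n = 0;
    for (bool a : bank.v_active)
        n += a ? 1 : 0;
    return n;
}
```

On common platforms where `bool` is one byte, the 64 flags of the second layout occupy 64 bytes, which is about one typical cache line, whereas the first layout spreads the same flags over several kilobytes.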

Performance profiling is a whole universe in itself apart from mere optimization. If you're already up to your neck with optimization, don't jump face-first down that rabbit hole, Alice!
Last edited by aciddose on Tue Nov 19, 2019 12:17 am, edited 1 time in total.

DJ Warmonger
KVRAF
3470 posts since 7 Jun, 2012 from Warsaw

Re: Optimize plugin code for balanced load or least load?

Post Tue Nov 19, 2019 12:15 am

I'm using std::vector all over the place, and to me it leads to much clearer and safer code than taking care of deleting memory myself
Many of the problems you see in modern software are due to obsessive application of data structures or patterns where they aren't beneficial.
A vector is beneficial when you actually need to free memory and allocate structures dynamically, in things like a preset list or pieces of GUI. But not for audio buffers, which need to be optimized as much as possible and are probably fixed-size anyway. This is a completely different area of application and design philosophy.
http://djwarmonger.wordpress.com/
Tricky-Loops wrote: (...)someone like Armin van Buuren who claims to make a track in half an hour and all his songs sound somewhat boring(...)

otristan
KVRAF
1994 posts since 28 Mar, 2005

Re: Optimize plugin code for balanced load or least load?

Post Tue Nov 19, 2019 12:42 am

Code: Select all

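#include <cstddef>   // std::size_t
#include <vector>

#include <boost/align/aligned_allocator.hpp>

// std::vector whose storage is aligned to Alignment bytes (16 by default).
template <typename Value, std::size_t Alignment = 16>
using AlignedVector = std::vector<Value, boost::alignment::aligned_allocator<Value, Alignment>>;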
A vector is perfect for an audio buffer. You just need to ensure alignment for SIMD operations and use it as a float*, but at least it will take care of resizing and automatic deletion:
float *pBuffer = &myVector[0];

Best of both worlds.
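For readers without Boost, roughly the same idea can be sketched with C++17 aligned `operator new`. This is an assumption-laden stand-in and not Boost's implementation: `AlignedAllocator` and `is_aligned` are hypothetical names, and the code needs a C++17 compiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Minimal C++17 allocator that over-aligns its storage (a std-only stand-in
// for boost::alignment::aligned_allocator, not a copy of it).
template <typename T, std::size_t Alignment>
struct AlignedAllocator {
    using value_type = T;
    static_assert(Alignment >= alignof(T), "Alignment must not be weaker than alignof(T)");
    static_assert((Alignment & (Alignment - 1)) == 0, "Alignment must be a power of two");

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    // Explicit rebind: the non-type Alignment parameter defeats the default one.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    T* allocate(std::size_t n) {
        return static_cast<T*>(
            ::operator new(n * sizeof(T), static_cast<std::align_val_t>(Alignment)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, static_cast<std::align_val_t>(Alignment));
    }
};

// All instances are interchangeable because the allocator holds no state.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

template <typename T, std::size_t Alignment = 16>
using AlignedVector = std::vector<T, AlignedAllocator<T, Alignment>>;

// True if p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

With this, `AlignedVector<float, 32> buf(256);` followed by `float* p = buf.data();` gives a raw pointer suitable for aligned SIMD loads, and the alignment is kept when the vector reallocates on resize.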
Olivier Tristan
Developer - UVI Team
http://www.uvi.net

aciddose
KVRAF
12315 posts since 7 Dec, 2004

Re: Optimize plugin code for balanced load or least load?

Post Tue Nov 19, 2019 12:44 am

It's horrible for a short static buffer, of which you need 1000s. For example filter state buffers and bi-delay buffers used in a reverb mesh (or any other type of matrix). You might allocate the whole buffer once in a single block vs. thousands of tiny <1k allocations that trash not just the cache, but make a mess of the heap too.

Why would anyone ever need to re-size a static buffer? It's only allocated once (on init) and never changes, ever.
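A minimal sketch of the single-block idea: many small, equally sized state buffers carved out of one contiguous allocation made at init. The class `StateBufferPool` and its interface are hypothetical, and equal buffer sizes are an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pool: `count` equally sized state buffers living in one
// contiguous block that is allocated once and never resized.
class StateBufferPool {
public:
    StateBufferPool(std::size_t count, std::size_t samples_each)
        : samples_each_(samples_each), storage_(count * samples_each, 0.0f) {}

    // Pointer to the start of buffer `index`. It stays valid for the lifetime
    // of the pool because storage_ is never resized after construction.
    float* buffer(std::size_t index) {
        assert((index + 1) * samples_each_ <= storage_.size());
        return storage_.data() + index * samples_each_;
    }

    std::size_t samples_each() const { return samples_each_; }

private:
    std::size_t samples_each_;
    std::vector<float> storage_;
};
```

For example, `StateBufferPool pool(1000, 8);` makes one heap allocation instead of a thousand, and neighbouring buffers such as `pool.buffer(0)` and `pool.buffer(1)` sit next to each other in memory.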

Z1202
KVRian
1093 posts since 12 Apr, 2002

Re: Optimize plugin code for balanced load or least load?

Post Tue Nov 19, 2019 2:17 am

syntonica wrote:
Mon Nov 18, 2019 2:52 pm
I've never had that much luck on the Mac with the fast-math. Never does a thing for me. It does seem to do some good with gcc. I don't recall if I even used it on MSVC, but I was busy learning the Windows Way of things.
IIRC on clang the fast math option can be overly aggressive, which sometimes can cause trouble. However there are more granular options of controlling various aspects of fast math. Don't remember details though, it was just some quick experiment.

syntonica
KVRian
545 posts since 25 Sep, 2014 from Specific Northwest

Re: Optimize plugin code for balanced load or least load?

Post Tue Nov 19, 2019 10:49 am

Z1202 wrote:
Tue Nov 19, 2019 2:17 am
syntonica wrote:
Mon Nov 18, 2019 2:52 pm
I've never had that much luck on the Mac with the fast-math. Never does a thing for me. It does seem to do some good with gcc. I don't recall if I even used it on MSVC, but I was busy learning the Windows Way of things.
IIRC on clang the fast math option can be overly aggressive, which sometimes can cause trouble. However there are more granular options of controlling various aspects of fast math. Don't remember details though, it was just some quick experiment.
I'm still learning about controlling compilers with a whip and a chair rather than using IDE built-in settings. I never got any problems with the Relax IEEE Compliance setting, I just never got any speed boost.

Now that I look into it, -Ofast on clang adds -fno-signed-zeros -freciprocal-math -ffp-contract=fast -menable-unsafe-fp-math -menable-no-nans -menable-no-infs, so that may be why the extra flag in Xcode never does anything. :lol: However, I'm sure I've tried it just by itself using -O0 and not seen any significant gains. My tests use 5-10 instances of my plugin using different styles of patches so I get a good, overall average of CPU use.

syntonica
KVRian
545 posts since 25 Sep, 2014 from Specific Northwest

Re: Optimize plugin code for balanced load or least load?

Post Tue Nov 19, 2019 11:00 am

aciddose wrote:
Tue Nov 19, 2019 12:44 am
It's horrible for a short static buffer, of which you need 1000s. For example filter state buffers and bi-delay buffers used in a reverb mesh (or any other type of matrix). You might allocate the whole buffer once in a single block vs. thousands of tiny <1k allocations that trash not just the cache, but make a mess of the heap too.

Why would anyone ever need to re-size a static buffer? It's only allocated once (on init) and never changes, ever.
To add, dynamic buffers are fine if you only occasionally need to change the size in large chunks. However, when you are sweeping that size by samples in live audio, that's a ton of overhead you don't need. I'll never understand that slavish "if it's there, you must use it" mentality, rather than looking at what's best for your use case.

karrikuh
KVRist
309 posts since 6 Apr, 2008

Re: Optimize plugin code for balanced load or least load?

Post Tue Nov 19, 2019 11:28 am

aciddose wrote:
Mon Nov 18, 2019 11:43 pm
What's wrong with std::array ? You can get debug-time bounds-checking and all the useful size/length features in a static array with zero runtime overhead. <snip>
There's absolutely nothing wrong with std::array, of course. But they have different semantics than std::vector (dynamic size only known at run-time vs compile-time sized). The decision which to use should be primarily based on whether size is known statically or not.

Practical examples:
For the buffer of a delay line I would use a std::vector because its size is chosen as a multiple of the current host samplerate (run-time).
For intermediate sample blocks inside a modular synth I use std::arrays because the blocksize is kept fixed (e.g. 16 samples) and can e.g. be optimized to fit in a cache line.
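The two cases can be sketched side by side. `DelayLine`, `SampleBlock` and `kBlockSize` are hypothetical names, and the block size of 16 is simply the example figure from the text.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Block size fixed at compile time: 16 floats, which is 64 bytes when float
// is 4 bytes wide.
constexpr std::size_t kBlockSize = 16;
using SampleBlock = std::array<float, kBlockSize>;

// Hypothetical delay line whose length is only known once the host reports
// its sample rate, hence the run-time sized std::vector.
class DelayLine {
public:
    DelayLine(double sample_rate_hz, double delay_seconds)
        : buffer_(length_for(sample_rate_hz, delay_seconds), 0.0f) {}

    std::size_t length() const { return buffer_.size(); }

    // Stores `in` and returns the sample stored length() calls earlier.
    float tick(float in) {
        const float out = buffer_[pos_];
        buffer_[pos_] = in;
        if (++pos_ == buffer_.size())
            pos_ = 0;
        return out;
    }

private:
    static std::size_t length_for(double sample_rate_hz, double seconds) {
        const auto n = static_cast<std::size_t>(sample_rate_hz * seconds);
        return n > 0 ? n : 1;  // never allow an empty buffer
    }

    std::vector<float> buffer_;
    std::size_t pos_ = 0;
};
```

The vector is sized once in the constructor and then only indexed, so the audio path never allocates.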

It's possible I misinterpreted 2DaT's comment; I thought he was recommending to generally prefer plain dynamic arrays (using new [] / delete []) over std::vector to avoid any overhead he assumed that std::vector would add. The term "array" is obviously a bit fuzzy...

Fender19
KVRist
351 posts since 30 Aug, 2012

Re: Optimize plugin code for balanced load or least load?

Post Tue Nov 19, 2019 11:51 am

BertKoor wrote:
Mon Nov 18, 2019 11:52 pm
Fender19 wrote:
Mon Nov 18, 2019 4:34 pm
I am testing my plugin by stacking 10 instances of it in one track in Reaper. When all 10 plugins are running Reaper reports 1.9% total CPU usage - and each plugin instance shows 0.2%. But when I remove all but one plugin it reports 0.3% usage (looks like 50% more for just one plugin by itself). That doesn't make sense to me - does it to you? Or are these numbers too "low in the weeds" to be meaningful?
Why don't you do proper unit tests for these things?
Run your process(blockOfSamples) 10000 times while logging a performance timer.
You then also have control over what it actually will process and can do comparisons.

0.2 or 0.3% is not a significant difference as you have seen.
Yes, for accurate testing of blocks of code I need to do what you suggest - and I will do that to develop my own reference list of "dos and don'ts".

However, by running the "optimized" plugin in a DAW I am seeing how it actually behaves in real world use with dynamic signals (looped for repeatability and comparisons). That is what a customer sees and if what I'm doing makes no difference there then there really isn't much point spending a lot of time on it.

Fender19
KVRist
351 posts since 30 Aug, 2012

Re: Optimize plugin code for balanced load or least load?

Post Tue Nov 19, 2019 12:10 pm

mystran wrote:
Mon Nov 18, 2019 10:49 pm
It depends on what you do with your vectors, but if you're just using them to allocate dynamic memory (ie. you can't use static arrays anyway) for access with operator[] there shouldn't really be any impact whatsoever, unless you have some debug features enabled.
I know it is not memory efficient, but I declare and set up my delay line (for latency compensation) in the constructor as a static array of maximum required length. I then account for different sample rates by simply setting the delay pointer length "modulus" point accordingly, i.e. DelayLength is 100 at 44.1K, 200 at 88.2K, etc.

I access the elements of that array using:

Code: Select all

	  
	  leftdelayed = DelayArray[0][delayPtr];	    //rotating buffer - retrieve saved value
	  rightdelayed = DelayArray[1][delayPtr];
	  DelayArray[0][delayPtr] = leftin;		   //rotating buffer - store new value
	  DelayArray[1][delayPtr] = rightin;
	  
	  delayPtr += 1;
	  delayPtr %= DelayLength;
Is this a good way to do this - or is it SLOW?

quikquak
KVRian
605 posts since 6 Aug, 2005 from England

Re: Optimize plugin code for balanced load or least load?

Post Tue Nov 19, 2019 12:44 pm

Just my 2p's worth... I've always avoided % as it internally uses a divide (yeah, it may not be relevant these days). I use something simple like this, which also handles increments larger than 1:

Code: Select all

delayPtr += 1;
if (delayPtr >= DelayLength)
     delayPtr -= DelayLength;
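A small sketch comparing the two wrap styles as free functions. The names `advance_modulo` and `advance_conditional` are hypothetical. Note the precondition on the conditional version: it subtracts the length at most once, so it matches the modulo version only while the position is in range and the step does not exceed the length.

```cpp
#include <cassert>
#include <cstddef>

// Wrap using the remainder operator, as in the % version of the delay code.
inline std::size_t advance_modulo(std::size_t pos, std::size_t step, std::size_t length) {
    return (pos + step) % length;
}

// Wrap using a compare and a subtract. Correct only while pos < length and
// step <= length, because length is subtracted at most once.
inline std::size_t advance_conditional(std::size_t pos, std::size_t step, std::size_t length) {
    assert(pos < length && step <= length);
    pos += step;
    if (pos >= length)
        pos -= length;
    return pos;
}
```

Whether the conditional form is actually faster depends on the compiler and CPU, so it is worth timing both in the real processing loop before committing to either.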

otristan
KVRAF
1994 posts since 28 Mar, 2005

Re: Optimize plugin code for balanced load or least load?

Post Tue Nov 19, 2019 1:10 pm

aciddose wrote:
Tue Nov 19, 2019 12:44 am
It's horrible for a short static buffer, of which you need 1000s. For example filter state buffers and bi-delay buffers used in a reverb mesh (or any other type of matrix). You might allocate the whole buffer once in a single block vs. thousands of tiny <1k allocations that trash not just the cache, but make a mess of the heap too.
True, in specific cases. Hence I would not use this technique by default, for the same reason you don't code your whole plugin in assembly, but only where it is necessary or makes sense.
