
Secrets to writing fast DSP (VST) code?

DSP, Plug-in and Host development discussion.

Moderator: Moderators (Main)


Postby Fender19; Wed Jun 18, 2014 12:14 pm Secrets to writing fast DSP (VST) code?

I have purchased VST plugins from some of you folks that have tremendous graphics, tons of controls and intensive audio processing - yet they have very low CPU usage. Incredible. My hat's off to you!

My plugins, on the other hand - even simple, non-GUI ones - are CPU hogs. Terrible.

I realize some of this knowledge is proprietary but are there any basic "rules of thumb" for writing fast DSP code (especially for VST plugins)? Things like choice of compiler brand and/or certain settings; using "while" loops instead of "for" loops, "don't do this", etc.? How do you get so much plugin with so little CPU?

Any advice appreciated!

Postby Miles1981; Wed Jun 18, 2014 12:31 pm Re: Secrets to writing fast DSP (VST) code?

IMHO there are two things in fact:
- good algorithmic compromise
- efficient C++
It's not about while or for loops (they compile to the same machine code); it's about what kind of functions you call (exponential, sin...), the algorithm you choose (table lookup (TLU) or optimization, FFT or convolution), and the way you access memory (can operations be vectorized?).
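A minimal sketch of the table-lookup (TLU) idea in C++: a precomputed sine table with linear interpolation replaces one std::sin() call per sample. The class name, the 1024-point table size, and the phase convention (cycles in [0, 1)) are assumptions for illustration; whether it actually beats std::sin depends on the CPU and cache, so measure it.

```cpp
#include <cmath>
#include <vector>

// Hypothetical table-lookup sine: one array read plus a linear interpolation
// instead of a std::sin() call per sample. Table size is an assumption.
class SineTable {
public:
    explicit SineTable(int size = 1024) : size_(size), table_(size + 1) {
        const double twoPi = 6.283185307179586;
        for (int i = 0; i <= size_; ++i)  // extra guard point avoids wrap logic in lookup()
            table_[i] = static_cast<float>(std::sin(twoPi * i / size_));
    }

    // phase is in cycles and must be in [0, 1)
    float lookup(float phase) const {
        float pos = phase * size_;
        int idx = static_cast<int>(pos);
        float frac = pos - idx;
        return table_[idx] + frac * (table_[idx + 1] - table_[idx]);
    }

private:
    int size_;
    std::vector<float> table_;
};

// Sine oscillator core loop with no sin() call inside.
void renderSine(const SineTable& t, float* out, int n, float& phase, float inc) {
    for (int i = 0; i < n; ++i) {
        out[i] = t.lookup(phase);
        phase += inc;
        if (phase >= 1.f) phase -= 1.f;  // keep phase in [0, 1)
    }
}
```

The table is built once at setup, so the expensive calls happen outside the audio loop.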

Postby MadBrain; Wed Jun 18, 2014 12:39 pm Re: Secrets to writing fast DSP (VST) code?

- Find the "core" loop of your plugin (normally this is the "for" that calculates each sample) and make sure it does as few calculations as possible (so that everything heavy is calculated beforehand).

- In particular, make sure the core loop avoids all slow math operations: sin(), cos(), tan(), asin(), acos(), atan(), pow(), sqrt(), log(), exp(), log10(), and also division. (these operations are acceptable in setup and "control rate" operations)

- Make sure the core loop isn't doing slow accesses to heavy data structures (for instance, don't use an std::deque to implement a delay line).

- Make sure you're making your release builds with optimisation on and "fast math" ("fast" floating point model in MSVC).

- Instead of calculating envelopes/lfo/pitch/filter coefs every sample, calculate them every few samples (I like calculating them every 16 samples or so) and ramp the volume (pitch and cutoff don't have to be ramped), this makes it a lot easier to have a fast core loop.
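A minimal sketch of the control-rate idea above, assuming a gain parameter given in dB: the std::pow() conversion and the one division run once per 16-sample block, and the core loop only adds a ramp step and multiplies. RampedGain and kControlRate are hypothetical names; in a real plugin targetDb would come from automation or an envelope and change between blocks.

```cpp
#include <algorithm>
#include <cmath>

// Control-rate interval, following the "every 16 samples or so" suggestion.
constexpr int kControlRate = 16;

struct RampedGain {
    float current = 1.f;  // gain applied at the most recent sample

    void process(const float* in, float* out, int n, float targetDb) {
        for (int start = 0; start < n; start += kControlRate) {
            int len = std::min(kControlRate, n - start);
            // Control-rate work: one pow() and one division per block.
            float target = std::pow(10.f, targetDb / 20.f);
            float step = (target - current) / len;
            // Audio-rate core loop: add and multiply only.
            for (int i = 0; i < len; ++i) {
                current += step;
                out[start + i] = in[start + i] * current;
            }
            current = target;  // snap to target to discard accumulated rounding drift
        }
    }
};
```

Ramping the gain linearly inside each block is what keeps the coarse control-rate updates from producing zipper noise.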

Postby avasopht; Wed Jun 18, 2014 12:53 pm Re: Secrets to writing fast DSP (VST) code?

Profile your code to find bottlenecks.
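If you want a quick number before reaching for a real profiler, a crude timing harness like the following can compare two versions of a core loop. nsPerSample is a hypothetical helper; results vary with build settings, CPU, and system load, and a sampling profiler is still the better tool for finding where the time actually goes.

```cpp
#include <chrono>
#include <vector>

// Crude timing harness (not a substitute for a profiler): runs a block-processing
// function many times and returns the average nanoseconds per sample.
template <typename ProcessFn>
double nsPerSample(ProcessFn process, int blockSize, int iterations) {
    std::vector<float> in(blockSize, 0.5f), out(blockSize);
    auto t0 = std::chrono::steady_clock::now();
    for (int it = 0; it < iterations; ++it)
        process(in.data(), out.data(), blockSize);
    auto t1 = std::chrono::steady_clock::now();
    volatile float sink = out[blockSize - 1];  // keep the result observable
    (void)sink;
    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    return ns / (static_cast<double>(blockSize) * iterations);
}
```

Usage: pass a lambda such as [](const float* in, float* out, int n) { for (int i = 0; i < n; ++i) out[i] = in[i] * 0.5f; } and compare the figure against an alternative version of the same loop, using an optimised release build.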

Postby random_id; Wed Jun 18, 2014 2:20 pm Re: Secrets to writing fast DSP (VST) code?

MadBrain wrote:
- Make sure you're making your release builds with optimisation on and "fast math" ("fast" floating point model in MSVC).


Are most people using the fast math mode instead of precise? I am wondering what kinds of side-effects happen (or don't) with samples, convolutions, IIRs, etc.

Postby camsr; Wed Jun 18, 2014 2:31 pm Re: Secrets to writing fast DSP (VST) code?

Use multiplication over division wherever it's possible.

Here is a small and simple example

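Code:
#include <math.h>

static const float pi = 3.14159265f;
static float fs, f, f2;  // fs holds 1 / samplerate, not the rate itself

void setfs(float thesamplerate)
{
    fs = 1.f / thesamplerate;  // one division, done rarely
}
void setf(float fc)
{
    f = tanf(pi * (fc * fs));  // multiply by the stored reciprocal instead of dividing by the rate
    f2 = 1.f / (1.f + f);
}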


Since setfs() is called very sparingly and setf() is called very frequently, storing the reciprocal of the sample rate and multiplying by it is way faster than dividing as written in the original equation (fc / fs).

Postby Fender19; Wed Jun 18, 2014 5:41 pm Re: Secrets to writing fast DSP (VST) code?

Thanks, all. I am aware of most of those principles with the exception of the "multiply instead of divide" suggestion. Will try that. Thank you!

Now, one area I don't thoroughly understand is memory access to/from large arrays (for FIR, FFT, delay line, etc.). I've heard that "indexing math" is slow but what, exactly, does that mean? Does that mean that any variable inside the index makes it slow, like this:

Code:
for (i = 0; i < N; i++) output = buffer[i];

or that doing math inside the index brackets is slow, like this:

Code:
offset = 20;
for (i = 0; i < N; i++) output = buffer[i + offset];

Are these methods of accessing data in an array slow in general or is that how it's typically done? Is there a better/faster way?

Postby camsr; Wed Jun 18, 2014 5:53 pm Re: Secrets to writing fast DSP (VST) code?

Doing index arithmetic is okay (from my experience) if the array contains floats: the integer index math and the floating-point math run on two separate parts of the CPU.
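For the delay-line case, a common pattern is a circular buffer indexed with plain integer math (rather than something like std::deque). A minimal sketch, assuming a power-of-two buffer size so the wrap-around is a bitwise AND instead of a modulo; DelayLine is a hypothetical name, and the delay must stay in [0, size - 1].

```cpp
#include <vector>

// Circular-buffer delay line: per sample, one write, one read and a few integer ops.
class DelayLine {
public:
    // sizePow2 must be a power of two (assumption that makes the mask trick valid)
    explicit DelayLine(int sizePow2) : buf_(sizePow2, 0.f), mask_(sizePow2 - 1) {}

    float process(float x, int delay) {
        buf_[writePos_] = x;
        // A negative difference wraps correctly because the mask keeps only the low bits.
        float y = buf_[(writePos_ - delay) & mask_];
        writePos_ = (writePos_ + 1) & mask_;
        return y;
    }

private:
    std::vector<float> buf_;
    int mask_;
    int writePos_ = 0;
};
```

The index math here is a subtraction and an AND per sample, which is cheap next to the memory access itself.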

Postby camsr; Wed Jun 18, 2014 6:03 pm Re: Secrets to writing fast DSP (VST) code?

MadBrain wrote:
- In particular, make sure the core loop avoids all slow math operations: sin(), cos(), tan(), asin(), acos(), atan(), pow(), sqrt(), log(), exp(), log10(), and also division. (these operations are acceptable in setup and "control rate" operations)


Yes, and the processing loop should avoid calling any functions at all unless it's impossible to avoid.

Postby JCJR; Wed Jun 18, 2014 7:16 pm Re: Secrets to writing fast DSP (VST) code?

Dunno if it is worth the trouble, and maybe it's too old-fashioned, but something I've done a lot, when possible, on tight loops in 32-bit code is hand-written asm. I suppose the same would work for 64-bit; I haven't studied 64-bit asm.

The CPU can easily and conveniently hold up to 6 int values or pointers in its general-purpose registers, and 8 doubles in the FPU registers. When possible I try to make tight-loop asm modules that use that many vars or fewer. Load all the FPU constants, in/out pointers, loop comparison vars, etc. into the CPU and FPU. Then do a tight loop over the buffer, ideally doing NO MEMORY ACCESS except reading input samples and writing output samples. Everything else necessary was loaded into the CPU and FPU before beginning the tight loop.

Maybe some compilers can do as well or better on stripped-down tight loops, but I've rarely seen a compiler fill the FPU more than a few numbers deep, and if a certain constant is used by the FPU in several lines, I've never noticed a compiler loading that constant once and leaving it in the FPU for multiple operations. The ones I've looked at seem to load the FPU for every line, then flush it, then load again for the next line. That would seem to involve a lot of redundant memory access, but maybe some compilers are smart enough to do such optimizations. Am very out of date on most everything.
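A rough C++ analogue of this, without writing asm, is to copy coefficients and state into local variables before the loop and write the state back afterwards, which gives the compiler a clear chance to keep them in registers. A minimal sketch using a transposed direct form II biquad as the workload (the filter choice is an assumption; any recursive filter would do). Modern optimizers often do this on their own, so check the generated assembly rather than assuming it helps.

```cpp
// Biquad with coefficients normalized so that a0 = 1.
struct Biquad {
    double b0 = 1, b1 = 0, b2 = 0, a1 = 0, a2 = 0;  // coefficients
    double z1 = 0, z2 = 0;                          // filter state

    void process(const float* in, float* out, int n) {
        // Hoist everything the loop needs into locals (register candidates).
        const double B0 = b0, B1 = b1, B2 = b2, A1 = a1, A2 = a2;
        double s1 = z1, s2 = z2;
        for (int i = 0; i < n; ++i) {
            const double x = in[i];   // only memory traffic: read input...
            const double y = B0 * x + s1;
            s1 = B1 * x - A1 * y + s2;
            s2 = B2 * x - A2 * y;
            out[i] = static_cast<float>(y);  // ...and write output
        }
        z1 = s1;  // store state back once per block
        z2 = s2;
    }
};
```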

Postby matt42; Wed Jun 18, 2014 7:25 pm Re: Secrets to writing fast DSP (VST) code?

I noticed in another thread that you mentioned you are using an old Borland compiler. You don't sound too keen on using MSVS, but it's free, generates efficient code, and the VST examples compile out of the box.

You may be able to get performance increases just by switching. I never used Borland, but when I compared MSVS with GCC a few years ago, MSVS generated significantly faster code. (Although I did see a thread here, also quite a while back, listing about a page's worth of compiler options that apparently brought GCC up to speed.)

Postby Miles1981; Thu Jun 19, 2014 12:23 am Re: Secrets to writing fast DSP (VST) code?

Fender19 wrote:
Now, one area I don't thoroughly understand is memory access to/from large arrays (for FIR, FFT, delay line, etc.). I've heard that "indexing math" is slow but what, exactly, does that mean? Does that mean that any variable inside the index makes it slow, like this:

No, index math is really fast. What is not fast is getting the array from memory into the cache levels. The compiler has to optimize those accesses, and if you work with only one float array, it knows it can vectorize the computation without having to reload updated data. The issue is that you usually have several arrays (one input, one output), and the compiler HAS to assume they may overlap in memory (aliasing). So after each output value is written, it has to load the input data again instead of keeping it in registers, and it can't safely vectorize the loop. That is dead slow (this is one of the main reasons Fortran is so fast: it doesn't allow aliasing pointers).
There are compiler extensions that deal with this efficiently (such as __restrict), but you need an up-to-date compiler that handles vectorization properly, and then you have to learn how to write such code. You can also use tools like Boost.SIMD.
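One such extension is __restrict, accepted by GCC, Clang and MSVC (C99 has the standard restrict keyword). A minimal sketch of a hypothetical applyGain() that promises the compiler in and out don't overlap; calling it in place (in == out) breaks that promise and is undefined behavior.

```cpp
// __restrict: non-standard C++ extension telling the compiler that in and out
// never alias, so it can vectorize without reloading in after each store.
void applyGain(const float* __restrict in, float* __restrict out, int n, float gain) {
    for (int i = 0; i < n; ++i)
        out[i] = in[i] * gain;
}
```

Whether the loop actually gets vectorized still depends on the compiler and flags; the vectorization report options in GCC, Clang and MSVC can confirm it.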

Postby Z1202; Thu Jun 19, 2014 12:44 am Re: Secrets to writing fast DSP (VST) code?

Two key things IMHO:
- learn the assembler language(s) of your target machines (I don't mean "write in the assembler", just learn it). This will give you a clear understanding of the effects of the choice of the compiler, which functions to call, which high-level language constructions to avoid etc.
- make sure your math knowledge is at a sufficient level, so that you can optimize the algorithms you use. E.g. you don't need to call exp() or pow() to generate an exponential envelope.

This might be a time investment but it's gonna be worth it. Otherwise you will be always relying on a set of known tricks and other people's opinions.
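As an example of the envelope point: an exponential decay obeys e[n+1] = e[n] * k, so once k is computed (at setup or control rate) the per-sample loop is a single multiply. A minimal sketch; ExpDecay and the time-constant convention (the value falls to 1/e after the given number of seconds) are assumptions.

```cpp
#include <cmath>

struct ExpDecay {
    float value = 1.f;  // current envelope level
    float k = 1.f;      // per-sample multiplier

    // One exp() per parameter change, not per sample.
    void setTimeConstant(float seconds, float sampleRate) {
        k = std::exp(-1.f / (seconds * sampleRate));
    }

    void render(float* out, int n) {
        float v = value;
        for (int i = 0; i < n; ++i) {
            out[i] = v;
            v *= k;  // replaces computing exp(-t / tau) every sample
        }
        value = v;
    }
};
```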

Postby Urs; Thu Jun 19, 2014 1:09 am Re: Secrets to writing fast DSP (VST) code?

Block processing. Instead of rendering oscillators, LFOs, envelopes, filters, and whatnot in one loop, create a method to call for each. Make those methods take not just one sample to process, but a block of 64 or more samples. When they process, each module renders into its own temporary buffer when necessary.

Prefer stack memory over heap, and oversample individually as needed:

void render( float* in, float* out, int numSamples )
{
    float tmp[ numSamples * 4 ];  // variable-length array on the stack (C99; a compiler extension in C++)

    upsample4( tmp, in, numSamples );    // upsample into tmp buffer

    /* … process tmp … */

    downsample4( out, tmp, numSamples ); // downsample into out buffer
}

This takes into account that oversampling alone doesn't help anti-aliasing much (need I explain?). Anti-aliasing is only met properly if each individual module is oversampled. Alternatively, keep the large buffers and lowpass in between each module (that's the same thing but might require more memory on the stack). The advantage for CPU optimisation is that each module can be oversampled just as much as it needs. A bandlimited oscillator doesn't need to be oversampled at all; a non-linear filter does.
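A minimal sketch of the block-processing structure described above, with a hypothetical two-module tremolo: an LFO renders a whole block into its own stack buffer, then a gain stage consumes that buffer. The 64-sample block size, the triangle LFO, and all names are assumptions; oversampling is left out.

```cpp
#include <cmath>

constexpr int kBlock = 64;  // block size (assumption)

// Module 1: triangle LFO in [-1, 1], rendered a whole block at a time.
struct Lfo {
    float phase = 0.f;  // in cycles, [0, 1)
    float inc = 0.f;    // cycles per sample
    void render(float* out, int n) {
        for (int i = 0; i < n; ++i) {
            out[i] = 1.f - 2.f * std::fabs(2.f * phase - 1.f);
            phase += inc;
            if (phase >= 1.f) phase -= 1.f;
        }
    }
};

// Module 2 consumes module 1's buffer; each inner loop is short and simple.
void processTremolo(Lfo& lfo, const float* in, float* out, int numSamples) {
    float mod[kBlock];  // per-module temp buffer on the stack, not the heap
    for (int start = 0; start < numSamples; start += kBlock) {
        int n = numSamples - start < kBlock ? numSamples - start : kBlock;
        lfo.render(mod, n);
        for (int i = 0; i < n; ++i)
            out[start + i] = in[start + i] * (0.5f + 0.5f * mod[i]);
    }
}
```

Each module's loop does one job, which tends to be easier for the compiler to optimize than one large per-sample loop that interleaves everything.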

Postby sonigen; Thu Jun 19, 2014 3:20 am Re: Secrets to writing fast DSP (VST) code?

Chris Jones
www.sonigen.com