Multiple threads running in DSP code

DSP, Plugin and Host development discussion.

Post

Hello, I was wondering what thoughts, caveats, or outright dangers you see in multi-threading DSP code?

I have several parallel processing tasks that share no resources, and I want to get them done as fast as possible inside the processing code.
It seems fairly easy to do using CreateThread and WaitForSingleObject, but I'm guessing there are issues with some host software or different hardware configurations.
Are there any problems with it, or should I not do it at all?

Many thanks,
Dave H.

Post

For starters, those functions are Windows-specific. C++11 has a new thread library that is portable and well documented; I'd use that instead.

In order for your code to work, you'll have to start a parallel task (another thread), wait for its completion/termination (also called joining), and then use the results it produced. Both spawning a new thread and joining it are very expensive operations in a real-time context. It's true that creating a thread is lightweight compared to creating a process, but it still requires allocating resources, registering the thread with the kernel, and waiting for the scheduler to actually launch it. You're looking at two main problems: the time it takes to make the system calls (both for spawning and joining) and the scheduling timings.
The workaround is not to create threads just when they are needed but beforehand, using for example a thread pool. That makes waking a thread cheaper, but it doesn't make the scheduler any faster, so unless you have an enormous amount of data to process you're much better off doing the work in the original thread.
Threads can prove very useful when there's a lot of work to do but the results aren't due in the current call (think of the tail of a reverb plugin): you notify a thread to start working on some data and collect the results a few buffers later (assuming you're working in real time), when they are actually needed.
This kind of scenario requires a solid, reliable synchronization mechanism between the data producer and the data consumer, which is anything but trivial to implement. That's pretty much why almost no one does it.
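For reference, this is what the per-call spawn/join pattern looks like with the portable C++11 thread library (the names and the dummy DSP work are purely illustrative); it's exactly the pattern that's usually too expensive in a real-time context:

Code: Select all

#include <thread>
#include <cstddef>

// placeholder DSP work
static void processHalf(float* data, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        data[i] *= 0.5f;
}

void processBuffer(float* data, std::size_t count)
{
    // spawn: a system call plus scheduling latency before the thread even runs
    std::thread worker(processHalf, data, count / 2);

    // meanwhile, do the other half on the calling thread
    processHalf(data + count / 2, count - count / 2);

    // join: block until the worker is done, then the results can be used
    worker.join();
}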

Post

You can take a look at
http://threadingbuildingblocks.org/

It seems to work better on newer CPUs like the i5 or i7; otherwise, at small latencies, you will get some glitches.
Olivier Tristan
Developer - UVI Team
http://www.uvi.net

Post

Ciozzi wrote: For starters, those functions are Windows-specific. C++11 has a new thread library that is portable and well documented; I'd use that instead. [...]
A great reply, cheers.

I've just found out that the thread takes a long time to initialise, which matches what you're saying; only now I know why, thanks.
So I should spin up any threads at plug-in start-up if I want something done in parallel, and use all the usual lock-free FIFO fun that surrounds it. Or I may just walk away from it and let the host worry about threads. Hmmm.
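For reference, the kind of lock-free FIFO alluded to here is typically a single-producer/single-consumer ring buffer; a rough sketch (one writer thread, one reader thread; the names and fixed capacity are illustrative):

Code: Select all

#include <atomic>
#include <cstddef>

template <typename T, std::size_t Capacity>
class SpscFifo
{
public:
    // called only from the producer thread
    bool push(const T& value)
    {
        const std::size_t w = writeIndex.load(std::memory_order_relaxed);
        const std::size_t next = (w + 1) % Capacity;
        if (next == readIndex.load(std::memory_order_acquire))
            return false;                                   // full
        buffer[w] = value;
        writeIndex.store(next, std::memory_order_release);  // publish
        return true;
    }

    // called only from the consumer thread
    bool pop(T& value)
    {
        const std::size_t r = readIndex.load(std::memory_order_relaxed);
        if (r == writeIndex.load(std::memory_order_acquire))
            return false;                                   // empty
        value = buffer[r];
        readIndex.store((r + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    T buffer[Capacity];
    std::atomic<std::size_t> writeIndex{0};
    std::atomic<std::size_t> readIndex{0};
};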

Post

otristan wrote: You can take a look at
http://threadingbuildingblocks.org/

It seems to work better on newer CPUs like the i5 or i7; otherwise, at small latencies, you will get some glitches.
I'll have a read, thanks for the link.

Does anybody here use multi-threading in their DSP?
Apparently it's the future, but maybe not for humble little plug-ins. :)

I'm looking for a way to speed up my convolution code, and parallel FHTs would do the trick - and hopefully not break the memory cache too much.

Dave.

Post

DaveHoskins wrote: [...] So I should spin up any threads at plug-in start-up if I want something done in parallel, and use all the usual lock-free FIFO fun that surrounds it. Or I may just walk away from it and let the host worry about threads. Hmmm.
You can create your own class that creates a new thread on construction and keeps it asleep until you need it. There are mainly two ways to implement this: one is with condition variables (locks required), the other is with spin-waits and spin-locks (no locks, but atomic bools are required); a sketch of the condition-variable approach follows at the end of this post. Without going into much detail (you'll find plenty on both methods online), spin-locks/waits are just "while" loops around an atomic bool variable, something like:

Code: Select all

// consumer side: spin until the worker has produced the results
// (Sleep(0) is Windows-specific; std::this_thread::yield() is the portable equivalent)
while (dataIsReady == false)
    Sleep(0);
One of the many problems is that if you implement such code on the producer side (the worker that's waiting to be told to start):

Code: Select all

 
// producer (worker) side: spin until the audio thread asks for work
while (doSomething == false)
    Sleep(0);
the producer will keep spinning, giving you 100% load on the CPU core it's running on, so you need it to actually sleep and be notified through system calls when there's something to do.
You may want to search for a free, portable thread pool implementation (like the Boost thread pool); it would make your life easier.
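Here is a rough sketch of the condition-variable flavour mentioned above: a worker created once at construction that sleeps until it is handed a job (all names are illustrative, and error handling is omitted):

Code: Select all

#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class Worker
{
public:
    Worker() : thread([this] { run(); }) {}

    ~Worker()
    {
        {
            std::lock_guard<std::mutex> lock(mutex);
            quit = true;
        }
        condition.notify_one();
        thread.join();
    }

    // hand the sleeping worker a job and wake it up
    void startJob(std::function<void()> newJob)
    {
        {
            std::lock_guard<std::mutex> lock(mutex);
            job = std::move(newJob);
        }
        condition.notify_one();
    }

private:
    void run()
    {
        for (;;)
        {
            std::function<void()> currentJob;
            {
                std::unique_lock<std::mutex> lock(mutex);
                condition.wait(lock, [this] { return quit || job != nullptr; });
                if (quit)
                    return;
                currentJob = std::move(job);
                job = nullptr;
            }
            currentJob();   // run the job outside the lock
        }
    }

    std::mutex mutex;
    std::condition_variable condition;
    std::function<void()> job;
    bool quit = false;
    std::thread thread;     // declared last so the worker starts after the other members
};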

Post

I'll try the atomic variable flags, might give it a go tomorrow. Cheers.
The question now is, has anyone successfully used threads in a plug-in? Or is it obvious and I've been missing out all this time? :oops:

:Dave

Post

I do agree that it's 'the future', because newer CPUs will have more cores rather than being substantially faster (as was the case with earlier CPUs).

So being able to perform tasks in parallel is the only way to utilize the power of the machine the code is running on more effectively.

That being said, I see more of a multi-threading opportunity for the host and not so much for the plugins, unless the plugin does, say, offline processing and the benefit outweighs the overhead of the extra threads. Forks in the audio path could be processed in parallel (by the host).

If you can chop the plugin's processing up into several independent pieces (two or more), you could experiment with it. Create threads at start-up and dedicate them to a certain job. Keep them fully initialized in standby (suspended) so they can fly at a moment's notice.

The decision whether to create threads, and how many, should be dynamic. You have no use for multiple threads on a single-core machine, and you may instantiate more than two on a quad core, etc. Where the sweet spot lies (number of threads vs. number of cores) depends very much on the work being done.
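A minimal sketch of making that decision at start-up with the C++11 library (the cap of seven workers is an arbitrary illustrative value):

Code: Select all

#include <algorithm>
#include <thread>

unsigned chooseWorkerCount()
{
    // may return 0 when the core count cannot be determined
    const unsigned cores = std::thread::hardware_concurrency();

    if (cores <= 1)
        return 0;                       // single core: no worker threads at all

    // leave one core for the calling (audio) thread; cap and tune for the workload
    return std::min(cores - 1u, 7u);
}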

Oh, sometimes the UI can benefit too from an extra background worker thread - keep your eye out for that one.

[2c]
Grtx, Marc Jacobi.
VST.NET | MIDI.NET

Post

DaveHoskins wrote: I'll try the atomic variable flags, might give it a go tomorrow. Cheers.
The question now is, has anyone successfully used threads in a plug-in? Or is it obvious and I've been missing out all this time? :oops:

:Dave
IMHO it doesn't seem worth bothering with for many plugin use cases. Maybe a monster like Kontakt could benefit, because sampler voices are such an obvious thing to handle effectively in parallel.

If many "small" plugins loaded by a host each dealt with their own threads, I would think that might easily throw off the load balancing of the system as a whole. Remember that you don't live in the DAW host/operating system alone... while it would be incredibly nice to think you are the most important thing running! :)

Post

I believe Diva from U-he uses boost::thread with success: http://www.kvraudio.com/forum/viewtopic ... ight=boost

Faust can convert your DSP into highly efficient parallelized C++ code, and it can compile VSTs. I don't know for sure whether Faust will do both at once; does anyone know?

http://faust.grame.fr/

http://faust.grame.fr/index.php/documen ... a-2010-art

Post

markneo wrote: I believe Diva from U-he uses boost::thread with success: http://www.kvraudio.com/forum/viewtopic ... ight=boost
Yep. Also works okayish in current ACE betas.

We start a thread per synthesizer voice and activate them on chunks of 64 samples. This works well on anything post-northbridge (i.e. CPUs with an on-die memory controller). It does not work well on systems with external memory controllers.

The overhead seems bearable for very expensive processes, e.g. synthesizer voices that take up 20% or more CPU. It does not make much sense in anything that uses a total of less than, say, 20% CPU on a given system because then multithreading is better done by the host environment.

Post

Xenakios wrote: While it would be incredibly nice to think you are the most important thing running! :)
Diva thinks that way. Hence the name :-p

Post

An important thing when multithreading audio processing is that you should assign your worker threads the same priority as the calling thread (which may be dynamic in the host: when going over 100% CPU, a host may lower the priority so as not to freeze the GUI).
But it gets a bit more involved from Vista onwards, because there is also a way to mark a thread as dedicated to audio processing (to achieve the same thing as described above).
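The Vista-and-later mechanism referred to here is presumably MMCSS; a rough sketch of both ideas on Windows (the function name and structure are illustrative, and Avrt.lib must be linked for the MMCSS call):

Code: Select all

#include <windows.h>
#include <avrt.h>

// run this at the start of the worker thread, passing a handle to the audio thread
void configureWorkerThread(HANDLE audioThread)
{
    // mirror whatever priority the host gave the audio callback thread
    SetThreadPriority(GetCurrentThread(), GetThreadPriority(audioThread));

    // register the worker with the multimedia class scheduler as a "Pro Audio" thread
    DWORD taskIndex = 0;
    HANDLE mmcss = AvSetMmThreadCharacteristics(TEXT("Pro Audio"), &taskIndex);

    // keep 'mmcss' and call AvRevertMmThreadCharacteristics(mmcss) when the thread shuts down
    (void) mmcss;
}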

And yes, it's only for very heavy plugins, as it takes a lot of time to wake up a waiting thread.

A proper way for a synth, if the voices are independent enough, is to have as many worker threads as cores minus one: you release the waiting threads and then start processing the voice pool immediately in the calling thread. By the time the waiting threads actually start processing, the calling thread will already have processed some voices; from that point the multithreading really begins. Done this way, multithreading will generally help and not be a problem even when the plugin doesn't have much to process, as in the worst case the waiting threads are merely woken for nothing.
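A rough sketch of that scheme, assuming independent voices (every thread, workers and the calling thread alike, pulls the next unprocessed voice from a shared atomic counter, so the calling thread never sits idle while the workers wake up; all names are illustrative):

Code: Select all

#include <atomic>

struct Voice
{
    void process(int numSamples) { (void) numSamples; /* render this voice */ }
};

std::atomic<int> nextVoice{0};

// run by the calling thread and by each of the (cores - 1) worker threads
void processVoicePool(Voice* voices, int numVoices, int numSamples)
{
    for (;;)
    {
        const int i = nextVoice.fetch_add(1, std::memory_order_relaxed);
        if (i >= numVoices)
            break;
        voices[i].process(numSamples);
    }
}

// per block: reset nextVoice to 0, wake the waiting workers, call
// processVoicePool() from the calling thread too, then wait for the workers.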

Urs wrote: It does not make much sense in anything that uses a total of less than, say, 20% CPU on a given system because then multithreading is better done by the host environment.
For a synth playing alone, the benefits start at around 5% CPU here (which is already pretty big). In a busy project it really depends on what's parallelizable by the host.
It also depends a lot on the processing buffer length (driver's buffer + host's slicing); of course it would be pointless to multithread tiny slices of 20 samples.
DOLPH WILL PWNZ0R J00r LAWZ!!!!

Post

I've been working for the past few months on a multi-threaded convolution engine, and my results are that, done well, parallel code can substantially reduce the strain on the CPU. Waking a thread takes anywhere from 2500 to 5000 clock cycles (at least on my i7 quad-core Windows 7 laptop), which at 2 GHz is roughly 1.25 to 2.5 microseconds. If one works with a buffer of 2 ms, that is roughly 0.1% of the whole time slice, so nothing to really worry about. I've also read that UNIX-based systems perform even better, but I've never tested that myself.
The problem, as has already been pointed out several times, is that scheduling times can vary a lot, making it unsafe in a real-time environment to spawn and join the very same thread within a single function call.
What is safe, and works very well without much overhead, is using threads to offload work to the background when the data being processed is, for example, the tail of a reverb, a synth, or a virtual instrument. In some cases only a small part of this data has to be sent back to the host right away, making the rest of the work doable in the background.
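To illustrate hiding the worker's latency behind a fixed delay, here is a very rough sketch (the block size, names, and delay are illustrative placeholders; the actual hand-off to and from the worker thread, which is the hard part, is only indicated by comments):

Code: Select all

#include <cstring>

constexpr int kBlockSize   = 256;   // host buffer size, assumed fixed here
constexpr int kDelayBlocks = 4;     // how many buffers of latency the worker is granted

float pendingInput[kDelayBlocks][kBlockSize];   // blocks handed to the worker
float finishedTail[kDelayBlocks][kBlockSize];   // tail blocks the worker has completed
int   slot = 0;

void processBlock(float* io)
{
    // copy the current input for the worker, then signal it (e.g. via a lock-free FIFO)
    std::memcpy(pendingInput[slot], io, sizeof(float) * kBlockSize);

    // mix in the tail block the worker finished kDelayBlocks callbacks ago
    for (int i = 0; i < kBlockSize; ++i)
        io[i] += finishedTail[slot][i];

    slot = (slot + 1) % kDelayBlocks;
}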

Post

Urs wrote: [...] We start a thread per synthesizer voice and activate them on chunks of 64 samples. This works well on anything post-northbridge.
Per voice? So if I have a polyphony of 16, you start 16 threads?

Thanks!
Olivier Tristan
Developer - UVI Team
http://www.uvi.net
