Multiple threads running in DSP code

DSP, Plugin and Host development discussion.

Post

FWIW: I use a separate thread in Combo Model V/F to recalculate waveforms, using CreateThread (thanks to SWELL, both on Windows and OSX). However, in my case the extra thread doesn't have much priority, because it only recalculates waveforms in the background.
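For illustration, here's a minimal sketch of that kind of low-priority background recalculation thread, assuming the Win32-style CreateThread API (which SWELL mirrors on OSX). The names (recalcWaveforms, the flags) are made up for the example, and SetThreadPriority is the plain Win32 call, not necessarily available through SWELL:

Code: Select all

#include <windows.h>

volatile LONG g_recalcRequested = 0;
volatile LONG g_quit = 0;

void recalcWaveforms() { /* the slow, non-realtime work goes here */ }

DWORD WINAPI waveformThread(LPVOID)
{
    while (!g_quit)
    {
        // atomically test-and-clear the request flag
        if (InterlockedExchange(&g_recalcRequested, 0))
            recalcWaveforms();      // runs in the background, off the audio thread
        Sleep(10);                  // low duty cycle; this thread is not time-critical
    }
    return 0;
}

void startWaveformThread()
{
    HANDLE h = CreateThread(NULL, 0, waveformThread, NULL, 0, NULL);
    SetThreadPriority(h, THREAD_PRIORITY_BELOW_NORMAL);  // keep it out of the way
}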

Post

That's a load of great replies, thank you everyone.
It seems that some of you steer clear of threads while others embrace them. And like any other aspect of coding, it all depends on the situation and how they're used. (Although Urs's Diva seems to run OK soaking up all the power there is :))

I could give the user a choice to switch to multi-threading, but I would rather it be invisible. I can't really run a small benchmark at the plug-in's start-up, because the nest of threads that will exist later on in the user's session is unknowable at that point.
I'll experiment - it's the only way, as always. :)
Thanks again,
Dave.

Post

Please keep in mind that this might conflict with the hosting program's thread management, so it would be a nice move to make it configurable.
"Until you spread your wings, you'll have no idea how far you can walk." Image

Post

Yeah, I guess all hosts behave in different ways with threading.
I'm not massively familiar with multi-threading a single task, so I'm going to look into it some more anyway.
I've heard that 'Boost' threading is not officially a part of Boost yet, and I've seen forum posts about memory leaks and other issues, which is making me look elsewhere.
Other than the Intel TBB resource, has anyone tried using this before:-
http://tinythreadpp.bitsnbites.eu/ ?

Cheers,
Dave.

Post

TinyThread++ works perfectly (happy user here). It implements a small subset of the C++11 thread and mutex libraries. If you want to switch between C++ std::thread and tthread::thread, it can usually be done by just changing the includes (the class names and functions are fairly compatible).
One thing that I've noticed on Windows 7 is that when I use std::thread or std::mutex, Cubase doesn't unload the DLL when I close the plug-in, so I have to shut down and restart the host in order to recompile the plug-in (a real pita); with TinyThread++ this doesn't happen.
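For illustration, a rough sketch of how that include switch might look; the USE_STD_THREAD macro and the worker function are made up for this example (TinyThread++'s thread constructor takes a void(*)(void*) plus a void* argument, which also works with std::thread):

Code: Select all

#if defined(USE_STD_THREAD)
  #include <mutex>
  #include <thread>
  namespace threads = std;
#else
  #include "tinythread.h"
  namespace threads = tthread;
#endif

threads::mutex g_lock;

// keep the worker signature as void(void*) so it compiles with both libraries
void worker(void *)
{
    threads::lock_guard<threads::mutex> guard(g_lock);
    // ... do some work under the lock ...
}

int main()
{
    threads::thread t(worker, (void*)0);  // the same two-argument form works for both
    t.join();
    return 0;
}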

Post

I have seen that TinyThread++ contains lock-based algorithms only. Might that be a problem for audio plug-ins beyond a certain complexity, regardless of how fast the lock algorithms are?

Otherwise, the next release of Boost (1.53) will have lock-free implementations, thanks to Tim Blechmann:

http://www.boost.org/users/history/version_1_53_0.html
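To give an idea of what that offers, here's a hypothetical use of the new Boost.Lockfree single-producer/single-consumer queue (header and class names from Boost 1.53; the sample-passing scenario is just an illustration):

Code: Select all

#include <boost/lockfree/spsc_queue.hpp>

// Wait-free single-producer/single-consumer FIFO with a fixed capacity,
// e.g. for handing data from a worker/GUI thread to the audio thread.
boost::lockfree::spsc_queue<float, boost::lockfree::capacity<1024> > g_fifo;

// producer side (non-realtime thread)
bool pushSample(float x) { return g_fifo.push(x); }   // false if the queue is full

// consumer side (audio thread)
bool popSample(float &x) { return g_fifo.pop(x); }    // false if the queue is empty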

Post

It looks like someone has added an extension to TinyThreads:-

https://github.com/jdduke/tthread
Adding:-
* Lambda, function and function object execution support
* tthread::future and tthread::async, lightweight analogs to std::future and std::async
* future.then() for task continuations
* basic fifo threadpooling with thread_pool

Haven't tried it though, just thought I'd post for interest. Of course it means it should be called 'MediumThreads.' Ha! :roll:
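For reference, the standard C++11 pattern that those analogs mirror looks roughly like this (plain std::async / std::future; convolvePartition is a made-up stand-in for a per-partition job, not code from that fork):

Code: Select all

#include <future>
#include <vector>

// stand-in for one partition's worth of convolution work
float convolvePartition(int index) { return (float)index; }

int main()
{
    std::vector<std::future<float> > results;
    for (int i = 0; i < 3; ++i)
        results.push_back(std::async(std::launch::async, convolvePartition, i));

    float sum = 0.0f;
    for (size_t i = 0; i < results.size(); ++i)
        sum += results[i].get();   // get() blocks until that partition is done
    return 0;
}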

Post

Thumbs down, because it requires C++11 features. For that, why not just use the thread facilities in the C++11 standard library...

Post

thevinn wrote:
Thumbs down, because it requires C++11 features. For that, why not just use the thread facilities in the C++11 standard library...
I agree, the only thing is that C++11 doesn't have a thread pool implementation :(
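For what it's worth, a minimal fixed-size pool can be put together from the C++11 primitives alone. This is only a sketch (the ThreadPool class and its names are invented here, not from any library):

Code: Select all

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool
{
public:
    explicit ThreadPool(size_t n)
    {
        for (size_t i = 0; i < n; ++i)
            workers.emplace_back([this] { run(); });
    }
    ~ThreadPool()
    {
        { std::lock_guard<std::mutex> lock(m); stop = true; }
        cv.notify_all();
        for (size_t i = 0; i < workers.size(); ++i) workers[i].join();
    }
    void enqueue(std::function<void()> job)
    {
        { std::lock_guard<std::mutex> lock(m); jobs.push(std::move(job)); }
        cv.notify_one();
    }
private:
    void run()
    {
        for (;;)
        {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [this] { return stop || !jobs.empty(); });
                if (stop && jobs.empty()) return;
                job = std::move(jobs.front());
                jobs.pop();
            }
            job();   // run the job outside the lock
        }
    }
    std::vector<std::thread> workers;
    std::queue<std::function<void()> > jobs;
    std::mutex m;
    std::condition_variable cv;
    bool stop = false;
};

Jobs are then submitted with pool.enqueue([]{ /* work */ });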

Post

OK, thanks for looking at that.
This afternoon I was looking through the standard TinyThread++ 1.1 code, and I've now got three convolution partitions running in parallel, at around a third of the CPU usage of before.
Nice AND surprising! :)

Post

OOKAY, I thought that was too good to be true. :hihi:
It turns out I wasn't waiting for the thread to finish correctly. Now that's 'fixed', I'm getting about a 20% improvement, which is better than none at all, but not what I'd hoped for.
I suspect I'm scrubbing the memory cache too much, plus the longest process is going to hold up all the others, so I guess some load balancing is now in order.
Dave.

Post

I suspect I'm scrubbing the memory cache too much
While it does matter (here I have a quad core with two pairs of cores sharing the same cache, and it does make a difference when I play with core affinity [which I don't recommend]), I'm pretty sure most of the "CPU" goes into waiting for the threads to start.

With a quad core you could expect a 300% gain at best, maybe 200% average. The end user, brainwashed by Intel's ads, of course expects 500% (& we all know how Reaper processes 1500% faster).

Hints (for PC; a rough sketch of the first two is at the end of this post):
-use a double-event system for waiting threads; you can then release all waiting threads by setting one event
-you don't need every worker thread to set an event to say it's done, only the last one; the others can do an interlocked decrement
-if you have non-threadable work to do in the main thread, you can do it while waiting for the last worker thread to finish; chances are you won't have to wait at all (that processing takes the place of your "spinning", so it's "free" CPU)

..but as already written, it's for heavy stuff only.
With Harmor I can get up to 300% better on my quad core.. but only going from 45% CPU usage down to 15%.
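Here's a rough Win32 sketch of the first two hints: both "go" events are manual-reset so one SetEvent releases every waiting worker, only the last finisher signals completion, and the two "go" events alternate per block so an early finisher can't re-run the same block (all names are illustrative, not from Harmor):

Code: Select all

#include <windows.h>

const LONG kNumWorkers = 3;

HANDLE g_go[2];                 // two manual-reset "go" events, used alternately
HANDLE g_done;                  // auto-reset "all workers finished" event
volatile LONG g_remaining = 0;  // workers still running in the current block
int g_gen = 0;                  // which "go" event the current block uses

void workerProcess(int index) { /* one worker's share of the DSP block */ }

DWORD WINAPI workerMain(LPVOID param)
{
    const int index = (int)(INT_PTR)param;
    int gen = 0;
    for (;;)
    {
        WaitForSingleObject(g_go[gen], INFINITE);   // sleep until released
        workerProcess(index);
        if (InterlockedDecrement(&g_remaining) == 0)
            SetEvent(g_done);                       // only the last worker signals
        gen ^= 1;                                   // wait on the other event next block
    }
}

void startWorkers()
{
    g_go[0] = CreateEvent(NULL, TRUE,  FALSE, NULL);   // manual-reset
    g_go[1] = CreateEvent(NULL, TRUE,  FALSE, NULL);
    g_done  = CreateEvent(NULL, FALSE, FALSE, NULL);   // auto-reset
    for (LONG i = 0; i < kNumWorkers; ++i)
        CreateThread(NULL, 0, workerMain, (LPVOID)(INT_PTR)i, 0, NULL);
}

void runOneBlock()
{
    g_remaining = kNumWorkers;
    ResetEvent(g_go[g_gen ^ 1]);    // re-arm the event the workers will use next block
    SetEvent(g_go[g_gen]);          // one call releases every waiting worker
    // ...do any non-threadable main-thread work here (hint 3)...
    WaitForSingleObject(g_done, INFINITE);   // wait only for the last worker
    g_gen ^= 1;
}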
DOLPH WILL PWNZ0R J00r LAWZ!!!!

Post

Thanks for the tips. Currently I have the threads continuously running, triggered out of a 'while' loop that checks a go flag. The process is then run in the thread, which sets its own flag when it's finished.
I check this finished flag in the DSP loop and wait until the data has been processed; the thread then loops infinitely, waiting for go or kill flags.

It does appear to be re-entrant with the three convolve threads calling the 'processStereo' routine simultaneously.

Code: Select all

#include "tinythread.h"   // tthread::mutex, lock_guard and condition_variable
using namespace tthread;

// 'arg' holds the shared state: the mutex m, the condition variable cond,
// and the wait / finished / kill flags checked below.

void FX_Convolve_Block::GoThread()
{
    lock_guard<mutex> guard(arg.m);
    arg.finished = false;
    arg.wait = false;            // clear the wait flag so the worker may run
    arg.cond.notify_one();       // wake the worker thread
}

void FX_Convolve_Block::Thread()
{
    lock_guard<mutex> guard(arg.m);
    do
    {
        // cond.wait() releases arg.m while sleeping and re-locks it on wake-up
        while (arg.wait)
        {
            arg.cond.wait(arg.m);
        }

        if (!arg.kill)
        {
            processStereo();        // note: runs with arg.m still held
            arg.wait = true;        // go back to waiting on the next iteration
            arg.finished = true;    // tell the DSP loop the data is ready
            arg.cond.notify_one();
        }
    } while (!arg.kill);
}
The 'guard' is a lock_guard (from TinyThread++): it locks the mutex and unlocks it automatically when it goes out of scope.
It seems to work OK, but I am curious as to how this saves any CPU time. If a host displays total CPU usage, why would it go down at all? All I'm doing is distributing the same cycles across the processors. But I guess it does even out the load a bit.

Post

the thread then loops infinitely waiting for go or kill flags.
But do you mean it -really- loops infinitely? Doing so is very bad: you're pretty much using 100% of a core, which will kill a single-core machine (depending on the priority of the calling thread) and will still be very bad on a multicore.

The "proper way" is to make a thread wait (WaitForSingleObject in Windows), or to use a system that wraps this (knowing that critical sections too work with WaitForSingleObject, only they also do spinning, which is a while loop and is ok *when it's not too long*).

A bad, but still OK, way is to Sleep() in between polls. But you definitely shouldn't be in a free-running infinite loop.
Your infinite loop will respond very fast, and yes, making your threads wait for an event will make the wake-up latency a lot worse, but still, an infinite loop is very host- (and system-) unfriendly.

You're using a custom mutex but I don't know what it hides, so maybe it's hiding a proper synchronization method.
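To put the three options mentioned above side by side (Win32 names; the flag and event are assumed to be created and signalled by whoever produces the work):

Code: Select all

#include <windows.h>

extern HANDLE g_workEvent;        // assumed: created elsewhere with CreateEvent
extern volatile LONG g_hasWork;   // assumed: set by the producer thread

// 1) infinite loop: burns a whole core while waiting -- don't do this
void waitForWork_spin()     { while (!g_hasWork) { } }

// 2) Sleep() polling: bad but tolerable, the loop only wakes up periodically
void waitForWork_sleep()    { while (!g_hasWork) Sleep(1); }

// 3) blocking wait: the thread costs nothing until the event is signalled
void waitForWork_blocking() { WaitForSingleObject(g_workEvent, INFINITE); }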
DOLPH WILL PWNZ0R J00r LAWZ!!!!

Post

It does do a 'WaitFor...' inside the TinyThread++ code; it's in the 'arg.cond.wait(arg.m);' call.
And this is what they do in their own, albeit limited, examples.
