Using Multiple Threads In Audio Thread

DSP, Plugin and Host development discussion.

Post

Forgive me if this is a dumb idea. I’ve been working on an oscillator that can already generate a lot of unison voices. I was researching how different synths do their optimization and realized that I could (maybe) spread the main audio thread's work across multiple threads.

This might be a huge red flag (I’m not sure), so please tell me if it's a bad idea. What if each voice had its own thread? Yeah, there would be a lot of architecture and memory stuff to work on, but would it be worth it? If not, is there another place in the synth architecture that could benefit from multi-threading?

I had come across DUNE 3 and saw it used multi-threading, but I may have interpreted it incorrectly…

Post

Hmm... maybe not a good idea... BTW, I got improvements by going with parallel processing (VS2019).

Post

Having worker threads process voices is a good idea if the CPU load is too high for a single core to process full polyphony. So it depends on the number of voices, and how much CPU a voice uses up.
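Whether one core can handle full polyphony is simple arithmetic once you have measured what a single voice costs. The sketch below turns a measured cost per voice per output sample into a fraction of one core and compares it with a budget that leaves headroom for the host and other plugins. The helper names and the cost figure in the example are hypothetical; measure your own voices.

```cpp
#include <cstddef>

// Fraction of one CPU core needed to render `voices` voices in real time, given a
// measured cost per voice per output sample (seconds). 1.0 means a full core.
double singleCoreLoad(std::size_t voices, double secondsPerVoiceSample, double sampleRate) {
    return static_cast<double>(voices) * secondsPerVoiceSample * sampleRate;
}

// Worker threads only pay off when a single core can't render full polyphony
// while staying under `maxSingleCoreLoad` (e.g. 0.5 to leave room for others).
bool workersWorthIt(std::size_t voices, double secondsPerVoiceSample, double sampleRate,
                    double maxSingleCoreLoad) {
    return singleCoreLoad(voices, secondsPerVoiceSample, sampleRate) > maxSingleCoreLoad;
}
```

For example, 64 voices at 0.2 µs per voice per sample at 48 kHz need 64 × 0.2e-6 × 48000 ≈ 0.61 of a core, which exceeds a 0.5 budget, while 16 such voices need only about 0.15.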

Post

I have seen it done in some old code used in production. Anyway, it was only done when the audio buffer was at least 2048 or 4096 samples long.
I think it might work if you already have a thread pool running and you use some lock-free system to exchange information between the threads. But, being a bit paranoid about this kind of stuff, I would argue that you have no guarantee about when the worker threads will be done with their jobs (unless you are on an operating system that gives you control over the scheduler). So you will eventually have to wait for them on the real-time audio thread, which is prone to causing occasional dropouts, or have a fallback strategy, such as skipping the voices handled by worker threads that aren't done within a certain time, which would still be glitchy and could be tricky to implement well.
But hey, an occasional dropout may be much better than lots of dropouts because the CPU load is too high for a single core.
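The fallback strategy described here (wait for the workers only up to a deadline, then skip voices that aren't finished) could look roughly like this sketch. `VoiceSlot` and `mixWithDeadline` are hypothetical names; each worker is assumed to render into its own slot's buffer and set `done` with release semantics when it finishes. A skipped voice's worker may still be writing its buffer, so the slot must not be reset or reused until that worker is done, which is part of what makes this tricky.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// One rendering slot per voice (hypothetical). A worker writes `buffer`, then
// stores done = true (release); the audio thread reads `buffer` only after it
// has loaded done == true (acquire).
struct VoiceSlot {
    std::vector<float> buffer;
    std::atomic<bool> done{false};
};

// Waits until every voice is finished or `deadline` passes, then mixes only the
// finished voices into `out`. Returns the number of voices that were skipped.
std::size_t mixWithDeadline(std::vector<VoiceSlot>& slots, std::vector<float>& out,
                            std::chrono::steady_clock::time_point deadline) {
    for (;;) {
        bool allDone = true;
        for (const VoiceSlot& slot : slots)
            if (!slot.done.load(std::memory_order_acquire)) { allDone = false; break; }
        if (allDone || std::chrono::steady_clock::now() >= deadline) break;
        std::this_thread::yield();  // busy-wait: this burns CPU on the audio thread
    }
    std::size_t skipped = 0;
    for (const VoiceSlot& slot : slots) {
        if (!slot.done.load(std::memory_order_acquire)) {
            ++skipped;  // late voice: its output is dropped for this block (a glitch)
            continue;
        }
        for (std::size_t i = 0; i < out.size() && i < slot.buffer.size(); ++i)
            out[i] += slot.buffer[i];
    }
    return skipped;
}
```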
Personally, I would use some system that enables multi-threading only when it is really needed (some sort of simple automatic profiling thingy?), and give the user the option to disable it for live performances.
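One way to read the "automatic profiling" idea is a small governor that smooths the measured voice-rendering load and switches worker threads on and off with hysteresis, plus a user override. This is a sketch under assumptions: `ThreadingGovernor` and its thresholds are hypothetical, and the load must be measured as CPU time summed over all threads (not wall-clock time), otherwise enabling the workers would lower the reading and the switch would flap.

```cpp
// Hypothetical "automatic profiling" switch: enables worker threads only while the
// measured voice-rendering load is high, with hysteresis so it doesn't flap, and
// lets the user force single-threaded mode (e.g. for live performance).
class ThreadingGovernor {
public:
    ThreadingGovernor(double enableAbove, double disableBelow, double smoothing)
        : enableAbove_(enableAbove), disableBelow_(disableBelow), smoothing_(smoothing) {}

    void setUserAllowsThreads(bool allowed) { userAllows_ = allowed; }

    // cpuSeconds: CPU time spent rendering voices in the last block, summed over
    // all threads, so the figure stays comparable with and without workers.
    // blockSeconds: blockSize / sampleRate, i.e. the real-time budget of the block.
    void update(double cpuSeconds, double blockSeconds) {
        const double load = cpuSeconds / blockSeconds;
        smoothedLoad_ += smoothing_ * (load - smoothedLoad_);  // one-pole smoothing
        if (!workersOn_ && smoothedLoad_ > enableAbove_) workersOn_ = true;
        else if (workersOn_ && smoothedLoad_ < disableBelow_) workersOn_ = false;
    }

    bool useWorkers() const { return userAllows_ && workersOn_; }

private:
    double enableAbove_, disableBelow_, smoothing_;
    double smoothedLoad_ = 0.0;
    bool workersOn_ = false;
    bool userAllows_ = true;
};
```

The gap between `enableAbove` and `disableBelow` keeps a load hovering near a single threshold from toggling the workers on every block.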
https://unevens.net - open source audio plug-ins

Post

Depends on your synth architecture. If you can completely separate the voice processing with minimal dependencies, you’ll get away with quicker processing of your buffer at the expense of, say, a 10% to 15% increase in OS CPU usage. This increase is due to threads waiting for each other and/or duplicate operations that you're running per thread.

On some architectures it just doesn’t work. I tried hard to multi-thread a poly modular plug, to no avail. It works, but the overall increase in CPU usage isn't worth the buffer speed-up, especially when you have multiple instances or many other plugs. I haven’t completely given up, but I doubt it makes sense to try more.
Last edited by S0lo on Fri Oct 15, 2021 6:56 pm, edited 1 time in total.
www.solostuff.net
Advice is heavy. So don’t send it like a mountain.

Post

It can interfere with the DAW's multithreading. For example, say you are running your synth on 10 tracks; each track is probably already being rendered in a different thread. Now, if each synth creates one thread per voice, that's way too many threads: way more threads than cores, and everything is fighting for CPU time.

However if it's a big heavyweight synth that people are most likely to play solo, then that might not be an issue.

If you do implement it, make it an option.

Post

It may also mess a lot with how the DAW tries to balance the load. I'm pretty sure e.g. Logic does some heavy lifting in trying to balance plugins in the graph across its worker threads so the load is evenly distributed. When you add your own worker threads, the DAW can't "see" that load and thus can't manage it.

Post

hugoderwolf wrote: Sat Oct 16, 2021 10:01 am It may also mess a lot with how the DAW tries to balance the load. I'm pretty sure e.g. Logic does some heavy lifting in trying to balance plugins in the graph across its worker threads so the load is evenly distributed. When you add your own worker threads, the DAW can't "see" that load and thus can't manage it.
Well, thankfully Apple has introduced Audio Workgroups, which allow plug-ins to announce their threads to the host and so on. It's a pretty cool concept; it should do the trick and remove any clash with the host's own threading. From what I took away from the docs, implementing Audio Workgroups has been mandatory since Catalina for multi-threaded AU plug-ins. Judging from our implementation, though, Logic does not support Audio Workgroups.

There's that.
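On Apple platforms, a worker thread joins a workgroup with `os_workgroup_join()` and leaves it with `os_workgroup_leave()`, both declared in `<os/workgroup.h>`. The RAII wrapper below is a sketch: `WorkgroupMembership` is a hypothetical name, and how the plugin obtains the host's `os_workgroup_t` is not shown. It is meant to live on the worker thread's stack so the thread that joins is also the one that leaves; on other platforms it compiles to a no-op.

```cpp
#if defined(__APPLE__)
#include <os/workgroup.h>
#endif

// Hypothetical RAII helper: keeps the current worker thread in the host's audio
// workgroup for the lifetime of this object.
class WorkgroupMembership {
public:
#if defined(__APPLE__)
    explicit WorkgroupMembership(os_workgroup_t workgroup) : workgroup_(workgroup) {
        // os_workgroup_join returns 0 on success and fills in the join token.
        if (workgroup_ != nullptr && os_workgroup_join(workgroup_, &token_) == 0)
            joined_ = true;
    }
    ~WorkgroupMembership() {
        if (joined_) os_workgroup_leave(workgroup_, &token_);
    }
#else
    explicit WorkgroupMembership(void*) {}  // no workgroups outside Apple platforms
#endif
    WorkgroupMembership(const WorkgroupMembership&) = delete;
    WorkgroupMembership& operator=(const WorkgroupMembership&) = delete;

    bool joined() const { return joined_; }

private:
#if defined(__APPLE__)
    os_workgroup_t workgroup_ = nullptr;
    os_workgroup_join_token_s token_{};
#endif
    bool joined_ = false;
};
```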

Post

(ah, they say "should"... well, apparently they killed it then instead of adding it to Logic)

https://developer.apple.com/documentati ... guage=objc

Post

Ah, that's nice to know. Curious if/when it will actually work in a relevant way in Logic et al. At least someone's working on solving that particular problem.

Post

Launching new threads in audio code is not a good idea, but you can use a thread pool just fine (e.g. I do all the time). Basically, when the plugin DLL is loaded, I fire up one thread per CPU and bump them up to real-time priority. Then they wait on a queue. When I have something that can be done by a worker thread, I put that work (e.g. one job for each voice) into the queue, then wait for all the jobs to finish. Multiple instances of the same plugin can share the same thread pool.
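The pool described here can be sketched in portable C++ as follows. Names are hypothetical, and the sketch simplifies on purpose: it uses a mutex, condition variables and `std::function` (which can allocate), none of which is strictly real-time safe on the audio thread, and it omits the platform-specific step of raising the workers to real-time priority. A function-local static gives one pool per process, shared by all instances loaded in it.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Persistent worker pool: threads are created once and sleep until jobs arrive.
class WorkerPool {
public:
    explicit WorkerPool(unsigned numThreads) {
        for (unsigned i = 0; i < numThreads; ++i)
            threads_.emplace_back([this] { workerLoop(); });
        // Raising these threads to real-time priority is platform-specific (not shown).
    }

    ~WorkerPool() {
        { std::lock_guard<std::mutex> lock(mutex_); stopping_ = true; }
        wake_.notify_all();
        for (std::thread& t : threads_) t.join();
    }

    // Queues job(i) for every i in [0, count) (e.g. one job per voice) and blocks
    // the calling thread until all of them have finished.
    void runBatch(std::size_t count, const std::function<void(std::size_t)>& job) {
        std::mutex doneMutex;
        std::condition_variable doneCv;
        std::size_t remaining = count;  // guarded by doneMutex
        {
            std::lock_guard<std::mutex> lock(mutex_);
            for (std::size_t i = 0; i < count; ++i)
                queue_.emplace_back([&, i] {
                    job(i);
                    std::lock_guard<std::mutex> doneLock(doneMutex);
                    if (--remaining == 0) doneCv.notify_one();
                });
        }
        wake_.notify_all();
        std::unique_lock<std::mutex> doneLock(doneMutex);
        doneCv.wait(doneLock, [&] { return remaining == 0; });
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // only reached when stopping
                task = std::move(queue_.front());
                queue_.pop_front();
            }
            task();  // run the job outside the lock
        }
    }

    std::vector<std::thread> threads_;
    std::deque<std::function<void()>> queue_;
    std::mutex mutex_;
    std::condition_variable wake_;
    bool stopping_ = false;
};

// One pool per process, sized to the CPU count and shared by every instance.
WorkerPool& sharedPool() {
    static WorkerPool pool(std::max(1u, std::thread::hardware_concurrency()));
    return pool;
}
```

The remaining-job counter is decremented under `doneMutex`, so the audio thread can only observe zero after the last worker has released that mutex, which keeps the stack-allocated `doneMutex` and `doneCv` valid until nobody uses them.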

It's a good idea to check the block size and process very short blocks directly (e.g. some hosts occasionally pass a single-sample block even if the average is much longer), to avoid threading when the sync overhead would be higher than the processing cost. Other than that, it "just works", except for the slightly unfortunate situation of not being able to share the same worker threads between multiple plugins and the host.
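The block-size check can be a small branch in front of the dispatch. In this sketch `ParallelFor` stands for any "run these jobs on the workers and wait" function, such as a persistent pool's batch call, and `minParallelBlock` is a hypothetical tuning threshold to be found by measurement.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical dispatch signature: run job(i) for i in [0, count) on the worker
// threads and return when all of them are done.
using ParallelFor = std::function<void(std::size_t, const std::function<void(std::size_t)>&)>;

// Renders all voices for one block. Short blocks (fewer than minParallelBlock
// samples) and single-voice blocks are rendered inline on the audio thread,
// because waking the workers and waiting for them would cost more than the work.
// Returns true if the block was handed to the workers.
bool renderBlock(std::size_t numVoices, std::size_t blockSize, std::size_t minParallelBlock,
                 const std::function<void(std::size_t)>& renderVoice,
                 const ParallelFor& parallelFor) {
    if (blockSize < minParallelBlock || numVoices < 2) {
        for (std::size_t v = 0; v < numVoices; ++v) renderVoice(v);
        return false;
    }
    parallelFor(numVoices, renderVoice);
    return true;
}
```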

Post

What if there are N plugins on a busy mix, each of them creating M threads and bumping to RT priority?

I'm not very convinced this is optimal from the project's point of view. Normally the best thread priorities can be assigned at a higher level of abstraction than a single plugin. A plugin opening threads is betting on the assumption that other cores are free to do work. If they are CPU-pinned, it might even prevent the OS scheduler from rebalancing.

Maybe a plugin standard should abstract a DAW-managed work queue for audio processing purposes? Sounds useful, but it also has the potential to be a vipers' nest. EDIT: I see that Apple already did it.

Post

rafa1981 wrote: Sat Oct 16, 2021 7:19 pm What if there are N plugins on a busy mix, each of them creating M threads and bumping to RT priority?
In the ideal case, that's fine, as long as these N*M threads:

1. Don't spinlock wait for each other.
2. Don't OS wait (yield) for each other.
3. Do OS wait (yield) once they are finished from their block/buffer

You should get good performance, because the OS is supposed to do its scheduling work. The problem is that the above is only possible when your threads' work is totally independent, which is only true for threads belonging to different plugins, not for threads within the same plugin (unless you guarantee it).

In a sense: if you violate (1) for too long, you're going to get high OS CPU usage. If you violate (2), you're going to get high DAW CPU usage. If you violate (3), well... very bad :hihi:
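A waiting strategy that respects rules (1) to (3) above is to spin only briefly, then fall back to an OS wait, so an idle thread costs no CPU. The event below is a sketch with a hypothetical name; it supports a single waiting thread and auto-resets after each wake-up.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Auto-reset event for one waiting thread. The waiter checks a flag a bounded
// number of times (yielding in between), then sleeps on a condition variable.
class SpinThenBlockEvent {
public:
    void signal() {
        {
            // Set the flag under the mutex so a waiter that is about to sleep
            // can't miss the notification.
            std::lock_guard<std::mutex> lock(mutex_);
            flag_.store(true, std::memory_order_release);
        }
        cv_.notify_one();
    }

    // Returns true if the signal was caught while spinning, false if the thread
    // fell through to the blocking OS wait (always the case when maxSpins == 0).
    bool wait(int maxSpins) {
        for (int i = 0; i < maxSpins; ++i) {
            if (flag_.exchange(false, std::memory_order_acquire)) return true;
            std::this_thread::yield();
        }
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return flag_.load(std::memory_order_acquire); });
        flag_.store(false, std::memory_order_relaxed);  // consume the signal
        return false;
    }

private:
    std::atomic<bool> flag_{false};
    std::mutex mutex_;
    std::condition_variable cv_;
};
```

Keeping `maxSpins` small limits the time spent in violation of rule (1); the blocking path is the "OS wait" from rules (2) and (3).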

Post

Assuming threads that spin only briefly and promptly back off to idle.

Notice that I was speaking about a "busy mix": a case where the DAW has more than enough work for each CPU, so it might already be processing a track on each (virtual) CPU core, and now threads with RT priority are additionally being woken up on every track, some contending for the same resources.

Probably the DAW is running the tracks and plugins in a fixed order, and the scheduler can, after some struggle, find an optimal ordering, but this doesn't account for e.g. VSTis/parts not playing all the time.

The point is that in such a (probably hypothetical) scenario, N_CPUs threads × N plugins may not be the configuration that maximizes utilization of the available resources. In this case the overhead of context switching in the best case, or of moving threads around to different CPUs in the worst, seems avoidable.

That's also why I think having multithreading as a configuration parameter is a good idea. It might allow a heavy plugin that couldn't meet the block deadline on a single core to run when the other plugins in the project are single-threaded, but having multithreading always enabled on CPUs that meet the deadline with a single instance might limit that CPU's potential once all-core usage starts to get high.

TL;DR: whether it optimizes or pessimizes resource usage is project-dependent. IMO it shouldn't be a default. The optimum number of threads (N_CPUs) is a per-executable budget, not a per-plugin one: if e.g. 20 plugin instances open N_CPUs threads each with RT priority, things have the potential to go south.

Post

rafa1981 wrote: Sat Oct 16, 2021 7:19 pm What if there are N plugins on a busy mix, each of them creating M threads and bumping to RT priority?
It's perfectly fine to have a hundred threads on RT priority if they mostly sit on a semaphore waiting. It's perfectly fine to have a queue where you put some work, then you post on a semaphore, a worker thread wakes up, does the job and goes back to sleep.
rafa1981 wrote: I'm not very convinced this is optimal from the project's point of view. Normally the best thread priorities can be assigned at a higher level of abstraction than a single plugin. A plugin opening threads is betting on the assumption that other cores are free to do work. If they are CPU-pinned, it might even prevent the OS scheduler from rebalancing.
From the project's point of view, it typically makes sense to give one plugin as many resources as it can use, then use the rest for the next one, and so on. The trickiest part of processing an audio graph in parallel is finding enough actual parallel work, and it makes sense to try to minimize the (real) time it takes to release further serially dependent work.

Now, the "non-optimality" of having multiple plugins with their own threads has nothing to do with scheduling (which the OS can handle just fine, and which is negligible overhead in practice when you're doing short chunks of work that typically won't run out of their time slices at all). The real issue is that one plugin using CPU cores that another plugin could use to finish faster means you might not be able to release further serial work as quickly. This hurts the DAW's "graph scheduler", whereas the actual OS "thread scheduler" won't care. But note that CPU-heavy plugins that don't multi-thread internally, and therefore finish later, hurt the graph scheduler even more. So, without a protocol for plugins to borrow the DAW's worker threads, multi-threading inside plugins is still a win for the whole project.
