joeboy
KVRer
 
2 posts since 8 Mar, 2018

Postby joeboy; Thu Mar 08, 2018 4:15 am Too many threads?

Hello, first post here. Apologies if this is the incorrect forum.

I'm writing a very basic command line synthesis program in C on an older netbook. It has a two-core Intel Atom processor, a gig of RAM, and runs OpenBSD. I'm targeting 48 kHz 16-bit stereo output. All I'm doing so far is querying wavetables (stored as doubles), adding them together, and dithering down to 16-bit ints before writing to the sound card, which shouldn't be too resource intensive.
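For reference, the pipeline described above (table lookup, summing, dither to 16 bits) might look roughly like the sketch below. It is not the poster's code: the `voice` struct, `TABLE_LEN`, the truncating lookup, the mono mix, and the TPDF dither built on `rand()` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TABLE_LEN 1024  /* assumed table size */

/* Hypothetical voice: a phase accumulator reading one wavetable. */
typedef struct {
    const double *table;   /* TABLE_LEN samples of one cycle, in [-1, 1] */
    double phase;          /* read position in table samples, 0 <= phase < TABLE_LEN */
    double step;           /* TABLE_LEN * frequency / sample_rate */
    double gain;
} voice;

/* Sum all voices into mix[0..frames-1] (mono for brevity). */
static void render_voices(voice *v, size_t nvoices, double *mix, size_t frames)
{
    for (size_t i = 0; i < frames; i++)
        mix[i] = 0.0;
    for (size_t n = 0; n < nvoices; n++) {
        assert(v[n].step >= 0.0 && v[n].step < TABLE_LEN);
        for (size_t i = 0; i < frames; i++) {
            size_t idx = (size_t)v[n].phase;      /* truncating lookup, no interpolation */
            mix[i] += v[n].gain * v[n].table[idx];
            v[n].phase += v[n].step;
            if (v[n].phase >= TABLE_LEN)          /* wrap around the table */
                v[n].phase -= TABLE_LEN;
        }
    }
}

/* TPDF dither: difference of two uniform randoms, up to +/- 1 LSB. */
static int16_t dither_to_s16(double x)
{
    double r = (double)rand() / RAND_MAX - (double)rand() / RAND_MAX;
    double s = x * 32767.0 + r;
    if (s > 32767.0)  s = 32767.0;                /* clamp before converting */
    if (s < -32768.0) s = -32768.0;
    return (int16_t)(s < 0.0 ? s - 0.5 : s + 0.5); /* round to nearest */
}
```

The inner loop does one multiply-add and one table read per sample per voice, which is the scale of work the replies below are reasoning about.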

Earlier single-threaded iterations of my program would start choking with ~16 voices playing at once, so I thought I could speed things up by making each voice run in its own thread. I'd only see a 50% performance boost on my two core machine, but I figured this design would scale well to hardware with even more parallel processors. These voice threads need to sync with the main audio output thread of course, but I've read that using mutexen is a bad idea for realtime audio, so I'm having active voices spinlock between events and only putting unused ones to sleep.

That is, if I start the program and enter a three note chord, then three voice threads will wake up. If the next chord contains four notes, then those first three will spin between notes, and a fourth thread will be awakened from its lock. If the event after that only contains two notes, then the first two threads will be kept alive, but the other two will be put back to sleep. I thought this would be a decent compromise.

With this complex behavior implemented, I was surprised by how badly it ran. Playing a single note at a time gave me uninterrupted playback just like my earlier program did, but anything more than that would slow down considerably. Even two-note polyphony, which should be leveraging both available cores, choked and sputtered. It looks like most of my computational power is spent running the threads themselves rather than generating audio.

tl;dr - I have a few questions:
1. Is it common to parallelize on a per-voice basis, or does this sound foolish?
2. There is also a main audio output thread and a thread for the commandline interface. Is it possible that priority is being assigned to these threads over the voices? How to avoid this?
3. Is my hardware just weaker than I thought? Instead of messing with threads do I have to lower the sample rate or switch from doubles to fixnums?

Any info would be appreciated. Thanks.
matt42
KVRian
 
1044 posts since 9 Jan, 2006

Postby matt42; Thu Mar 08, 2018 5:43 am Re: Too many threads?

Spawning a few threads shouldn't cause major performance issues like that.

Are you setting their priority level? If they are running at a normal priority then that could be an issue.

Assuming your synth is a plugin, unless it requires a tonne of resources then it'd be best to avoid multi threading the audio path and leave that kind of optimisation to the DAW.
mystran
KVRAF
 
4979 posts since 11 Feb, 2006, from Helsinki, Finland

Postby mystran; Thu Mar 08, 2018 6:13 am Re: Too many threads?

joeboy wrote:Earlier single-threaded iterations of my program would start choking with ~16 voices playing at once, so I thought I could speed things up by making each voice run in its own thread. I'd only see a 50% performance boost on my two core machine, but I figured this design would scale well to hardware with even more parallel processors. These voice threads need to sync with the main audio output thread of course, but I've read that using mutexen is a bad idea for realtime audio, so I'm having active voices spinlock between events and only putting unused ones to sleep.


By using spinlocks instead of mutexes you're basically forcing your threads to consume all the CPU they can get their hands on. This is WAY worse than anything you can do with mutexes, because you're still waiting just as much, except now you're wasting tons of CPU on it. In other words, you are artificially slowing down the system for no good reason.

1. Is it common to parallelize on a per-voice basis, or does this sound foolish?


It's common, although usually it makes sense to use a thread-pool with one thread per CPU and one task submitted to the pool per voice, because having excess threads can potentially hurt your cache performance.
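A minimal sketch of such a pool using POSIX threads, with workers that sleep on a condition variable between blocks rather than spinning. All names here (`pool`, `pool_run`, `voice_fn`, `count_voice`, `MAX_WORKERS`) are hypothetical, and error handling is reduced to the bare minimum.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 8   /* assumed upper bound on pool size */

typedef void (*voice_fn)(size_t voice_index, void *ctx);

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  work_ready;   /* signalled when a new block starts */
    pthread_cond_t  work_done;    /* signalled when the last voice finishes */
    pthread_t       threads[MAX_WORKERS];
    size_t          nthreads;
    voice_fn        fn;
    void           *ctx;
    size_t          next;         /* next unclaimed voice index */
    size_t          total;        /* voices in the current block */
    size_t          finished;     /* voices completed so far */
    int             quit;
} pool;

static void *worker(void *arg)
{
    pool *p = arg;
    pthread_mutex_lock(&p->lock);
    for (;;) {
        while (!p->quit && p->next >= p->total)
            pthread_cond_wait(&p->work_ready, &p->lock);  /* sleeps, burns no CPU */
        if (p->quit)
            break;
        size_t i = p->next++;                 /* claim one voice */
        pthread_mutex_unlock(&p->lock);
        p->fn(i, p->ctx);                     /* do the work without holding the lock */
        pthread_mutex_lock(&p->lock);
        if (++p->finished == p->total)
            pthread_cond_signal(&p->work_done);
    }
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

static int pool_start(pool *p, size_t nthreads)
{
    if (nthreads > MAX_WORKERS)
        nthreads = MAX_WORKERS;
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->work_ready, NULL);
    pthread_cond_init(&p->work_done, NULL);
    p->next = p->total = p->finished = 0;
    p->quit = 0;
    p->nthreads = 0;
    for (size_t t = 0; t < nthreads; t++) {
        if (pthread_create(&p->threads[t], NULL, worker, p) != 0)
            break;
        p->nthreads++;
    }
    return p->nthreads > 0 ? 0 : -1;
}

/* Run fn(0) .. fn(total-1) across the pool; returns when all are done. */
static void pool_run(pool *p, voice_fn fn, void *ctx, size_t total)
{
    pthread_mutex_lock(&p->lock);
    p->fn = fn;
    p->ctx = ctx;
    p->next = 0;
    p->finished = 0;
    p->total = total;
    pthread_cond_broadcast(&p->work_ready);
    while (p->finished < p->total)
        pthread_cond_wait(&p->work_done, &p->lock);
    p->total = 0;                             /* leave workers with nothing to claim */
    p->next = 0;
    pthread_mutex_unlock(&p->lock);
}

static void pool_stop(pool *p)
{
    pthread_mutex_lock(&p->lock);
    p->quit = 1;
    pthread_cond_broadcast(&p->work_ready);
    pthread_mutex_unlock(&p->lock);
    for (size_t t = 0; t < p->nthreads; t++)
        pthread_join(p->threads[t], NULL);
}

/* Stand-in for "render voice i": bumps a per-voice counter in ctx. */
static void count_voice(size_t i, void *ctx)
{
    ((int *)ctx)[i] += 1;
}
```

The audio thread would call `pool_run` once per output block. Whether taking a mutex on that thread is acceptable depends on the latency budget; the sketch only illustrates the one-thread-per-core, one-task-per-voice shape.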

2. There is also a main audio output thread and a thread for the commandline interface. Is it possible that priority is being assigned to these threads over the voices? How to avoid this?


If you're not setting any priorities yourself, then all your threads likely run at the same (interactive) priority and will be scheduled on "best effort" basis together with anything else that's running on your system. If you want real-time threads, you need to assign them real-time priorities yourself.
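One way to request a real-time priority with POSIX threads is sketched below. The helper name `request_realtime_priority` is made up, the choice of `SCHED_FIFO` just below the maximum is an assumption, and the call commonly fails without sufficient privileges, so the return value needs checking. How much real-time scheduling a given OS actually honours varies.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Ask for SCHED_FIFO just below the maximum priority for the calling thread.
   Returns 0 on success, or a positive error number (often EPERM). */
static int request_realtime_priority(void)
{
    struct sched_param sp;
    int max = sched_get_priority_max(SCHED_FIFO);
    int min = sched_get_priority_min(SCHED_FIFO);

    if (max == -1 || min == -1)
        return errno ? errno : EINVAL;   /* policy not available here */

    sp.sched_priority = (max > min) ? max - 1 : max;
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}
```

Each audio worker would call this once at startup and fall back to normal scheduling if it returns non-zero.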

3. Is my hardware just weaker than I thought? Instead of messing with threads do I have to lower the sample rate or switch from doubles to fixnums?


If you're choking a single core with 16 basic wavetable oscillators, it's quite likely that your code is spending a lot of time on something pointless. You should really be able to do more than this on a 486.
stratum
KVRAF
 
1845 posts since 29 May, 2012

Postby stratum; Thu Mar 08, 2018 8:14 am Re: Too many threads?

For that machine I would use floats instead of doubles and keep the code single threaded.
~stratum~
Guillaume Piolat
KVRist
 
174 posts since 21 Sep, 2015, from Grenoble

Postby Guillaume Piolat; Thu Mar 08, 2018 2:59 pm Re: Too many threads?

I've read that using mutexen is a bad idea for realtime audio, so I'm having active voices spinlock


What SignalDust said.

Think about what happens when locking with spinlocks vs. mutexes. A lock has essentially two very different costs depending on whether it is already taken:

- Uncontended case + mutex: no syscall, fast path is basically the cost of the memory barrier.

- Uncontended case + spinlock: no syscall, fast path is basically the cost of the memory barrier. Essentially the same cost as a mutex, sometimes a tiny bit cheaper, but if your lock isn't contended in the first place the difference will be very hard to measure, since the lock isn't a bottleneck.

- Contended case + mutex: thread is waiting in the mutex list without taking CPU. Hyper-threading may also hand the core over immediately to another thread, perhaps one lucky enough to run. The OS mutex may do a bit of spinning before waiting in the scheduler.

- Contended case + spinlock: Anything goes. CPU gets consumed just for spinning. Wrong thread might get priority. If you are lucky you did put a HLT instruction to mitigate the complete disaster that spinlocks are, perhaps it even does something. Your consolation: no syscall or thread pausing... because they are spinning.


tl;dr:
Spinlocks are arguably worse in the contended case and perform essentially the same in the uncontended one, compared with your typical user-mode OS mutex.
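To illustrate the contended mutex case, here is a minimal sketch of a voice thread sleeping until the next event instead of spinning. The `event_gate` type and the `gate_wait` / `gate_post` names are invented for this example. The waiting thread consumes no CPU while blocked in `pthread_cond_wait`.

```c
#include <pthread.h>

/* Hypothetical "next event" gate shared by the audio thread and voices. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    unsigned        generation;   /* bumped once per event */
} event_gate;

/* Block until an event newer than *seen has been posted. */
static void gate_wait(event_gate *g, unsigned *seen)
{
    pthread_mutex_lock(&g->lock);
    while (g->generation == *seen)            /* loop guards against spurious wakeups */
        pthread_cond_wait(&g->cond, &g->lock);
    *seen = g->generation;
    pthread_mutex_unlock(&g->lock);
}

/* Announce a new event and wake every waiting thread. */
static void gate_post(event_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->generation++;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
}
```

A gate can be initialised statically with `{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }`.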
VST/AU/AAX: Couture | Panagement | Graillon
mystran
KVRAF
 
4979 posts since 11 Feb, 2006, from Helsinki, Finland

Postby mystran; Thu Mar 08, 2018 3:13 pm Re: Too many threads?

Guillaume Piolat wrote:- Contended case + spinlock: Anything goes. CPU gets consumed just for spinning. Wrong thread might get priority. If you are lucky you did put a HLT instruction to mitigate the complete disaster that spinlocks are, perhaps it even does something. Your consolation: no syscall or thread pausing... because they are spinning.


You can't use HLT unless you're in kernel code (it's privileged) and that would completely suspend the CPU core (not just the thread) until the next interrupt fires.

Point stands though: never use spinlocks for anything, ever (unless you're writing operating system kernel code, in which case spinlocks are useful for synchronizing multiple cores.. but you STILL don't use them to suspend threads).
joeboy
KVRer
 
2 posts since 8 Mar, 2018

Postby joeboy; Fri Mar 09, 2018 12:29 am Re: Too many threads?

Thanks for all the quick replies. This is my first attempt at using threads in a non-trivial manner, so your advice goes a long way.

matt42 wrote:Are you setting their priority level? If they are running at a normal priority then that could be an issue.

I hadn't. I'll look into that.

matt42 wrote:Assuming your synth is a plugin, unless it requires a tonne of resources then it'd be best to avoid multi threading the audio path and leave that kind of optimisation to the DAW.

It's nothing advanced like that. It's just a little standalone interactive shell. The user can type basic commands directly, or write more complex procedures in a higher-level language and pipe them into the program.

mystran wrote:By using spinlocks instead of mutexes you're basically forcing your threads to consume all the CPU they can get their hands on. This is WAY worse than anything you can do with mutexes, because you're still waiting just as much, except now you're wasting tons of CPU on it. In other words, you are artificially slowing down the system for no good reason.

I wish I was sharp enough to realize this before I wrote a day's worth of code. I read an authoritative-looking article that made audio mutexes out to be a cardinal sin due to their non-deterministic nature. Maybe I misinterpreted it by assuming spinlocks were an acceptable alternative.

mystran wrote:usually it makes sense to use a thread-pool with one thread per CPU and one task submitted to the pool per voice, because having excess threads can potentially hurt your cache performance.

This is my plan moving forward: have just two threads, with one covering odd voices and the other covering even ones.

stratum wrote:For that machine I would use floats instead of doubles and keep the code single threaded.

I've gone back and forth on this myself. Would floats be faster even on a 64 bit cpu? I've also run into phase problems using floats before. Representing my wavetables in doubles saves me some runtime math.

--

I'll try to re-write the code this weekend. I'm left with two additional questions though:

1. Is there a POSIX standard way to fetch the number of CPU cores on a machine? Everything I've read about so far seems to be BSD/Linux specific. I'm using sysctl() for OpenBSD atm.

2. Should each voice have its own buffer which is calculated and then summed into the main audio buffer in a loop at the end, or would it be okay to pass each voice a pointer directly to the main audio buffer, and have the voice operations do buffer[n]+=voice_calculation()? To the best of my knowledge, incrementing a value at a pointer is not an atomic operation. Would it be worth the overhead to put locks on this operation as well?
stratum
KVRAF
 
1845 posts since 29 May, 2012

Postby stratum; Fri Mar 09, 2018 12:36 am Re: Too many threads?

joeboy wrote:
stratum wrote:For that machine I would use floats instead of doubles and keep the code single threaded.

I've gone back and forth on this myself. Would floats be faster even on a 64 bit cpu? I've also run into phase problems using floats before. Representing my wavetables in doubles saves me some runtime math.



I said it only because an Atom processor probably doesn't have much cache (I didn't look up the exact amount, but more floats will fit in the same cache without a doubt). Whether this makes a difference in practice depends on your algorithm. If it is data intensive (like a lookup table) then it probably will. If you need to do some additional math because of using floats instead of doubles, then the tradeoffs involved should be measured.
~stratum~
PurpleSunray
KVRian
 
789 posts since 13 Mar, 2012

Postby PurpleSunray; Fri Mar 09, 2018 12:54 am Re: Too many threads?

joeboy wrote:I wish I was sharp enough to realize this before I wrote a day's worth of code. I read an authoritative-looking article that made audio mutexes out to be a cardinal sin due to their non-deterministic nature. Maybe I misinterpreted it by assuming spinlocks were an acceptable alternative.

He was probably talking about a different problem. Spinlocks can make sense if you do not expect to wait for any significant time.
Example: you implement a queue to deliver your AudioBuffers from one thread to another.
Appending the new AudioBuffer* to the linked list inside the queue needs a lock. This is fast and will not block for a significant amount of time. No thread will hold the lock for long, and no other thread will wait long for it. So a spinlock could be the right solution there. It is unlikely to be locked when another thread enters, and if it is, it will be unlocked again very soon.

You have a different problem you want to solve.
You actually want to wait for the audio mainthread. As soon as you want to wait on something, forget about spinlocks.

joeboy wrote:1. Is there a POSIX standard way to fetch the number of CPU cores on a machine? Everything I've read about so far seems to be BSD/Linux specific. I'm using sysctl() for OpenBSD atm.


std::thread::hardware_concurrency() if you use C++11.
Otherwise it is system-specific.
sysconf(_SC_NPROCESSORS_ONLN) on Linux.
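In C, the `sysconf` route can be wrapped as in the sketch below. `_SC_NPROCESSORS_ONLN` is an extension and not part of POSIX, but as far as I know it is also available on the BSDs. The helper name `online_cpu_count` and the fallback to a single core are assumptions for illustration.

```c
#include <unistd.h>

/* Number of online CPUs, or 1 if the system will not say. */
static long online_cpu_count(void)
{
#ifdef _SC_NPROCESSORS_ONLN
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n >= 1)
        return n;
#endif
    return 1;   /* conservative fallback: assume a single core */
}
```

The result can be used directly as the worker count for a thread pool.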

joeboy wrote:2. Should each voice have its own buffer which is calculated and then summed into the main audio buffer in a loop at the end, or would it be okay to pass each voice a pointer directly to the main audio buffer, and have the voice operations do buffer[n]+=voice_calculation()?

That's your decision.
I have seen everything from architectures where audio buffers are plain float arrays and the processing code lives in separate processing classes, up to multi-channel, thread-safe AudioBuffer objects that even run their own threads and can do everything one could imagine (AudioBuffer::Write, AudioBuffer::Read, AudioBuffer::Flush, AudioBuffer::Resample, AudioBuffer::Mix, AudioBuffer::Timestretch, ......... )

joeboy wrote:To the best of my knowledge, incrementing a value at a pointer is not an atomic operation. Would it be worth the overhead to put locks on this operation as well?

Yes, if multiple threads write/read from the same memory, you need a lock.
i++ is not thread-safe (you want __sync_add_and_fetch for a thread-safe i++).
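Note that the GCC `__sync` builtins operate on integer and pointer types, so they do not directly cover `buffer[n] += x` on doubles. One way to sidestep the question is the first option from the post: give each worker thread its own scratch buffer and let the audio thread sum them once every worker has finished. A minimal sketch, with `sum_scratch` as a hypothetical name:

```c
#include <stddef.h>

/* Called by the audio thread after every worker has finished the block.
   scratch[w] is the buffer that worker w alone wrote into, so the
   rendering stage needs no locks or atomics on sample data. */
static void sum_scratch(double *const *scratch, size_t nworkers,
                        double *out, size_t frames)
{
    for (size_t i = 0; i < frames; i++) {
        double s = 0.0;
        for (size_t w = 0; w < nworkers; w++)
            s += scratch[w][i];
        out[i] = s;
    }
}
```

With one scratch buffer per thread rather than per voice, the extra memory and the final summing pass stay small: two buffers and one add per sample on a two-core machine.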
Miles1981
KVRian
 
1355 posts since 26 Apr, 2004, from UK

Postby Miles1981; Fri Mar 09, 2018 8:15 am Re: Too many threads?

mystran wrote:
1. Is it common to parallelize on a per-voice basis, or does this sound foolish?


It's common, although usually it makes sense to use a thread-pool with one thread per CPU and one task submitted to the pool per voice, because having excess threads can potentially hurt your cache performance.

Best sum-up.
Xenakios
KVRian
 
1137 posts since 9 Sep, 2005, from Oulu, Finland

Postby Xenakios; Fri Mar 09, 2018 10:48 am Re: Too many threads?

joeboy wrote:Earlier single-threaded iterations of my program would start choking with ~16 voices playing at once

You are testing optimized release builds and not debug builds, right?
