KVR Audio

Nowhk · Post by **Nowhk** » Wed Jan 17, 2018 8:40 am

Hi all,

I have a biquad filter (TDF2, 5 coefficients) that is modulated at control rate within my audio thread, so every N samples I refresh the coefficients.
In parallel, there's a GUI thread which takes these 5 coefficients and evaluates the frequency response at X frequencies to plot (via the complex zeros/poles).

The problem is: sometimes execution switches between threads partway through (preemptive scheduling), so the GUI ends up mixing new coefficients with ones from a previous iteration.
I don't want the GUI thread to hold a mutex that blocks the audio thread, which in my opinion must always be as fast as possible.

So my idea is to set a flag variable in the GUI thread at the beginning, make a copy of each coefficient (in local variables), and then use those instead of the "real" filter ones (which change constantly and independently of the drawing).

The thing is: what if the thread switch occurs while I'm copying them? I've reduced the probability of mixing up the data, but there's always a possibility.
My question: is there a way to copy a block of data (in this case, 5 double values) in one single efficient instruction? That way, once started, the copy couldn't be interrupted by a thread switch (I don't think a thread can be interrupted in the middle of a single instruction).

Or is there a better way to do it?
Thanks dude, as usual!

DNAdisaster · Post by **DNAdisaster** » Wed Jan 17, 2018 10:08 am

You could use a lock-free queue to send the data to the GUI thread (e.g. boost::lockfree). One approach would be to have the audio thread push a struct containing the coefficients onto the lock-free queue whenever it is empty. Then the GUI thread pops the struct from the queue (whenever it is not empty) and updates itself.

Nowhk · Post by **Nowhk** » Wed Jan 17, 2018 10:43 am

DNAdisaster wrote:You could use a lock-free queue to send the data to the GUI thread (e.g. boost::lockfree). One approach would be to have the audio thread push a struct containing the coefficients onto the lock-free queue whenever it is empty. Then the GUI thread pops the struct from the queue (whenever it is not empty) and updates itself.

Thanks for the suggestion, but as I said I'd like to avoid lock/mutex (and lock-free in general, especially within an audio application).

stratum · Post by **stratum** » Wed Jan 17, 2018 11:24 am

Nowhk wrote:Thanks for the suggestion, but as I said I'd like to avoid lock/mutex (and lock-free in general, especially within an audio application).

Instead of relying on speculative information from that post, just test it; if it works, it works.

matt42 · Post by **matt42** » Wed Jan 17, 2018 12:11 pm

No mutex and no lock-free queue... Well, sorry, those would be both of my suggestions.

A lock free queue should work fine. I've never used the boost one though.

As for the mutex, one option on the audio thread is std::mutex::try_lock: if you get the lock you can update the coefficients; otherwise the code continues and you can try the lock again on the next call. In other words, the audio thread won't block.
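A minimal sketch of the try_lock idea, under some assumptions: the names `BiquadCoeffs` and `SharedCoeffs` are made up for illustration, and here the audio thread keeps its own working coefficients and only publishes a copy for the GUI (a small variation on guarding the filter's coefficients directly).

```cpp
#include <mutex>

// Hypothetical types, not from the original post.
struct BiquadCoeffs {
    double b0 = 1.0, b1 = 0.0, b2 = 0.0, a1 = 0.0, a2 = 0.0;
};

class SharedCoeffs {
public:
    // Audio thread: never blocks. Returns false if the GUI holds the lock
    // (or try_lock failed spuriously); the caller simply retries on the
    // next control-rate update.
    bool tryPublish(const BiquadCoeffs& c) {
        std::unique_lock<std::mutex> lock(mutex_, std::try_to_lock);
        if (!lock.owns_lock())
            return false;
        shared_ = c;
        return true;
    }

    // GUI thread: may block briefly, but always gets a consistent set of
    // all five coefficients.
    BiquadCoeffs snapshot() {
        std::lock_guard<std::mutex> lock(mutex_);
        return shared_;
    }

private:
    std::mutex mutex_;
    BiquadCoeffs shared_;
};
```

The GUI should copy the snapshot and release the lock before doing the (slow) frequency response evaluation, so the window in which the audio thread can miss an update stays as short as possible.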

Guillaume Piolat · Post by **Guillaume Piolat** » Wed Jan 17, 2018 12:36 pm

+1 it's important not to lock from the audio side, but the GUI thread may well take a lock and live happily ever after.

Nowhk · Post by **Nowhk** » Wed Jan 17, 2018 2:02 pm

matt42 wrote:No mutex and no lock-free queue.... Well, sorry they'd be both of my suggestions.

A lock free queue should work fine. I've never used the boost one though.

As for the mutex, one option on the audio thread is std::mutex::try_lock: if you get the lock you can update the coefficients; otherwise the code continues and you can try the lock again on the next call. In other words, the audio thread won't block.

Awesome! It gets the job done more easily and faster! Thanks!!!

hugoderwolf · Post by **hugoderwolf** » Wed Jan 17, 2018 3:23 pm

You don't need to update the gui that often anyway. You could just send an update message to the gui like every 20ms. That's ages in DSP, but gives you a still overengineered 50fps. Efficiency doesn't hurt here.

But you really might want to build that lock-free queue and make it a general communication channel from DSP to GUI. You'll very likely need it again. Also, build the equivalent for the other direction as well while you're at it.

Nowhk · Post by **Nowhk** » Wed Jan 17, 2018 3:29 pm

hugoderwolf wrote:You don't need to update the gui that often anyway. You could just send an update message to the gui like every 20ms. That's ages in DSP, but gives you a still overengineered 50fps. Efficiency doesn't hurt here.

I already do this. Sorry to say, but even with 50 FPS and an update message, this won't fix the problem. The drawing could start every 20 ms, and while it's in progress (in the middle, after a0 and a1 have been evaluated, for example) execution could switch to the audio thread, which refreshes the coefficients. And that messes up the drawing, which is what happens here (before using the try-lock).

Nowhk · Post by **Nowhk** » Wed Jan 17, 2018 4:10 pm

I celebrated too early, I think, matt42.

From cppreference:
This function is allowed to fail spuriously and return false even if the mutex is not currently locked by any other thread.

This means that even if no one is holding the mutex, sometimes the coefficients won't be refreshed, because try_lock simply returns false (spuriously), even though it could have locked the section and updated them...

stratum · Post by **stratum** » Wed Jan 17, 2018 4:26 pm

Nowhk wrote:This means that even if no one is holding the mutex, sometimes the coefficients won't be refreshed, because try_lock simply returns false (spuriously), even though it could have locked the section and updated them...

Doesn't look like a problem.

noizebox · Post by **noizebox** » Wed Jan 17, 2018 4:30 pm

The simplest (and still safe) way to do it, imho would be using a copy and swap construction. You use 2 structs with coefficients and a pointer (or index) to select which one to read from and which one to write to. You can't atomically change the whole struct, as it's too large, but you can atomically change the pointer (or index).

That's basically a lock-free queue with a capacity of 1 element though, so if you don't like lock-free queues, forget I said that.

mystran · Post by **mystran** » Wed Jan 17, 2018 10:28 pm

noizebox wrote:The simplest (and still safe) way to do it, imho would be using a copy and swap construction. You use 2 structs with coefficients and a pointer (or index) to select which one to read from and which one to write to. You can't atomically change the whole struct, as it's too large, but you can atomically change the pointer (or index).

I use a scheme similar to this for moving (non-trivial) data from the GUI thread to the audio thread (e.g. the pointer can point to an arbitrarily complex data structure). But when you swap a new pointer in place, you have to keep the data for the previous pointer alive until you know the reader (or all of them, if you want multiple) is done with it, and this requires some locking in order to figure out when it's safe to reuse the old object (or free it, or whatever you want to do with it).

It can be made wait-free for the reader, since the "read-locking" just needs to do some atomic ops, and the writer can even pass a new pointer to the readers without waiting for anything, but the writer will have to check (at one point or another) whether the old pointer has been released (ie. can be reused or freed) which involves waiting (although you can keep polling at a later date, so you don't necessarily have to block anything for it).

I guess you could make this work with an audio writer too if you used at least 3 objects (e.g. one to "offer" right now, one to write into, and finally one that's waiting to be released because the GUI was using it; two objects are not enough). Then you should be able to always reclaim either the "current offer" or the "waiting to be released" object whenever you swap in new data, although I have to admit I'm not sure how to implement that in practice (never tried, since just using a streaming queue works just as well and is a bit more general; using a 4th object would make it pretty simple though).
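For what it's worth, one well-known way to realise a three-object scheme is a "triple buffer" built around a single atomic exchange. The sketch below is only an illustration of that technique, not necessarily what was meant above; `TripleBuffer`, `BiquadCoeffs` and the bit layout of `middle_` are all assumptions, and it relies on `std::atomic<std::uint8_t>` being lock-free on the target platform.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical payload type.
struct BiquadCoeffs { double b0, b1, b2, a1, a2; };

template <typename T>
class TripleBuffer {
public:
    // Writer (audio thread): fill the private back slot, then swap it with
    // the shared middle slot and mark the middle slot as fresh.
    void write(const T& value) {
        slots_[back_] = value;
        const std::uint8_t prev = middle_.exchange(
            static_cast<std::uint8_t>(back_ | kFresh), std::memory_order_acq_rel);
        back_ = static_cast<std::uint8_t>(prev & kIndexMask);
    }

    // Reader (GUI thread): if the middle slot holds fresh data, swap it with
    // the private front slot and copy it out. Returns false if nothing new.
    bool read(T& out) {
        if ((middle_.load(std::memory_order_acquire) & kFresh) == 0)
            return false;
        const std::uint8_t prev =
            middle_.exchange(front_, std::memory_order_acq_rel);
        front_ = static_cast<std::uint8_t>(prev & kIndexMask);
        out = slots_[front_];
        return true;
    }

private:
    static constexpr std::uint8_t kFresh = 4;      // "new data" flag bit
    static constexpr std::uint8_t kIndexMask = 3;  // low bits hold the slot index

    T slots_[3] = {};
    std::uint8_t back_ = 0;                // touched by the writer only
    std::uint8_t front_ = 1;               // touched by the reader only
    std::atomic<std::uint8_t> middle_{2};  // shared: slot index plus fresh flag
};
```

Each slot index is owned by exactly one of writer, reader or the shared `middle_` at any time, so neither side ever touches a slot the other is using, and the reader always picks up the most recent complete write. It only works for one writer and one reader.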

For streaming data from the audio to the GUI (which works fine for the OP's use-case), the most obvious thing to do is to use a wait-free producer-consumer queue (a ring-buffer) with two atomic indexes. Basically for either reading or writing you atomically get the "end index" of "your region" then do whatever you want in the space between the "start index" of "your region" and the atomic value you fetched (once at the beginning) and finally bump the "start index" atomically to advance the "end index" as far as the other thread is concerned.

This is fully wait-free and safe (with one reader and one writer at least) because we partition the queue into thread-private regions and only allow each thread to move regions from the beginning of its own area to the end of the other thread's area, but the downside is that you will have to drop new data if the queue is full (because otherwise you'd have to wait).
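A rough sketch of such a single-producer, single-consumer ring buffer with two atomic indexes, assuming a power-of-two capacity and free-running indexes (the name `SpscRing` and those details are my own choices for illustration):

```cpp
#include <atomic>
#include <cstddef>

template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer only. Returns false (the new item is dropped) when full.
    bool push(const T& item) {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t r = read_.load(std::memory_order_acquire);
        if (w - r == Capacity)
            return false;
        data_[w & (Capacity - 1)] = item;
        // Publishing the new write index hands the slot to the consumer.
        write_.store(w + 1, std::memory_order_release);
        return true;
    }

    // Consumer only. Returns false when the queue is empty.
    bool pop(T& out) {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        const std::size_t w = write_.load(std::memory_order_acquire);
        if (r == w)
            return false;
        out = data_[r & (Capacity - 1)];
        // Publishing the new read index hands the slot back to the producer.
        read_.store(r + 1, std::memory_order_release);
        return true;
    }

private:
    T data_[Capacity] = {};
    std::atomic<std::size_t> write_{0};  // advanced by the producer only
    std::atomic<std::size_t> read_{0};   // advanced by the consumer only
};
```

Neither `push` nor `pop` contains a loop or a lock, so each call finishes in a bounded number of steps. For the coefficient case, `T` would be a struct holding the five doubles.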

Unfortunately, most of the time you would rather prefer to drop oldest data pending in the queue rather than the new data you're trying to write and as far as I can tell, this cannot be done in a completely wait-free way directly. What you can do though, is first build a local queue of up-to N latest samples (or data units in general; these need not be atomic values) that you haven't sent yet (discarding older samples once the queue size reaches N). Then you try to flush the whole thing into the actual shared queue in a single operation, but only do it if all the collected items fit (the queue must be able to fit at least N samples obviously, but making it bigger will give you more protection against drop-outs).

This then means that stuff like visualisations will always have at least N latest samples to work with (eg. to do N-point FFT for spectrum plot or something) as soon as we can get new data across again (even if you constantly keep falling behind), but still allows you to update more often if the GUI is fast enough. It will take "one frame" of lag (where we read the stale data that was filling up the queue) to sync up again after the queues fill up, but after that it's current again (unless there's a further queue fill-up, in which case you'd have to sync again, but if this happens constantly it can be fixed by simply making the shared-queue larger).

Obviously if N=1 then your "local queue" doesn't need to be very complicated. In this case you probably want a much bigger shared queue to avoid actually filling it (so you don't need to pay the "one frame lag" for sync-up), but since the reader can trivially discard data from the shared queue without even reading it, making the shared queue bigger is essentially just a matter of some extra memory (so making it large enough to fit multiple seconds of data is usually not a problem).
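The "collect the latest N locally, then flush all-or-nothing" idea could be sketched roughly like this. It is only an illustration under assumptions: `LatestNSender`, its template parameters and the power-of-two shared capacity are invented for the example, and it supports exactly one producer and one consumer.

```cpp
#include <atomic>
#include <cstddef>

template <typename T, std::size_t N, std::size_t SharedCap>
class LatestNSender {
    static_assert(N > 0 && SharedCap >= N, "shared queue must fit N items");
    static_assert((SharedCap & (SharedCap - 1)) == 0,
                  "SharedCap must be a power of two");

public:
    // Producer: remember the item locally; once N items are pending, the
    // oldest pending item is overwritten.
    void add(const T& x) {
        local_[(start_ + count_) % N] = x;
        if (count_ < N) ++count_;
        else start_ = (start_ + 1) % N;
    }

    // Producer: move all pending items to the shared queue in one step,
    // but only if every one of them fits. Never waits.
    bool flush() {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t r = read_.load(std::memory_order_acquire);
        if (SharedCap - (w - r) < count_)
            return false;  // not enough room: keep collecting locally
        for (std::size_t i = 0; i < count_; ++i)
            shared_[(w + i) & (SharedCap - 1)] = local_[(start_ + i) % N];
        write_.store(w + count_, std::memory_order_release);
        start_ = 0;
        count_ = 0;
        return true;
    }

    // Consumer: take the oldest item from the shared queue, if any.
    bool pop(T& out) {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        const std::size_t w = write_.load(std::memory_order_acquire);
        if (r == w)
            return false;
        out = shared_[r & (SharedCap - 1)];
        read_.store(r + 1, std::memory_order_release);
        return true;
    }

private:
    T local_[N] = {};                   // producer-private "latest N" buffer
    std::size_t start_ = 0, count_ = 0;
    T shared_[SharedCap] = {};
    std::atomic<std::size_t> write_{0};
    std::atomic<std::size_t> read_{0};
};
```

The audio thread would call `add` for every new item and `flush` whenever convenient (for instance once per block); when `flush` fails, nothing is lost except items older than the latest N.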

mystran · Post by **mystran** » Wed Jan 17, 2018 10:40 pm

I'd also like to point out -- since a lot of people suggest "lock-free queues" -- that what you really want is wait-free queue. This difference is very important (and not just semantics), because a "lock-free" algorithm is allowed to wait (eg. busy loop) while a "wait-free" algorithm is allowed to use locks (as long as it never waits for them, so operations like try_wait() are acceptable).

The point of having a "lock-free" algorithm is to avoid locking overhead, while the point of having a "wait-free" algorithm is to avoid violating real-time constraints. In practice the "wait-free" queue I proposed in my previous post is also "lock-free" in the sense that it only uses atomic counters, but you could build a queue using semaphores and still be "wait-free" and safe for real-time too as long as you don't wait for more free space.

matt42 · Post by **matt42** » Thu Jan 18, 2018 3:12 am

mystran wrote:Unfortunately, most of the time you would rather prefer to drop oldest data pending in the queue rather than the new data you're trying to write and as far as I can tell, this cannot be done in a completely wait-free way directly.

It's easy enough with a three-item non-locking, non-waiting queue/ring buffer, at least for a one-producer, one-consumer situation like the one Nowhk is talking about.

Edit: more correctly this should get you close enough. If the gui thread is running slowly enough that the audio thread is performing many overwrites then it's running too slowly anyway, so one queue item of latency shouldn't make or break the scheme.

Also, perhaps with some modifications the scheme could run so that the GUI always reads the latest update. I have an idea, but would want to work it out fully before saying more, plus I'm not sure it'd really be worth it.

Sync GUI from Audio Thread: any way to copy bulk of data with one instruction?