Atomic ring/dual buffer implementation?

DSP, Plugin and Host development discussion.

Post

Hi there,

I'm about to make a basic sampler, which should display the waveform on my own Plot class, processed in the GUI thread.

Samples can be loaded in quick succession, selected at random or by a CV controller.

So I need some kind of buffer/structure that lets the DSP thread (where the sample actually receives the trigger to be loaded) communicate with the GUI thread (which builds and draws the waveform).

My idea is:
- load the sample on the DSP thread, then create a DTO that contains only the path of the sample, and push it to a ring buffer, flagging it as dirty
- on the GUI thread, check for the latest dirty path (if any), load its samples (again) and process them, drawing the result on a cached panel
- on every GUI run, just draw the previously cached panel

Is this a correct scenario?
If so, what I'm looking for is a sort of atomic ring/dual buffer implementation (2 slots should be enough, I believe): when I write/read the last pushed element, the operations must be guaranteed to be atomic (i.e. while I read element x, writes may only go to x+1; likewise, while I write, nobody may start reading the slot being written until the write is finished).

Is there an existing implementation out there that I can easily use in C++? Maybe std already has it?

The main requirements and challenges here are (I bet):
- read only when a dirty flag is set
- write only to a slot that is not being read (so overwriting is acceptable), and set the dirty flag at the end
- read only a slot flagged as dirty; the problem is that once I start reading, nobody should write to that slot, so maybe 2 slots are not enough...
- all operations lock-free (which is pretty much mandatory for an audio app)

Thanks for any tips you can give to me :) As usual!!!

Post

You need a "non-blocking FIFO" implementation, with an atomic read pointer and an atomic write pointer.

You don't need a "dirty flag" as such, because you can observe the read and write pointers to determine what is happening.

For example:
* read ptr same as write ptr - FIFO is empty.
* read ptr "behind" write ptr - FIFO has an element waiting to be read.
* write_ptr just behind read_ptr - FIFO is full, wait for "consumer" to empty it.

more info:
https://en.wikipedia.org/wiki/Circular_buffer
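
A minimal sketch of the read/write pointer scheme listed above, assuming exactly one producer thread and one consumer thread. The class name `SpscFifo` and its `push()`/`pop()` methods are made up for illustration (this is not the Wikipedia or VCV Rack code), and it needs C++17 for `std::optional`. One slot is deliberately left unused so that "full" and "empty" can be told apart from the two indices alone.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical single-producer / single-consumer FIFO.
// Holds at most N - 1 elements: one slot always stays empty.
template <class T, std::size_t N>
class SpscFifo {
    static_assert(N >= 2, "need at least two slots");

    std::array<T, N> slots{};
    std::atomic<std::size_t> readIdx{0};   // advanced only by the consumer
    std::atomic<std::size_t> writeIdx{0};  // advanced only by the producer

    static std::size_t next(std::size_t i) { return (i + 1) % N; }

public:
    // Producer thread only. Returns false (and drops nothing) when full.
    bool push(const T& value) {
        const std::size_t w = writeIdx.load(std::memory_order_relaxed);
        const std::size_t n = next(w);
        // Write index "just behind" read index: FIFO is full.
        if (n == readIdx.load(std::memory_order_acquire)) return false;
        slots[w] = value;
        // Publish the slot only after its contents are written.
        writeIdx.store(n, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns std::nullopt when empty.
    std::optional<T> pop() {
        const std::size_t r = readIdx.load(std::memory_order_relaxed);
        // Read index equal to write index: FIFO is empty.
        if (r == writeIdx.load(std::memory_order_acquire)) return std::nullopt;
        T value = slots[r];
        // Hand the slot back to the producer only after copying it out.
        readIdx.store(next(r), std::memory_order_release);
        return value;
    }
};
```

If the producer is the DSP thread, `T` should be something cheap and allocation-free to copy (for example an index into a list of known paths rather than a `std::string`); that choice is an assumption of this sketch, not something prescribed in the thread.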

Post

The only reason to use a circular buffer is to handle, in a safe and sane way, the overwrite that may occur when one thread lags behind the other. It can't be guaranteed that a destructive overwrite will never occur, so the result can't be guaranteed to be accurate. However, you CAN flag a destructive overwrite and report it, so that the reader knows the result is not valid.

Post

Jeff McClintock wrote: Mon Jul 08, 2024 2:29 pm you need a "non-blocking FIFO" implementation. With an atomic read pointer and an atomic write pointer.

You don't need a "dirty flag" as such. Because you can observe the read and write pointer to determine what is happening.

For example:
* read ptr same as write ptr - FIFO is empty.
* read ptr "behind" write ptr - FIFO has an element waiting to be read.
* write_ptr just behind read_ptr - FIFO is full, wait for "consumer" to empty it.

more info:
https://en.wikipedia.org/wiki/Circular_buffer
Found this, which seems to do exactly what you are suggesting: https://github.com/VCVRack/Rack/blob/v2 ... er.hpp#L18
(it works with a single producer and a single consumer, which is my case)

camsr wrote: Mon Jul 08, 2024 6:01 pm The only reason to use a circular buffer is the safe and sane overwrite possibility that may occur when one thread lags the other. It can't be guaranteed that destructive overwrite will not occur and the result accurate. However you CAN flag a destructive overwrite and present it, so it is known the result is not legitimate.
Given the example I've posted above, where can it actually fail?
Honestly, I don't see any critical point if I check empty() before shift(), except the case where reading is for some reason very "slow" and writing is faster (but in that case I check full() and skip the overwrite).

Or am I missing something?

Post

You could adapt (or use as an example) this basic (single writer, single reader, FIFO) queue of mine which is basically designed for precisely this thing (with the additional handy feature of "all or nothing" writes directly from a DSP-side ring-buffer; this is to make it easier to gracefully handle overflow in situations where you actually care):
https://github.com/signaldust/dust-tool ... ead.h#L163

The basic idea is very simple: we maintain two indexes (write position, read position, both initialized at the beginning of the queue; these are private to writer and reader respectively so need not be atomic) and a "freeSpace" counter that serves as the shared atomic variable (could equivalently use "usedSpace" instead, doesn't make a difference, we just need some shared counter to track the status of the queue).

Correct synchronization is guaranteed by both threads first fetching the shared "freeSpace" variable atomically (the memory fences force sync across cores and prevent the compiler from reordering; it's important we copy to a local variable just once). Once we know the state of the queue, we can safely write or read data at our leisure (provided we respect the counter we fetched at the beginning). Once we're done, we'll atomically adjust the "freeSpace" variable so the other thread knows that more data can be written or read.

Note that if you replace the clang atomics (actually I think those are GCC built-ins supported by clang) with something else (eg. std::atomic) these MUST be full fences (well, that's a tiny bit conservative, but better safe than sorry) with respect to all(!) memory operations and not just other atomic operations, because we use them to signal the other thread that we're done reading or writing the actual data.
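
A rough sketch of the "private indexes plus one shared freeSpace counter" idea described above, written with `std::atomic` instead of the compiler built-ins. This is not the dust-toolkit code: the class name `ByteQueueSketch` and its method names are hypothetical. Every atomic operation uses the default `std::memory_order_seq_cst`, following the conservative advice above; in the C++ memory model these operations also order the surrounding plain reads and writes of the buffer.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-writer / single-reader byte queue.
class ByteQueueSketch {
    std::vector<std::uint8_t> buf;
    std::size_t writePos = 0;            // private to the writer thread
    std::size_t readPos = 0;             // private to the reader thread
    std::atomic<std::size_t> freeSpace;  // the only shared variable

public:
    explicit ByteQueueSketch(std::size_t capacity)
        : buf(capacity), freeSpace(capacity) {}

    // Writer thread only. "All or nothing": either all n bytes are queued
    // or nothing is written and false is returned.
    bool write(const std::uint8_t* data, std::size_t n) {
        // Fetch the shared state exactly once into a local variable.
        const std::size_t space = freeSpace.load();
        if (n > space) return false;
        for (std::size_t i = 0; i < n; ++i) {
            buf[writePos] = data[i];
            writePos = (writePos + 1) % buf.size();
        }
        // Signal the reader only after the data is fully written.
        freeSpace.fetch_sub(n);
        return true;
    }

    // Reader thread only. Copies up to n bytes and returns how many.
    std::size_t read(std::uint8_t* out, std::size_t n) {
        // Fetch the shared state exactly once into a local variable.
        const std::size_t used = buf.size() - freeSpace.load();
        const std::size_t count = n < used ? n : used;
        for (std::size_t i = 0; i < count; ++i) {
            out[i] = buf[readPos];
            readPos = (readPos + 1) % buf.size();
        }
        // Signal the writer only after the data is fully copied out.
        freeSpace.fetch_add(count);
        return count;
    }
};
```

Neither side ever waits for the other: a writer that finds too little space simply gets `false` back and can decide what to do about the overflow.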

Post

Derozer wrote: Tue Jul 09, 2024 12:35 am
Given the example I've posted above, where can it actually fail?
Honestly, I don't see any critical point if I check empty() before shift(), except the case where reading is for some reason very "slow" and writing is faster (but in that case I check full() and skip the overwrite).

Or am I missing something?
The reading thread may be blocked sometimes, yes. GUI thread is typically lower priority than the DSP thread.

Post

mystran wrote: Tue Jul 09, 2024 1:49 pm
Note that if you replace the clang atomics (actually I think those are GCC built-ins supported by clang) with something else (eg. std::atomic) these MUST be full fences (well, that's a tiny bit conservative, but better safe than sorry) with respect to all(!) memory operations and not just other atomic operations, because we use them to signal the other thread that we're done reading or writing the actual data.
Could a double buffer be useful? To only read what was written in full, so the reading thread does not ever attempt to read from a buffer currently being written. This idea doesn't coincide with a circular buffer that may loop around and overwrite an area that was previously pointed to as 'ready' perhaps.

Post

Pretty clear, thanks to all for the support.

Now, I'm facing another problem :)
When the sample is changed (for the usual reasons) on the GUI thread (once I push/buffer the new path), there can be edge cases where the sample playing on the DSP thread crashes because it is being changed under the hood on another thread (segmentation fault, memory access into a different sample, and so on).

I think the only way is:
- manage a buffer in the opposite direction (GUI thread to DSP)
- flag it as dirty/to be changed
- load the WAV on a dedicated thread from the DSP thread...

... and so on. But I realize it is becoming a pain for a "simply load a WAV sample" task :)

Is it really that complicated to do nowadays?
Perhaps my approach is wrong?

I recently saw this "hack" in the VCVRack codebase, loading a WAV sample from the GUI thread: https://github.com/VCVRack/Fundamental/ ... e.hpp#L167

But honestly I don't get it, and I don't think it's always consistent? :o

Post

camsr wrote: Wed Jul 10, 2024 3:07 pm
mystran wrote: Tue Jul 09, 2024 1:49 pm
Note that if you replace the clang atomics (actually I think those are GCC built-ins supported by clang) with something else (eg. std::atomic) these MUST be full fences (well, that's a tiny bit conservative, but better safe than sorry) with respect to all(!) memory operations and not just other atomic operations, because we use them to signal the other thread that we're done reading or writing the actual data.
Could a double buffer be useful? To only read what was written in full, so the reading thread does not ever attempt to read from a buffer currently being written. This idea doesn't coincide with a circular buffer that may loop around and overwrite an area that was previously pointed to as 'ready' perhaps.
A double buffer would do nothing. You need two sync points: one at the beginning to fetch the status (so you know how much you can write or read) and one at the end to signal the other thread what we've done. That's not a big deal, we're already as relaxed in terms of sync as possible, it's just that if you have atomic primitives that only guarantee sync with respect to other atomic primitives, then you want to manually add a fence before the atomic add/subtract.

Note that my queue already guarantees that even with concurrent writes and reads, the writer either writes data in full, or fails and the reader either sees full data at once, or it doesn't (at which point it would see it later).

So you can perfectly well use this queue safely declared as uint8_t and stuff variable length messages (say CLAP events or whatever) in there. It never blocks either side, so you can safely use it for DSP to GUI, but also for GUI to DSP. You'll never see a partial message.

Post

Derozer wrote: Thu Jul 11, 2024 8:40 am When the sample is changed (for the usual reasons) on the GUI thread (once I push/buffer the new path), there can be edge cases where the sample playing on the DSP thread crashes because it is being changed under the hood on another thread (segmentation fault, memory access into a different sample, and so on).
The RTPointer class just before my RTQueue is designed to deal with this issue... but when I was looking at linking it, I realized there's a potential race with compiler optimizing, so if you want to steal the class, take the version from the dev-branch (adds one more fence, just to be sure): https://github.com/signaldust/dust-tool ... read.h#L76

The way this works: you store a pointer to the shared RTPointer by calling swapAndWait() in the GUI thread. This returns the old pointer once the real-time (DSP) thread is guaranteed not to be holding onto it anymore, so once the function returns, you can delete the returned pointer (we don't want to leak it after all). In the DSP thread, you call rtLock() to obtain the pointer (eg. beginning of process) and then rtRelease() to tell the GUI thread that you're done with it (eg. end of process). Note that this does NOT starve the GUI thread even if we rtLock() again before swapAndWait() has returned, because it doesn't actually wait until the DSP lock isn't holding onto any pointer, it just waits until we know it's not holding onto the old pointer anymore (that's why it uses an incrementing readState).

Note that if you call rtLock() and then swapAndWait() in the same thread, it'll dead-lock, so don't do that... and treat the object (and whatever object graph might be hanging from it) as "read-only" while it's being stored in the RTPointer (ie. do a "copy on write" and construct a new object whenever you need to modify something).
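
To make the mechanism concrete, here is a simplified sketch written from the description above; it is not the actual RTPointer source. The class name `RtPointerSketch` is hypothetical, only the three method names are taken from the post, and all atomics use the default sequentially consistent ordering. It assumes a single DSP thread and a single GUI thread.

```cpp
#include <atomic>
#include <thread>

// Hypothetical simplified take on the idea; one DSP thread, one GUI thread.
template <class T>
class RtPointerSketch {
    std::atomic<T*> ptr{nullptr};
    // Incremented by every rtLock() and every rtRelease():
    // odd  = DSP thread is between lock and release,
    // even = DSP thread is not holding any pointer.
    std::atomic<unsigned> readState{0};

public:
    // DSP thread, e.g. at the start of process(). Never blocks.
    T* rtLock() {
        readState.fetch_add(1);  // announce "holding" before reading ptr
        return ptr.load();
    }

    // DSP thread, e.g. at the end of process(). Never blocks.
    void rtRelease() { readState.fetch_add(1); }

    // GUI thread. Installs `next` and returns the previous pointer once the
    // DSP thread can no longer be using it, so the caller may delete it.
    T* swapAndWait(T* next) {
        T* old = ptr.exchange(next);
        const unsigned seen = readState.load();
        if (seen & 1u) {
            // The DSP thread may still hold `old`. Wait only until the
            // state changes: any later rtLock() already sees `next`, so
            // repeated locking cannot starve this loop.
            while (readState.load() == seen) std::this_thread::yield();
        }
        return old;
    }
};
```

As in the description, calling `rtLock()` and then `swapAndWait()` on the same thread would spin forever in this sketch too, because nothing else would ever change `readState`.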

Post

mystran wrote: Thu Jul 11, 2024 12:38 pm
Derozer wrote: Thu Jul 11, 2024 8:40 am When the sample is changed (for the usual reasons) on the GUI thread (once I push/buffer the new path), there can be edge cases where the sample playing on the DSP thread crashes because it is being changed under the hood on another thread (segmentation fault, memory access into a different sample, and so on).
The RTPointer class just before my RTQueue is designed to deal with this issue... but when I was looking at linking it, I realized there's a potential race with compiler optimizing, so if you want to steal the class, take the version from the dev-branch (adds one more fence, just to be sure): https://github.com/signaldust/dust-tool ... read.h#L76

The way this works: you store a pointer to the shared RTPointer by calling swapAndWait() in the GUI thread. This returns the old pointer once the real-time (DSP) thread is guaranteed not to be holding onto it anymore, so once the function returns, you can delete the returned pointer (we don't want to leak it after all). In the DSP thread, you call rtLock() to obtain the pointer (eg. beginning of process) and then rtRelease() to tell the GUI thread that you're done with it (eg. end of process). Note that this does NOT starve the GUI thread even if we rtLock() again before swapAndWait() has returned, because it doesn't actually wait until the DSP lock isn't holding onto any pointer, it just waits until we know it's not holding onto the old pointer anymore (that's why it uses an incrementing readState).

Note that if you call rtLock() and then swapAndWait() in the same thread, it'll dead-lock, so don't do that... and treat the object (and whatever object graph might be hanging from it) as "read-only" while it's being stored in the RTPointer (ie. do a "copy on write" and construct a new object whenever you need to modify something).
Your script is about 600 lines of code :) Learning it would be a huge task, but I can try. Many thanks for your gift!!!

Just for a quick test: do you have some basic pseudocode showing which methods to call, where and how? I just need RTPointer, I believe...

From your statement, it seems "processing" is executed on the DSP thread: but in the case of loading a .wav, I doubt the DSP thread should do it, right? Isn't the GUI thread the candidate for this task? (since it could be heavy...)

Post

Derozer wrote: Thu Jul 11, 2024 12:56 pm Your script is about 600 lines of code :) Learning it would be a huge task, but I can try. Many thanks for your gift!!!

Just for a quick test: do you have some basic pseudocode showing which methods to call, where and how? I just need RTPointer, I believe...
You should be able to take both the RTPointer and RTQueue classes (edit: they are fully in the header) as-is without taking anything else (well, except the memfence() wrappers at the top of the file). I tend to prefer grouping related stuff into the same header rather than having a separate one for every single thing, but those don't depend on anything else (well, except some standard library #includes, but figuring those out shouldn't be too difficult).
From your statement, it seems "processing" is executed on the DSP thread: but in the case of loading a .wav, I doubt the DSP thread should do it, right? Isn't the GUI thread the candidate for this task? (since it could be heavy...)
By "process" I mean your ordinary plugin process() function that does all the audio processing.

Code: Select all

// declare somewhere accessible to both threads (eg. your plugin class)
RTPointer<Foobar> shared_foobar;

// In GUI thread
Foobar * newFoobar = new Foobar(whatever); // construct a new object
Foobar * oldFoobar = shared_foobar.swapAndWait(newFoobar);
if(oldFoobar) delete oldFoobar;

// In DSP thread
Foobar * foobar = shared_foobar.rtLock();
if(foobar)
{
  /* you can access the contents of foobar here */
}
shared_foobar.rtRelease(); // let GUI thread know we don't need it anymore

// when the plugin is closed, in GUI thread
Foobar * oldFoobar = shared_foobar.swapAndWait(0);
if(oldFoobar) delete oldFoobar;
So in the case of samples, you'd load the sample, put it into a new object, then swap. Then in the DSP thread you fetch the pointer, read the sample contents, release the pointer.

Post

mystran wrote: Thu Jul 11, 2024 1:14 pm So in the case of samples, you'd load the sample, put it into a new object, then swap. Then in the DSP thread you fetch the pointer, read the sample contents, release the pointer.
I tried it, but it doesn't seem to work as expected.

I load the sample on the GUI thread (once):

Code: Select all

void WavePlot::onButton(const event::Button &e) {
	DEBUG("step1");
	Foobar *newFoobar = new Foobar(); // construct a new object
	Foobar *oldFoobar = pWave->shared_foobar.swapAndWait(newFoobar);
	if (oldFoobar) {
		DEBUG("step2");
		delete oldFoobar;
	}
	DEBUG("step3");
}
Then, on the DSP thread, I constantly check for new content to read:

Code: Select all

Foobar *foobar = shared_foobar.rtLock();
if (foobar) {
	DEBUG("updateFromGUI");
}
shared_foobar.rtRelease(); // let GUI thread know we don't need it anymore
It prints:

Code: Select all

DEBUG("step1");
DEBUG("step3");
DEBUG("updateFromGUI");
DEBUG("updateFromGUI");
DEBUG("updateFromGUI");
DEBUG("updateFromGUI");
DEBUG("updateFromGUI");
... (infinitely).
And that's the problem: it should read "once", not continuously. But of course it does this: https://github.com/signaldust/dust-tool ... read.h#L99

The ptr is always the same... nobody invalidates it.

Should I delete the *foobar pointer after copying/handling it in the /* you can access the contents of foobar here */ section?
But I'm not sure this is good, since dealing with memory (new, delete, etc) on audio thread is always evil...

Or maybe it is intended that the GUI writes and the DSP reads constantly, not only "once" on the GUI side? I tried that, but it still prints something like:

Code: Select all

DEBUG("step1");
DEBUG("step3");
DEBUG("updateFromGUI");
DEBUG("updateFromGUI");
DEBUG("updateFromGUI");
DEBUG("step1");
DEBUG("step2");
DEBUG("step3");
DEBUG("updateFromGUI");
DEBUG("updateFromGUI");
DEBUG("updateFromGUI");
DEBUG("updateFromGUI");
DEBUG("updateFromGUI");
DEBUG("updateFromGUI");
...
which is still weird...

Post

Derozer wrote: Fri Jul 12, 2024 12:04 am
mystran wrote: Thu Jul 11, 2024 1:14 pm So in the case of samples, you'd load the sample, put it into a new object, then swap. Then in the DSP thread you fetch the pointer, read the sample contents, release the pointer.
I tried it, but it doesn't seem to work as expected.
Either I misunderstood what you want, or you misunderstood the solution.

Suppose you have some structure representing sample, perhaps it contains some header data (length, loop points, etc) and then perhaps an std::vector holding the actual samples. We want to access this in the DSP thread in order to actually do the playback... yet we have a problem, because if we were to load a new sample (perhaps reallocate the vector, etc) in the GUI thread while the DSP thread is accessing the previous one, we'll end up with random crashes, because from the point of view of the DSP thread the memory contents change in unexpected ways.

RTPointer then is essentially a container for a pointer that solves this by allowing the GUI thread to allocate an object and then safely share it with the DSP thread. When the DSP thread calls rtLock() that basically means "I'm accessing the object stored in the RTPointer right now, don't change it on me" and then rtRelease() says "ok, I'm done with it for now." This way your DSP thread only needs to prepare for the data having changed at the one point where it calls rtLock(). Dealing with "we might get the same or a different pointer here" is fairly easy, whereas "anything can change randomly at any point" is basically impossible to work with.

If you want to send a message, then use a queue. If you don't need a message, but still want to detect in the DSP thread whether some data shared with RTPointer is still the same as last time (eg. perhaps you want to reset all existing voices when data changes), then put an incrementing "generation" counter into the object itself (ie. every time you construct a new object, you give it the next increment from the counter).
But I'm not sure this is good, since dealing with memory (new, delete, etc) on audio thread is always evil...
No... the whole point of this class is that it allows you to allocate (and also deallocate) objects in the GUI thread while still allowing the DSP thread to safely access those objects.
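
The "generation" counter suggestion could look roughly like the sketch below. All names here (`SampleData`, `makeSample`, `PlayerState`, `sync`) are hypothetical and only illustrate the idea; the shared object is never modified after construction, so the DSP side only compares a number it remembers locally.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Immutable once shared: the DSP thread treats it as read-only.
struct SampleData {
    std::uint64_t generation = 0;  // 0 is reserved for "no sample"
    std::vector<float> frames;
};

// GUI thread: every newly constructed object gets the next generation.
inline std::uint64_t nextGeneration() {
    static std::atomic<std::uint64_t> counter{0};
    return ++counter;  // first call returns 1
}

// GUI thread: build a new object instead of editing the shared one.
inline std::unique_ptr<SampleData> makeSample(std::vector<float> frames) {
    auto sample = std::make_unique<SampleData>();
    sample->generation = nextGeneration();
    sample->frames = std::move(frames);
    return sample;
}

// DSP thread: private state, no allocation, no writes to shared data.
struct PlayerState {
    std::uint64_t lastSeenGeneration = 0;
    std::size_t playPos = 0;

    // Call with the pointer obtained for this block (may be null).
    // Returns true exactly when the data differs from the last call.
    bool sync(const SampleData* current) {
        const std::uint64_t gen = current ? current->generation : 0;
        if (gen == lastSeenGeneration) return false;
        lastSeenGeneration = gen;
        playPos = 0;  // e.g. restart playback / reset voices
        return true;
    }
};
```

With a raw-pointer interface like the `swapAndWait()` shown earlier in the thread, the GUI side would pass `makeSample(...).release()` and delete whatever pointer comes back; the DSP side would call `sync()` right after `rtLock()`.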

Post

mystran wrote: Fri Jul 12, 2024 4:12 am put an incrementing "generation" counter into the object itself (ie. every time you construct a new object, you give it the next increment from the counter).
Yes, that seems to be the way to work with this mechanism.
I've done something like this:

Code: Select all

// gui
Foobar *newFoobar = new Foobar(); // construct a new object
*newFoobar = mWave.mData;
newFoobar->mDirtyGUI = true;
...

// dsp
Foobar *foobar = shared_foobar.rtLock();
if (foobar) {
	if (foobar->mDirtyGUI) {
		mData = *foobar;
		foobar->mDirtyGUI = false;
	}
}
shared_foobar.rtRelease(); // let GUI thread know we don't need it anymore
Seems to react very well :) Doing some more tests... and I'll get back to you.
Thanks for now, very nice approach.
