Sync GUI from Audio Thread: any way to copy bulk of data with one instruction?

DSP, Plugin and Host development discussion.

Post

mystran wrote: What you are supposed to do is copy the coefficients from the audio data to a shared buffer in the audio thread if you can get the lock without waiting. You are then supposed to copy those coefficients from the shared buffer into a GUI-side buffer in the GUI thread so that you don't need to hold the lock while you draw.
So I make a shared buffer between the audio and GUI threads, which works like this:

The audio thread tries to copy the coeffs to the shared buffer:
- if the shared buffer is locked (i.e. the GUI thread is copying values from it), skip and go ahead;
- else lock, copy, unlock and go ahead.

The GUI thread tries to copy the coeffs from the shared buffer:
- if the shared buffer is locked (i.e. the audio thread is copying values to it), block and wait, then lock, copy, unlock and draw;
- else lock, copy, unlock and draw.
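A minimal C++ sketch of the scheme above, assuming a fixed number of coefficients (`kNumCoeffs` and the class name are hypothetical) and `std::mutex`. The audio thread only ever calls `try_lock` (through `std::unique_lock` with `std::try_to_lock`), so it never waits; the GUI thread takes a blocking lock just long enough to copy into its own buffer and then draws from that copy.

```cpp
#include <array>
#include <cstddef>
#include <mutex>

// Hypothetical coefficient count; adjust to your filter layout.
constexpr std::size_t kNumCoeffs = 5;
using Coeffs = std::array<float, kNumCoeffs>;

class SharedCoeffs {
public:
    // Audio thread: copy only if the lock is free right now; otherwise skip
    // this update and try again on the next block. Returns true if copied.
    bool tryPublish(const Coeffs& fromAudio) {
        std::unique_lock<std::mutex> lock(mutex_, std::try_to_lock);
        if (!lock.owns_lock())
            return false;  // GUI is reading; drop this update
        shared_ = fromAudio;
        return true;
    }

    // GUI thread: wait for the lock, copy into a GUI-side buffer, release.
    // Drawing then happens from the returned copy, outside the lock.
    Coeffs fetch() {
        std::lock_guard<std::mutex> lock(mutex_);
        return shared_;
    }

private:
    std::mutex mutex_;
    Coeffs shared_{};
};
```

The GUI only ever blocks while the audio thread is inside a copy of a few floats, which keeps the critical section tiny on both sides.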

Right?

Post

Oh dear... Currently, I just clamp my data to keep it in range and draw.

I've tried various flags, locks, semaphores, muteces, etc., but they all just want to block the audio thread or block the GUI thread so things don't get accomplished on a timely basis.

But this has inspired me to try a different tack since I've updated my GUI code to use OpenGL.

Edit: nope. Still the same issues.
I started on Logic 5 with a PowerBook G4 550MHz. I now have a MacBook Air M1 and it's ~165x faster! So, why is my music not proportionally better? :(

Post

Nowhk wrote: The audio thread tries to copy the coeffs to the shared buffer:
- if the shared buffer is locked (i.e. the GUI thread is copying values from it), skip and go ahead;
- else lock, copy, unlock and go ahead.

The GUI thread tries to copy the coeffs from the shared buffer:
- if the shared buffer is locked (i.e. the audio thread is copying values to it), block and wait, then lock, copy, unlock and draw;
- else lock, copy, unlock and draw.
Yup. This way the chance that the audio thread can't get the lock is as small as possible, and missing updates should be very rare even in "worst-case" conditions. It can still happen once in a while (just like winning the lottery; this lottery is held a LOT more often, though), but if you make a debug build that prints a message when the trylock() fails, you'll probably have to wait quite a while to see one.
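For the debug build mentioned above, one option (an assumption, not something from the thread) is to avoid printing from the audio thread itself, since console I/O can block, and instead count misses in an atomic that the GUI thread reports. `MissCounter` and `publishOrCount` are hypothetical names. Note that `std::mutex::try_lock` is allowed to fail spuriously, so such a counter measures "the audio thread did not get the lock", not strictly "the GUI was holding it".

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <mutex>

// Hypothetical debug helper: counts how often the audio thread's try_lock fails.
class MissCounter {
public:
    // Audio thread: record a failed try_lock. Relaxed is enough for a statistic.
    void recordMiss() { misses_.fetch_add(1, std::memory_order_relaxed); }

    // GUI thread: print and return the number of misses since the last call.
    std::uint64_t reportAndReset() {
        const std::uint64_t n = misses_.exchange(0, std::memory_order_relaxed);
        if (n != 0)
            std::printf("audio thread missed %llu coefficient update(s)\n",
                        static_cast<unsigned long long>(n));
        return n;
    }

private:
    std::atomic<std::uint64_t> misses_{0};
};

// Example audio-side call site using the counter.
inline void publishOrCount(std::mutex& m, MissCounter& counter) {
    std::unique_lock<std::mutex> lock(m, std::try_to_lock);
    if (!lock.owns_lock()) {
        counter.recordMiss();
        return;
    }
    // ... copy coefficients into the shared buffer here ...
}
```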

Post

syntonica wrote: I've tried various flags, locks, semaphores, muteces, etc., but they all just want to block the audio thread or block the GUI thread so things don't get accomplished on a timely basis.

But this has inspired me to try a different tack since I've updated my GUI code to use OpenGL.
Try the wait-free producer-consumer queue I suggested above. That thing literally needs one load and one store with memory fences (on strongly ordered architectures like x86/x64, all you need is a compiler fence to prevent reordering) and that's it. There's literally nothing that could block, and the only way you'll miss data is if you fill the queue (which can be made essentially irrelevant simply by increasing the queue size; e.g. if you have 10 seconds' worth of queue, then your GUI thread would have to freeze for 10 seconds before the queue is full).
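A minimal sketch of the kind of wait-free single-producer/single-consumer ring buffer described above, assuming exactly one audio thread pushes and exactly one GUI thread pops (`SpscQueue` and `Capacity` are hypothetical names). Portable code uses acquire/release atomics instead of relying on x86 ordering; on x86/x64 those acquire loads and release stores compile to ordinary loads and stores, with the ordering enforced only at the compiler level.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Wait-free SPSC queue: one producer (audio) and one consumer (GUI) only.
// One slot is left empty to distinguish "full" from "empty".
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2, "need at least two slots");

public:
    // Producer (audio thread). Returns false if the queue is full; never blocks.
    bool push(const T& value) {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t next = (w + 1) % Capacity;
        if (next == read_.load(std::memory_order_acquire))
            return false;  // full: the GUI has fallen behind, drop this item
        slots_[w] = value;
        write_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer (GUI thread). Returns std::nullopt if the queue is empty.
    std::optional<T> pop() {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        if (r == write_.load(std::memory_order_acquire))
            return std::nullopt;  // empty
        T value = slots_[r];
        read_.store((r + 1) % Capacity, std::memory_order_release);  // free the slot
        return value;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> write_{0};  // only modified by the producer
    std::atomic<std::size_t> read_{0};   // only modified by the consumer
};
```

`T` could be a small struct of coefficients or display values; the audio thread pushes once per block, and the GUI pops everything available each frame and keeps the newest.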

Post

Thanks to all, as usual. I'll work it out ;)

Post

mystran wrote:
syntonica wrote: I've tried various flags, locks, semaphores, muteces, etc., but they all just want to block the audio thread or block the GUI thread so things don't get accomplished on a timely basis.

But this has inspired me to try a different tack since I've updated my GUI code to use OpenGL.
Try the wait-free producer-consumer queue I suggested above. That thing literally needs one load and one store with memory fences (on strongly ordered architectures like x86/x64, all you need is a compiler fence to prevent reordering) and that's it. There's literally nothing that could block, and the only way you'll miss data is if you fill the queue (which can be made essentially irrelevant simply by increasing the queue size; e.g. if you have 10 seconds' worth of queue, then your GUI thread would have to freeze for 10 seconds before the queue is full).
Thanks, but the audio and the graphics routines are both very time-dependent, so I think the best solution will be to have the audio thread push the needed values to the graphics routines after they are properly normalized. That way I can get rid of the ugly clamping routine, which currently also has to check for NaNs and is probably much slower than the extra copies and small loop that are needed. Of course, the solution is always to throw more memory at the problem. :roll:

Edit: I did just this, and it looks and works far better than what I had before, with no extra CPU needed. Yay! :lol:
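A sketch of normalizing on the audio side before publishing, assuming (hypothetically) that the GUI wants a level value in [0, 1] mapped from a -60 dB to 0 dB range. The thread does not say what values are actually drawn; the point is only that the NaN check and clamp happen once, before the data reaches the drawing code.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical display range for a level value, in dB.
constexpr float kMinDb = -60.0f;
constexpr float kMaxDb = 0.0f;

// Audio thread: convert a peak amplitude into a 0..1 display value.
// Non-finite input (NaN or infinity) and silence map to 0.
inline float toDisplayValue(float peakAmplitude) {
    const float a = std::fabs(peakAmplitude);
    if (!std::isfinite(a) || a <= 0.0f)
        return 0.0f;
    const float db = 20.0f * std::log10(a);
    const float normalized = (db - kMinDb) / (kMaxDb - kMinDb);
    return std::clamp(normalized, 0.0f, 1.0f);  // above 0 dB pins to the top
}
```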

Post

Coming in late here, but I have a different approach.

Why don't you pass the cutoff, Q and gain back to the UI thread instead of the filter coefficients?
When passing filter coefficients, it's a problem if you only get some of the updates: a mix of old and new coefficients could create a really strange filter response.
With cutoff/Q/gain it doesn't matter if you don't get an update on time. You just compute the biquad coefficients again on the UI thread from the audio thread's cutoff/Q/gain, and your response will look *pretty much* correct.
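As a sketch of recomputing the curve on the UI thread, here is a peaking-EQ biquad built from cutoff, Q and gain with the well-known formulas from Robert Bristow-Johnson's Audio EQ Cookbook, plus its magnitude response at a given frequency. The filter type is an assumption; the thread does not say which biquad is used.

```cpp
#include <cmath>
#include <complex>

struct Biquad {
    double b0, b1, b2, a1, a2;  // normalized so that a0 == 1
};

// Peaking EQ from cutoff (Hz), Q and gain (dB), per the RBJ Audio EQ Cookbook.
inline Biquad peakingEq(double cutoffHz, double q, double gainDb, double sampleRate) {
    const double pi = 3.14159265358979323846;
    const double A = std::pow(10.0, gainDb / 40.0);
    const double w0 = 2.0 * pi * cutoffHz / sampleRate;
    const double alpha = std::sin(w0) / (2.0 * q);
    const double cosw0 = std::cos(w0);
    const double a0 = 1.0 + alpha / A;
    return Biquad{(1.0 + alpha * A) / a0, (-2.0 * cosw0) / a0, (1.0 - alpha * A) / a0,
                  (-2.0 * cosw0) / a0, (1.0 - alpha / A) / a0};
}

// Magnitude response in dB at freqHz, evaluating H(z) on the unit circle.
inline double magnitudeDb(const Biquad& c, double freqHz, double sampleRate) {
    const double pi = 3.14159265358979323846;
    const double w = 2.0 * pi * freqHz / sampleRate;
    const std::complex<double> z1 = std::polar(1.0, -w);  // z^-1
    const std::complex<double> z2 = z1 * z1;              // z^-2
    const std::complex<double> h =
        (c.b0 + c.b1 * z1 + c.b2 * z2) / (1.0 + c.a1 * z1 + c.a2 * z2);
    return 20.0 * std::log10(std::abs(h));
}
```

Evaluating `magnitudeDb` at each x position of the plot gives the response curve; at the cutoff frequency a peaking filter built this way has exactly the requested gain, and at DC it has 0 dB.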

Then, since a single float can be loaded and stored atomically (on mainstream platforms std::atomic<float> is lock-free), you don't need a mutex or a lock-free queue.
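A minimal sketch of this with `std::atomic<float>` (the struct and function names are hypothetical). Each field is individually atomic, so the GUI may occasionally combine a new cutoff with an old Q; as argued above, that only yields a momentarily slightly-off curve. In standard C++, sharing a plain `float` between threads without `std::atomic`, with at least one writer, is a data race even on hardware where aligned 32-bit stores are atomic.

```cpp
#include <atomic>

static_assert(std::atomic<float>::is_always_lock_free,
              "float atomics must be lock-free for use on the audio thread");

// Filter parameters shared from the audio thread to the GUI thread.
struct FilterParams {
    std::atomic<float> cutoffHz{1000.0f};
    std::atomic<float> q{0.707f};
    std::atomic<float> gainDb{0.0f};
};

struct ParamSnapshot {
    float cutoffHz, q, gainDb;
};

// Audio thread: store the values actually in use. Relaxed ordering suffices
// because each value is meaningful on its own.
inline void publishParams(FilterParams& p, float cutoffHz, float q, float gainDb) {
    p.cutoffHz.store(cutoffHz, std::memory_order_relaxed);
    p.q.store(q, std::memory_order_relaxed);
    p.gainDb.store(gainDb, std::memory_order_relaxed);
}

// GUI thread: read each value once per frame, then compute coefficients locally.
inline ParamSnapshot readParams(const FilterParams& p) {
    return ParamSnapshot{p.cutoffHz.load(std::memory_order_relaxed),
                         p.q.load(std::memory_order_relaxed),
                         p.gainDb.load(std::memory_order_relaxed)};
}
```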

If I'm wrong I'd like to know because that's what I'm doing in my current project.

Post

mtytel wrote: If I'm wrong I'd like to know because that's what I'm doing in my current project.
You're not doing anything wrong. You're just thinking outside of the box.

I have to admit, now that I think about it that's what I'd normally do with things like filter response plots as well. :P

Post

Depends on how efficient your coefficient calculation routine is. If you have a complicated algorithm, this may take too much time.

Post

syntonica wrote: Depends on how efficient your coefficient calculation routine is. If you have a complicated algorithm, this may take too much time.
Right... except anything that's fast enough to do in the audio thread on the fly is probably rather irrelevant as far as the GUI thread goes, since your audio thread is usually the one that needs to work with tighter schedules, and in most cases you're likely to spend at least an order of magnitude more CPU on actually drawing the results anyway.

Post

mystran wrote:
syntonica wrote: Depends on how efficient your coefficient calculation routine is. If you have a complicated algorithm, this may take too much time.
Right... except anything that's fast enough to do in the audio thread on the fly is probably rather irrelevant as far as the GUI thread goes, since your audio thread is usually the one that needs to work with tighter schedules, and in most cases you're likely to spend at least an order of magnitude more CPU on actually drawing the results anyway.
I'm just concerned with the CPU hit, not the fps, per se. I see a number of GUIs from professional plugins that can eat a very good chunk of CPU when open. Often using more than the sound engine itself. :?

Post

I'm just concerned with the CPU hit, not the fps, per se. I see a number of GUIs from professional plugins that can eat a very good chunk of CPU when open. Often using more than the sound engine itself. :?
In the past, one used to draw only when the operating system demanded it. The application could trigger this with a call like InvalidateRect, which asks the operating system to initiate a window refresh, but that did not mean the refresh would happen immediately. This is the way Win32/GDI used to work, and the Win32 API still retains this logic, although nowadays one can also use OpenGL or DirectX inside the paint message handler. There is also a 2D replacement for GDI called Direct2D.

There is another style, commonly used in games, which repaints at a fixed rate even when nothing needs to be refreshed. If this style is combined with software-only rendering, it will cause unnecessarily high CPU usage, because someone doing game-style programming is most likely doing it with a 3D drawing API, which introduces unnecessary overhead unless a suitable GPU driver is present.
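A rough, framework-neutral sketch of combining the two styles in a timer-driven plugin GUI: poll at a fixed rate, but only request a repaint when the displayed data has actually changed. `RedrawGate` and `DisplayState` are hypothetical, and `requestRepaint` stands in for whatever the toolkit provides (on Win32, something that ends up calling InvalidateRect).

```cpp
#include <array>
#include <functional>
#include <utility>

// Hypothetical display state; here, the five biquad coefficients being plotted.
using DisplayState = std::array<float, 5>;

class RedrawGate {
public:
    explicit RedrawGate(std::function<void()> requestRepaint)
        : requestRepaint_(std::move(requestRepaint)) {}

    // Call from the GUI timer (e.g. 30-60 Hz). Requests a repaint only when
    // the state differs from what was last drawn, so an idle window with
    // unchanged data triggers no drawing. Returns true if a repaint was requested.
    bool onTimer(const DisplayState& latest) {
        if (hasDrawn_ && latest == lastDrawn_)
            return false;
        lastDrawn_ = latest;
        hasDrawn_ = true;
        requestRepaint_();
        return true;
    }

private:
    std::function<void()> requestRepaint_;
    DisplayState lastDrawn_{};
    bool hasDrawn_ = false;
};
```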
~stratum~

Post

syntonica wrote: I'm just concerned with the CPU hit, not the fps, per se. I see a number of GUIs from professional plugins that can eat a very good chunk of CPU when open. Often using more than the sound engine itself. :?
We're just talking about computing biquad filter coefficients once per visual frame (60fps). It'll have a negligible CPU usage impact.

The reason most professional plugins use so much CPU with their UI is because they have animations but use software renderers to draw the lines/shapes (e.g. Serum I believe). Pushing pixels from the CPU is soooo slow.

As a side note, these CPU-intensive GUIs could also be using an OpenGL renderer but copying a lot of memory across.

Post

mtytel wrote:We're just talking about computing biquad filter coefficients once per visual frame (60fps). It'll have a negligible CPU usage impact.

The reason most professional plugins use so much CPU with their UI is because they have animations but use software renderers to draw the lines/shapes (e.g. Serum I believe). Pushing pixels from the CPU is soooo slow.
Are you saying that "calculate biquad filter coefficients AND draw with software renderers" is cheap and "draw lines/shapes with software renderers" is expensive?

Not sure if I got the point. In both cases you draw pixels.
Using something like LICE or cairo (within IPlug, for example) is quite a bit faster.

Or what do you mean with software renderers?

Post

Nowhk wrote: Are you saying that "calculate biquad filter coefficients AND draw with software renderers" is cheap and "draw lines/shapes with software renderers" is expensive?

Not sure if I got the point. In both cases you draw pixels.
Using something like LICE or cairo (within IPlug, for example) is quite a bit faster.

Or what do you mean with software renderers?
I'm not sure why but I thought I read you were using OpenGL. My mistake. Using a software renderer is the expensive part. Computing filter coefficients and line points is cheap.

A software renderer basically draws pixels on the CPU. OpenGL, on the other hand, uses the graphics card to draw each pixel. Lots of UI libraries offer both software-renderer and OpenGL backends, but even the OpenGL backend sometimes won't be much faster if you're copying a lot of memory (e.g. line points) over to the GPU.

So say I want to draw a filter response. I first compute the filter coefficients which is super cheap. Then, I have to draw the line.

One option is to compute all the points on the UI thread and pass them to some UI library. In this case, if the library uses a software renderer, it is slow because you're drawing pixels on the CPU. If the library uses an OpenGL backend, it's faster but could still be slow because copying memory over to the GPU takes time.
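For the first option, the per-frame work on the UI thread might look like the following sketch: sample the response at log-spaced frequencies across the plot width and turn the results into pixel coordinates for the UI library to stroke. The 20 Hz to 20 kHz axis, the ±24 dB range, and the `Point` and `responsePolyline` names are assumptions; `responseDb` stands in for whatever evaluates the filter (for a biquad, its magnitude response in dB).

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

struct Point { float x, y; };

// Build a polyline for a frequency-response plot of width x height pixels.
// Frequencies are log-spaced from minHz to maxHz; dB values map linearly to y,
// with maxDb at the top (y = 0) and minDb at the bottom (y = height).
inline std::vector<Point> responsePolyline(const std::function<double(double)>& responseDb,
                                           std::size_t numPoints, float width, float height,
                                           double minHz = 20.0, double maxHz = 20000.0,
                                           double minDb = -24.0, double maxDb = 24.0) {
    std::vector<Point> points;
    points.reserve(numPoints);
    for (std::size_t i = 0; i < numPoints; ++i) {
        const double t = numPoints > 1 ? double(i) / double(numPoints - 1) : 0.0;
        const double freq = minHz * std::pow(maxHz / minHz, t);  // log spacing
        double db = responseDb(freq);
        db = std::fmin(std::fmax(db, minDb), maxDb);  // keep the line inside the plot
        const double y = (maxDb - db) / (maxDb - minDb) * height;
        points.push_back(Point{static_cast<float>(t * width), static_cast<float>(y)});
    }
    return points;
}
```

These points are exactly the data that has to be handed to the renderer each frame, which is the memory traffic the vertex-shader approach avoids.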

A more efficient option is to pass only the filter coefficients over to a vertex shader in OpenGL. This vertex shader basically moves vertices around and so can compute our line points for us. This might be overkill for most developers, and I'm not aware of any apps that actually do it this way (except for mine :P).


Return to “DSP and Plugin Development”