KVR Audio

mystran · Post by **mystran** » Mon Sep 16, 2019 5:39 pm

camsr wrote: ↑Mon Sep 16, 2019 7:29 am Edit: Syncing the threads is a pain.

Don't sync your threads; your audio thread can never wait for a synchronization object anyway.

What you want to do is use wait-free structures (e.g. message queues) to send the data around.

camsr · Post by **camsr** » Mon Sep 16, 2019 8:21 pm

There seems to be contention no matter what I try. To counter it I made the copy operation as localized as possible, and set a flag that tells the paint thread to spin while the copy is in progress. But it's still not 100% guaranteed to grab the correct values... (because the spin flag is not synchronized with anything? The copy op in process is not blocked.)

Code:

// in process

volatile size_t* ptr_ring_csr_proc = &(disp->ring_csr_proc); // pointer to volatile value
volatile size_t* ptr_flag_copy = &(disp->flag_copy);
const size_t local_ring_csr = *ptr_ring_csr_proc; // hopefully this loads the volatile value as a local copy?
// barriers here??

*ptr_flag_copy = 1; // pointer to volatile value
 ... memcpy using local_ring_csr ...
(*ptr_ring_csr_proc)++; // parenthesized: *ptr++ would advance the pointer, not the value
*ptr_ring_csr_proc %= RING_SIZE;
*ptr_flag_copy = 0;

// in paint

volatile size_t* ptr_flag_copy = &(disp->flag_copy);
	
// this has to be guaranteed to execute at only this position in the function
COMPILER_BARRIER();
while(*ptr_flag_copy); // blocking loop, flag set in run_processes
COMPILER_BARRIER();

// then make access to memory with these cursors
volatile size_t* ptr_ring_csr_proc = &(disp->ring_csr_proc);
volatile size_t* ptr_ring_csr_disp = &(disp->ring_csr_disp);
const size_t proc = *ptr_ring_csr_proc;
const size_t disp = *ptr_ring_csr_disp;

mystran · Post by **mystran** » Mon Sep 16, 2019 8:58 pm

camsr wrote: ↑Mon Sep 16, 2019 8:21 pm There seems to be contention no matter what I try. To counter it I made the copy operation as localized as possible, and set a flag that tells the paint thread to spin while the copy is in progress. But it's still not 100% guaranteed to grab the correct values... (because the spin flag is not synchronized with anything? The copy op in process is not blocked.)

First, using "volatile" on modern compilers (for anything other than MMIO access) is asking for trouble, since it almost certainly won't do what you want. You really want to use "compiler fences", which on x86 don't actually emit any instructions, but tell the compiler's optimiser that you really need a particular memory ordering.

Also, spinning in paint is probably not a good idea either. You really want some sort of queue, where the processing code can place new values and the painting code can fetch them, in logically atomic chunks. Unfortunately, the topic of correctly implementing wait-free queues is a little too much for me to include here right now, so I'd advise you to just find a known-good implementation somewhere.

ps. I'd also forget the idea of getting "latest" updates and just settle on getting "consistent recent update" because the latter is both good enough and actually obtainable.
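
As a rough illustration of the kind of queue meant here, a minimal single-producer/single-consumer ring buffer could look something like the sketch below. The names `SpscQueue`, `try_push` and `try_pop` are made up for this example, the capacity is assumed to be a power of two, and it is only a sketch under those assumptions, not a substitute for a well-tested implementation.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical single-producer / single-consumer queue (names are illustrative).
// Exactly one thread may call try_push and exactly one thread may call try_pop.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    // Producer side (e.g. the audio thread). Never blocks; returns false when full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;        // full: caller drops the item
        slots_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release); // publish the filled slot
        return true;
    }

    // Consumer side (e.g. the paint thread). Never blocks; returns false when empty.
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;                   // nothing new
        out = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release); // hand the slot back
        return true;
    }

private:
    T slots_[Capacity];
    std::atomic<std::size_t> head_{0}; // written only by the producer
    std::atomic<std::size_t> tail_{0}; // written only by the consumer
};
```

The paint side would drain it once per frame with something like `while (q.try_pop(v)) { /* keep v */ }` and draw the last value it got, so neither thread ever waits on the other.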

camsr · Post by **camsr** » Mon Sep 16, 2019 9:22 pm

Having the latest available updates is important, but it's also a point of contention.
The way I am approaching the updates is basically defined by this truth condition:
( number of times process ran > 0 && number of times paint ran > 0 )
This allows the threads to run free while also ensuring the fastest possible update rate.
The contention is in "syncing" the controlling variables.

mystran · Post by **mystran** » Mon Sep 16, 2019 9:50 pm

Using a proper queue, you will always have either the latest update or the one just before that. This is almost always enough.

BertKoor · Post by **BertKoor** » Tue Sep 17, 2019 7:57 am

Fact 1: your Audio thread is continuously delivering audio packages at a rate of sample_rate / buffer_size, say 48.000 / 64 = 750 "frames" per second.

Fact 2: your GUI thread will update the screen maybe 60 or 100 times per second.

Derived from that: each GUI update the Audio thread has produced 7 or 8 (maybe some dozens, not hundreds) data packages it might have to reflect on the screen. So the GUI has a similar latency as the audio itself.
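
For reference, the arithmetic behind those figures can be written out as a small sketch (the function name is made up for illustration). With 750 blocks per second, a 100 Hz repaint sees 7.5 blocks per frame and a 60 Hz repaint sees 12.5.

```cpp
// Hypothetical helper: how many audio blocks arrive per GUI repaint.
constexpr double blocks_per_gui_frame(double sample_rate,
                                      double buffer_size,
                                      double gui_fps) {
    // blocks per second, divided by repaints per second
    return (sample_rate / buffer_size) / gui_fps;
}
```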

Question for you: is it really problematic for the GUI thread to process 7 or 8 queued data packages delivered by the audio thread? Or did I not understand the underlying problem?

camsr · Post by **camsr** » Tue Sep 17, 2019 5:13 pm

BertKoor wrote: ↑Tue Sep 17, 2019 7:57 am Fact 1: your Audio thread is continuously delivering audio packages at a rate of sample_rate / buffer_size, say 48.000 / 64 = 750 "frames" per second.

Fact 2: your GUI thread will update the screen maybe 60 or 100 times per second.

Derived from that: each GUI update the Audio thread has produced 7 or 8 (maybe some dozens, not hundreds) data packages it might have to reflect on the screen. So the GUI has a similar latency as the audio itself.

Question for you: is it really problematic for the GUI thread to process 7 or 8 queued data packages delivered by the audio thread? Or did I not understand the underlying problem?

Not at all. The problem is how to synchronize all the events in a meaningful way.
As I am just fleshing stuff out right now, I wanted to see how fast I could make the display updates (based on host buffer size and idle rate). To test that, I am calculating an average of the peak levels in the audio process, and I am sending only the average (not audio samples) to the paint process, where the averages are further summed and averaged again based on the count of calls made between paint or process. If there are more paint operations than process operations, the paint call does not update until it gets a process call flag. If there are more process calls than paint, the average values have to be divided by the number of process calls made. At larger host buffer settings, the display updates less frequently with a larger bin for the average (the average is the result of many samples rather than few), and with smaller settings it updates more frequently. But there is also terrible jitter.

And yes I do understand what is technically wrong with only passing the average but that's not important at this moment.

What is important is to not block the audio thread. Maybe it is better to just copy the audio samples out of the audio process for paint to use, but if the display starts becoming more involved in different points of the audio process (for example, graph input AND output, etc.) the amount of memory being copied could be a bottleneck. So it may as well do all its display calculations in the audio process, especially for statistical results like max() and avg(). Then the problem becomes granularity, because there may be a display value update that is the max of 100 samples, and the next is the max of 2000, and it is "unbreakable" because the initial variables of the calculation simply don't exist any more. Fortunately there is a middle ground, and it requires calculating the smaller results more frequently in the audio thread. Then they must be aggregated in the paint process to provide meaningful data. Having these small statistical results based on a fixed time step (number of samples) is a much better alternative, but it does not completely eliminate jitter.

This way, the producer is more frequent, and does not block. The granularity is fixed and small (with larger footprint overall). The paint process has more recent data (but in actuality there will need to be delays, as mystran said). Unfortunately the real sync hurdle is the display device itself.
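
A rough sketch of the fixed-time-step idea, assuming a made-up helper called `FixedStepPeak` and a caller-supplied `emit` callback (which in practice would push into whatever queue carries data to paint): it produces one peak value per fixed number of samples, no matter how the host slices its blocks.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: reduces audio to one peak value per fixed-size chunk,
// independent of the host's block size.
class FixedStepPeak {
public:
    explicit FixedStepPeak(std::size_t step) : step_(step) {}

    // Call from process(); emit(peak) runs once per completed chunk.
    template <typename Emit>
    void process(const float* in, std::size_t frames, Emit&& emit) {
        for (std::size_t i = 0; i < frames; ++i) {
            const float a = std::fabs(in[i]);
            if (a > peak_) peak_ = a;
            if (++count_ == step_) {   // chunk complete: hand it off and reset
                emit(peak_);
                peak_ = 0.0f;
                count_ = 0;
            }
        }
    }

private:
    std::size_t step_;          // chunk length in samples
    std::size_t count_ = 0;     // samples seen in the current chunk
    float peak_ = 0.0f;         // running peak of the current chunk
};
```

A chunk that straddles two host blocks is simply carried over in `count_` and `peak_`, so every value handed to paint covers the same number of samples.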

mystran · Post by **mystran** » Tue Sep 17, 2019 11:20 pm

Oh right.. so using a proper message queue you COULD solve the exact problem you are trying to solve, but I can't help but wonder if it's actually worth solving.

Typically when you are showing a level meter of any type, you really don't want to tie it to the GUI update rate. Rather you should typically maintain a moving average (whether boxcar or exponential) where you choose the size of the window based on the amount of smoothing you want (and please note that this is generally a "human factors" decision really). This moving average is "continuous" in the sense that it typically updates on a per-sample basis. Then normally you should send only the current value to the GUI thread at the end of each processing block as a "snapshot" and then your paint routine can simply draw this. This way your smoothing is independent of both your audio processing block-rate and your GUI update rate. Whichever is lower sets the actual visual update rate that you can have, but there's no way around that really.

In contrast, if you try to average varying sized blocks you'll just end up with wildly inconsistent smoothing, to the point where the resulting values are basically useless for anything. Especially in a host like FL where blocks can be broken into smaller blocks, you might be drawing an average of 2 values at one time and then an average of 200 on the next. While it can be technically done, the result won't really be useful for anything.

In general, what I'd suggest is that whenever you want to draw visuals, you should think of the actual visual updates as snapshot images of a continuous process at some instants in time. As it turns out, this also turns the problem into a trivial one, since now you only need to send atomic values (i.e. the current level), which can be done simply by putting memory barriers (compiler barriers are enough on x86) around the loads/stores... and even better, the consistent smoothing means that slight jitter in whichever value you happen to pick isn't typically even visually noticeable, unless the audio update rate is very low.
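
To make the "snapshot of a continuous process" idea concrete, here is a minimal sketch using an exponential moving average. The class name `LevelMeter` and its members are invented for the example, and it uses `std::atomic<float>` in place of hand-placed barriers, so treat it as one possible shape rather than the way to do it.

```cpp
#include <atomic>
#include <cmath>
#include <cstddef>

// Hypothetical level meter: smoothing runs per sample in the audio thread,
// the GUI only ever reads the most recently published snapshot.
class LevelMeter {
public:
    // coeff in (0, 1): closer to 1 means heavier smoothing.
    explicit LevelMeter(float coeff) : coeff_(coeff) {}

    // Audio thread: update the moving average for every sample,
    // then publish a single snapshot at the end of the block.
    void process(const float* in, std::size_t frames) {
        float level = level_;
        for (std::size_t i = 0; i < frames; ++i)
            level = coeff_ * level + (1.0f - coeff_) * std::fabs(in[i]);
        level_ = level;
        snapshot_.store(level, std::memory_order_release);
    }

    // GUI thread: read whatever the latest published value is.
    float read() const { return snapshot_.load(std::memory_order_acquire); }

private:
    float coeff_;
    float level_ = 0.0f;                 // private to the audio thread
    std::atomic<float> snapshot_{0.0f};  // the only shared variable
};
```

Because the smoothing window is set by `coeff_` alone, it does not matter how many blocks were processed between two paints; paint just draws `read()`.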

2DaT · Post by **2DaT** » Wed Sep 18, 2019 2:56 am

mystran wrote: ↑Tue Sep 17, 2019 11:20 pm Compiler barriers are enough on x86 around the loads/stores...

I think it's still mandatory to use std::atomic to get the necessary behavior without having to resort to compiler-dependent mechanisms (_ReadWriteBarrier is actually deprecated). Though atomic stores with the default memory order (std::memory_order_seq_cst) may be costly on x64, acquire and release semantics are free there (so an atomic load with std::memory_order_acquire and a store with std::memory_order_release cost nothing extra).
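
A small sketch of what that looks like in code, assuming a made-up `Mailbox` struct used for a one-shot handoff (written once, then read). On x86/x64 both atomic operations below should compile to ordinary moves; the ordering arguments mainly constrain the compiler.

```cpp
#include <atomic>

// Hypothetical shared state: payload is plain data, ready publishes it.
// One-shot only: the writer must not touch payload again after publishing.
struct Mailbox {
    float payload = 0.0f;
    std::atomic<bool> ready{false};
};

// Writer: fill in the data first, then publish with a release store.
inline void publish(Mailbox& box, float value) {
    box.payload = value;
    box.ready.store(true, std::memory_order_release);
}

// Reader: an acquire load that observes true also sees the payload write.
inline bool try_consume(Mailbox& box, float& out) {
    if (!box.ready.load(std::memory_order_acquire)) return false;
    out = box.payload;
    return true;
}
```

For data that is republished continuously, the same release/acquire pairing applies, but it has to sit inside a queue or similar structure so the writer never overwrites a slot the reader may still be reading.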

camsr · Post by **camsr** » Thu Sep 19, 2019 5:18 pm

Next question, is it feasible to write-back the pointer that is passed to process()? Example:

Code:

void process(float** in, float** out, VstInt32 frames)
{
    memcpy(a_plugin_alloced_buffer, in[0], sizeof(float) * frames);
    out[0] = a_plugin_alloced_buffer; // write the pointer back to the host?
}

The idea is it may help improve cache locality, but it seems infeasible for many reasons.

Max M. · Post by **Max M.** » Thu Sep 19, 2019 6:33 pm

out[0] = a_plugin_alloced_buffer;

Mmm, no. Curiously there's no (AFAIR) statement in the VST2 SDK that says "you should not do that", but in fact, no, you should not do that.
It's assumed you change only the content of the output buffers but not the buffers themselves (if you do this, a host will at best ignore your new buffer and at worst you'll just crash it). In general, treat `out` as a constant pointer to constant pointer(s) to non-constant memory.

... it may help improve cache locality ...

(Aside from the above) I wonder whether it actually would, unless you're expecting your plugin to be the only plugin run by a host. And even in that case you'll find that in some (many?) hosts the input and output buffers of processReplacing are actually the same buffers, so by throwing in another set of buffers you'd only make cache things worse.
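
The "constant pointer to constant pointers to non-constant memory" reading can be spelled out in the type itself. The signature below is purely illustrative (it is not the SDK's, and `process_block` is a made-up name); it just shows that with these qualifiers the compiler rejects reseating `out[0]` while still allowing writes into the buffer.

```cpp
#include <cstddef>
#include <cstring>

// Illustrative signature only: the const qualifiers encode
// "don't reseat the pointers, only write through them".
inline void process_block(const float* const* const in,
                          float* const* const out,
                          int frames) {
    // out[0] = some_other_buffer;  // would not compile: out[0] is a const pointer
    // memmove rather than memcpy, since a host may pass the same buffer
    // for input and output.
    std::memmove(out[0], in[0], sizeof(float) * static_cast<std::size_t>(frames));
}
```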

camsr · Post by **camsr** » Fri Sep 20, 2019 8:54 pm

I would like my plugin to allocate a block of memory based on a dispatch to effSetBlockSize. Where is the suggested function to do this allocation?

Max M. · Post by **Max M.** » Fri Sep 20, 2019 10:06 pm

setBlockSize
(You'll find most of dispatch mappings in AudioEffect::dispatcher at audioeffect.cpp. The naming pattern is the same: effXYZ -> xYZ).
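
As a sketch of what the allocation might look like, here is a stand-alone mock; `MyPlugin`, `scratch` and `scratchSize` are hypothetical names, and in a real plugin `setBlockSize` would be the override on your AudioEffect-derived class rather than a free-standing class like this.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a plugin class (hypothetical, not the SDK's AudioEffect).
class MyPlugin {
public:
    // Reached when the host dispatches effSetBlockSize. Hosts normally do this
    // while the plugin is suspended, so resizing here stays off the audio path.
    void setBlockSize(int blockSize) {
        const std::size_t n = blockSize > 0 ? static_cast<std::size_t>(blockSize) : 0;
        scratch_.assign(n, 0.0f);  // (re)allocate and zero the scratch buffer
    }

    float* scratch() { return scratch_.data(); }
    std::size_t scratchSize() const { return scratch_.size(); }

private:
    std::vector<float> scratch_;   // owned per plugin instance
};
```

Keeping the buffer in a member that resizes itself means it does not matter how many times, or in what order, the host calls it.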

camsr · Post by **camsr** » Fri Sep 20, 2019 10:26 pm

The host is calling setSampleRate and setBlockSize before the call to effOpen. I was going to store the passed values in my allocation (made in effOpen) but it seems now I have to do something else. I am obviously trying to limit the amount of code in VSTPluginMain, as this will hopefully speed up plugin scans. Did I miss something saying that the AEffect.user struct needs to be allocated in VSTPluginMain, or is the host just in bad form? (or am I in bad form by saving these values per plugin instance?)

Max M. · Post by **Max M.** » Fri Sep 20, 2019 10:42 pm

The host is calling setSampleRate and setBlockSize before the call to effOpen. ... or is the host just in bad form?

Honestly, if I faced such a thing, the first thing I'd suspect would be my logging code and not the form of the host.
Either way viewtopic.php?t=495889 ?

Random questions from me about vst