KVR Audio

nollock · Post by **nollock** » Mon Oct 08, 2007 4:24 pm

Resplendence wrote:
For instance the following code from the sineSynth sample accesses the voice List:

for j:=0 to sampleframes-1 do
for i:=0 to Voices.Count-1
do outputs[0,j]:=outputs[0,j]+Voices[i].Process;

What is not taken into consideration here is thread safety: voices are concurrently added and deleted in response to MIDI events. This creates a race condition. It is possible that a voice gets deleted while it is in the process path. This means you are calling Voices[i].Process on a just-deleted voice object.

In that case all voices should be pre-allocated, and just have an 'active' flag.

 for j:=0 to sampleframes-1 do
  for i:=0 to Voices.Count-1 do
    if (voices[i].active) then
      outputs[0,j]:=outputs[0,j]+Voices[i].Process;

At least imo.
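A minimal sketch of the pre-allocated pool idea, written in C++ rather than Delphi so that the flag can be a `std::atomic<bool>`. The names `Voice`, `VoicePool`, `kMaxVoices`, `noteOn` and `noteOff` are hypothetical and not taken from the sineSynth sample. The point is only that the voice array is never resized, so the render loop can never touch a freed object.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical voice: a bare sine oscillator with an 'active' flag.
struct Voice {
    std::atomic<bool> active{false};  // set on note-on, cleared on note-off
    float phase = 0.0f;
    float phaseInc = 0.0f;            // radians per sample

    float process() {
        float out = std::sin(phase);
        phase += phaseInc;
        return out;
    }
};

constexpr std::size_t kMaxVoices = 16;  // assumed polyphony limit

struct VoicePool {
    std::array<Voice, kMaxVoices> voices;  // allocated once, never resized

    // MIDI side: claims the first idle voice. Returns its index, or -1 if all are busy.
    int noteOn(float phaseInc) {
        for (std::size_t i = 0; i < voices.size(); ++i) {
            if (!voices[i].active.load(std::memory_order_acquire)) {
                voices[i].phase = 0.0f;
                voices[i].phaseInc = phaseInc;
                // Publish the flag last, after the voice state is set up.
                voices[i].active.store(true, std::memory_order_release);
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // MIDI side: the voice object stays in memory, only the flag changes.
    void noteOff(int index) {
        assert(index >= 0 && static_cast<std::size_t>(index) < voices.size());
        voices[static_cast<std::size_t>(index)].active.store(false, std::memory_order_release);
    }

    // Audio side: adds every active voice into `out`.
    void render(float* out, std::size_t sampleFrames) {
        for (std::size_t j = 0; j < sampleFrames; ++j)
            for (auto& v : voices)
                if (v.active.load(std::memory_order_acquire))
                    out[j] += v.process();
    }
};
```

The flag alone does not cover a voice being re-triggered while the audio thread is still inside `process()` for it. Handling that reuse case needs one of the other techniques discussed in this thread.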

The solution to this is to use a synchronization object such as a mutex and wrap it around every access (both read and write) to the voiceList.

I think that is a good solution to the problem. This lock will be needed every time you read or write to the voice list. There may be other issues which need consideration for thread safety that I will reflect on later.

You can do this in a way which won't block the audio thread, by having two sets of data: the current data in use, and new data in waiting. Basically the audio thread tries to get a lock, but if it can't, it carries on with its current data. If it does get the lock, it checks for updated data. That way the audio thread can never be blocked by another thread. The other thread waits for the lock, and once it holds it, sets the updated data and a flag saying an update is waiting.

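// csection is assumed to be a TCriticalSection (unit SyncObjs)

// audio thread: never waits for the lock

if (csection.TryEnter) then
    begin
    if (updateWaiting) then currentData := UpdatedData;
    updateWaiting := FALSE;
    csection.Leave;
    end;

// gui thread: may wait until the audio thread has left

csection.Enter;
UpdatedData := ....;
updateWaiting := TRUE;
csection.Leave;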

The worst case scenario is that an update might get delayed by however often the audio thread checks for updates.
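The same try-lock handover can be sketched in C++ with `std::mutex`. The `Params` struct and the `ParamExchange` class are hypothetical names made up for this illustration. The audio side uses `std::try_to_lock`, so it either gets the lock immediately or carries on with its old copy.

```cpp
#include <mutex>

// Hypothetical parameter block; any copyable struct would do.
struct Params {
    float gain = 1.0f;
    float cutoffHz = 1000.0f;
};

class ParamExchange {
public:
    // GUI thread: may wait for the lock, which is acceptable there.
    void publish(const Params& p) {
        std::lock_guard<std::mutex> guard(lock_);
        updated_ = p;
        updateWaiting_ = true;
    }

    // Audio thread: never waits. Returns true if `current` was refreshed.
    bool tryFetch(Params& current) {
        std::unique_lock<std::mutex> guard(lock_, std::try_to_lock);
        if (!guard.owns_lock())
            return false;  // GUI holds the lock: keep using the old data
        if (!updateWaiting_)
            return false;  // nothing new since the last fetch
        current = updated_;
        updateWaiting_ = false;
        return true;
    }

private:
    std::mutex lock_;
    Params updated_;
    bool updateWaiting_ = false;
};
```

Calling `tryFetch` once at the top of each process call means an update is delayed by at most one buffer, plus one more if the lock happened to be taken at that moment.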

nollock · Post by **nollock** » Mon Oct 08, 2007 4:29 pm

Resplendence wrote: Theoretically speaking, this design is flawed if your FIFO does not have any lock synchronization mechanism. You must make sure that read and write access to your FIFO happens atomically, which means a lock.

There are such things as lock-free FIFOs, which are thread-safe without requiring locks.

But I don't know how they work tbh.

nollock · Post by **nollock** » Mon Oct 08, 2007 4:37 pm

JCJR wrote: That way, any MIDI events which arrive during a buffer render, will get picked up and processed in the next buffer render?

If you want to preserve timing accuracy then all MIDI notes have to be delayed by the size of one buffer. Then they have to be positioned in the next buffer by the delta time of their occurrence in the previous time slice.
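As a rough illustration of that positioning step, here is a C++ sketch that turns an event's arrival time into a frame offset inside the next buffer. The function name and its parameters are hypothetical, and it assumes the host or a timer can supply the start time of the previous time slice in seconds.

```cpp
#include <algorithm>
#include <cmath>

// Maps a MIDI event that arrived during the previous time slice to a frame
// offset inside the NEXT buffer, so every event is delayed by one buffer.
// eventTimeSec and sliceStartSec are assumed to share the same clock.
int frameOffsetInNextBuffer(double eventTimeSec, double sliceStartSec,
                            double sampleRate, int bufferFrames) {
    double deltaSec = eventTimeSec - sliceStartSec;
    long offset = std::lround(deltaSec * sampleRate);  // frames since slice start
    // Clamp so a late or early timestamp still lands inside the buffer.
    offset = std::max(0L, std::min(offset, static_cast<long>(bufferFrames - 1)));
    return static_cast<int>(offset);
}
```

For example, with a 480-frame buffer at 48 kHz, an event arriving 5 ms into the previous slice is placed 240 frames into the next buffer.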

Christian Budde · Post by **Christian Budde** » Mon Oct 08, 2007 4:47 pm

I just had a brief look at the posts. In theory you are correct. I've learned to use mutex at the university here, but haven't implemented them, because the examples are only examples and they worked in every host I had here (even on my Core 2 Duo).

But you're always welcome to donate some work for the project, if you like.

Kind regards,

Christian (currently busy with real, analogue room acoustics)

JCJR · Post by **JCJR** » Mon Oct 08, 2007 6:16 pm

Thanks Daniel

I'm not smart enough to know everything that can go wrong. But the kind of lock-free FIFOs I use for MIDI data, I have been using successfully since Commodore 64 and Mac 512 MIDI days, years before I'd heard of the concept of a lock. Old microcomputer systems didn't have threads, but they did have interrupts, which could 'jump in on top of each other' at completely unpredictable times, which posed about the same danger of corruption as threads can present nowadays.

It is very simple-- An array block of data is accessed by pointers for Bottom, Top, Head, and Tail.

If the Head or Tail hit the Top, they reset to the Bottom and keep on truckin.

I self-impose coding rules. Only one thing in the entire code is allowed to ever update the Head pointer, but everything else can read the Head pointer. Only one thing in the code is allowed to update the Tail pointer, but everything else can read the Tail pointer.

For instance, if MIDI comes in from a keyboard, the MIDI receive function writes the data through the Head pointer, and then the last step is to increment the Head pointer. So if some other function jumps in on top of this MIDI receive function, the interrupting code doesn't know about the new MIDI message 'in the process of being written' until after the MIDI receive function starts up again and completes that last line, which writes the updated Head pointer.

Surely there are opportunities for some odd collision to happen somewhere some time, but as you say, in practice it is extremely unlikely. The functions which mess with the pointers are designed as fast/minimal as possible, to not gum up the works.

If the head ever laps around and hits the tail, everything hits the fan. That is avoided by making the data block big enough, and maintaining the Tail as frequently as possible.
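A sketch of that Head/Tail scheme as a single-producer, single-consumer ring buffer in C++. The class name `MidiFifo` and the 32-bit message type are assumptions for the example. Unlike the interrupt-era version, it uses `std::atomic` with release/acquire ordering so that, on a multiprocessor, the payload write is visible before the new Head is, and it refuses a push rather than letting the head lap the tail.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Single-producer / single-consumer ring buffer.
// Rule from the post: only push() ever writes head_, only pop() ever writes tail_.
// Holds at most Capacity - 1 messages.
template <std::size_t Capacity>
class MidiFifo {
public:
    // Producer only (e.g. the MIDI receive function). Returns false when full.
    bool push(std::uint32_t msg) {
        std::size_t head = head_.load(std::memory_order_relaxed);
        std::size_t next = (head + 1) % Capacity;  // wrap from Top back to Bottom
        if (next == tail_.load(std::memory_order_acquire))
            return false;                          // head would lap the tail
        data_[head] = msg;                         // write the payload first
        head_.store(next, std::memory_order_release);  // publish as the last step
        return true;
    }

    // Consumer only (e.g. the audio render). Returns false when empty.
    bool pop(std::uint32_t& msg) {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return false;                          // nothing published yet
        msg = data_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    std::uint32_t data_[Capacity];
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};
```

This is only safe with exactly one writer thread and one reader thread. A second producer would break the "only one thing updates Head" rule.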

JCJR · Post by **JCJR** » Mon Oct 08, 2007 7:09 pm

nollock wrote:
JCJR wrote: That way, any MIDI events which arrive during a buffer render, will get picked up and processed in the next buffer render?
If you want to preserve timing accuracy then all MIDI notes have to be delayed by the size of one buffer. Then they have to be positioned in the next buffer by the delta time of their occurrence in the previous time slice.

Hi nollock

That situation is low-level jitter. Dunno if it is desirable to delay everything enough to TRY avoiding the jitter. A judgement call.

Maybe there are too many things jittering. The cpu might not be very busy on one audio callback, and the callback gets called very soon after a soundcard low level buffer comes free. On the 'leading edge' of a render cycle.

Maybe next time the cpu is so busy that the audio callback is called so near the 'trailing edge' of a render cycle, that you will be lucky to render in time to avoid a dropout.

MIDI from pre-recorded tracks always lines up correctly, because the prerecorded MIDI will predictably line up against rendered samples regardless of whether the audio buffer actually gets rendered a little early or a little late because of cpu load variance.

But if the user whacks a key, that MIDI event will jitter-percolate through the system and arrive to get rendered. If the MIDI event doesn't make it in time for one render, it gets handled by the next render. The host program doesn't have any control over the jitter of when the audio callback gets called, and it doesn't have any control over when the user decides to play a key, and it doesn't have any control over jitter in the MIDI driver.

So one might spend a lot of cpu cycles trying to compensate for multiple jitters beyond yer control, and only get marginally better results? Dunno.

My hosting breaks up big soundcard buffers into smaller sub-renders. For instance, if the sound system is using a big buffer size, and my code decides the buffer is too big and decides to render that big buffer in 10 little pieces. If a MIDI receive event arrives between the 3rd and 4th sub-render, then that MIDI event would get picked off and rendered in the middle of the current big buffer, rather than having to wait for the next big buffer.
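A sketch of that sub-render loop in C++. The function `renderInChunks` and its two callbacks are hypothetical stand-ins, not JCJR's actual host code: `pollMidi` would drain whatever MIDI has arrived, and `renderChunk` would render `frames` samples starting at `offset` within the big buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Renders `totalFrames` in pieces of at most `maxChunk` frames, checking for
// newly arrived MIDI before each piece. Returns the number of pieces rendered.
template <typename Poll, typename Render>
std::size_t renderInChunks(std::size_t totalFrames, std::size_t maxChunk,
                           Poll pollMidi, Render renderChunk) {
    assert(maxChunk > 0);
    std::size_t pieces = 0;
    for (std::size_t offset = 0; offset < totalFrames; offset += maxChunk) {
        // The last piece may be shorter than maxChunk.
        std::size_t frames = std::min(maxChunk, totalFrames - offset);
        pollMidi();                   // events seen here start in this piece
        renderChunk(offset, frames);
        ++pieces;
    }
    return pieces;
}
```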

But that doesn't really help much, because for instance if the big buffers are getting rendered in every 20 ms, but it only takes 0.25 ms to render the entire 20 ms, the entire render is still 'bunched up' at some jittery small time location in each 20 ms window, regardless if I do 10 smaller renders within that 0.25 ms.

One could try using close timing, to spread out the sub-renders equally spaced in the time occupied by a big soundcard buffer. That way MIDI Thru input might get placed closer to the desired location. I'd be afraid that this would be difficult and risk dropouts.

I'm not expert on the best way to do it. IMO, you get the best thru results just by setting the soundcard latency as low as practical. Smaller buffers will always have smaller max jitter (as long as you avoid dropouts).

Hardware synths typically have latency > 3 ms. Often much higher than 3 ms. And hardware synths typically have some amount of 'random' latency jitter. It doesn't seem insurmountable to get softsynth latency/jitter in the same ballpark as hardware synths. Most musicians don't notice or complain about minor latency/jitter in hardware synths. I think the ears and fingers just automatically compensate.

Resplendence · Post by **Resplendence** » Tue Oct 09, 2007 11:15 am

nollock wrote: In that case all voices should be pre-allocated, and just have an 'active' flag.

This may be a good idea because you limit the amount of data that needs to be protected, but you will still need to synchronize access to the 'active' flag. It can still be read and written simultaneously, and you need to prevent this possibility.

You can do this in a way which won't block the audio thread, by having two sets of data: the current data in use, and new data in waiting. Basically the audio thread tries to get a lock, but if it can't, it carries on with its current data. If it does get the lock, it checks for updated data. That way the audio thread can never be blocked by another thread. The other thread waits for the lock, and once it holds it, sets the updated data and a flag saying an update is waiting.

Yes, this could work, but it then requires the administration of multiple sets of data.

Although a mutex solves the problem, I admit it comes at a heavy price. It is not suited for this particular purpose because the whole audio path gets serialized: only one audio thread can execute at a time. That's a waste of all the extra CPUs you may have. It also adds the cost of switching from user mode to kernel mode. A better solution is a read/write lock: multiple threads are allowed to read the data simultaneously, but only one thread at a time is allowed to write to it. I don't believe Windows offers a standard read/write lock in user mode. Delphi has a TMultiReadExclusiveWriteSynchronizer class. I have never used it, so I will look into that; otherwise I will sketch a lock myself for this purpose.

/Daniel

nollock · Post by **nollock** » Tue Oct 09, 2007 1:08 pm

Resplendence wrote: This may be a good idea because you limit the amount of data that needs to be protected, but you will still need to synchronize access to the 'active' flag. It can still be read and written simultaneously, and you need to prevent this possibility.

You don't need to synchronize every single object that can be simultaneously accessed.

In this case, as long as the voice object always holds valid data, the worst that can happen is that the voice runs a few samples longer or shorter, or with older values, than it would if it were locked while being edited.

If the voice had a wavetable pointer and length, then you would need to lock while changing those, as any discrepancy between the length and the pointer could cause an access violation.

But in my experience the vast majority of parameters don't need to be locked when they are being changed. At least in terms of audio stuff.

Yes, this could work, but it then requires the administration of multiple sets of data.

Yeah it is some extra work. And it could quickly get irritating if you had to do this with everything. I use it mainly for memory management... the gui thread keeps a list of objects to be freed, and only frees them once the audio thread has marked them as no longer in use.
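One way that deferred-free list might look, sketched in C++. The `Wavetable` resource, its `inUse` flag and the `RetireList` class are hypothetical names chosen for the example. It assumes the audio thread clears `inUse` with a release store once it has switched away from the object, and that only the GUI thread ever calls `retire` and `collect`.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical resource shared with the audio thread.
struct Wavetable {
    std::vector<float> samples;
    std::atomic<bool> inUse{true};  // cleared by the audio thread when it lets go
};

// Owned and called by the GUI thread only.
class RetireList {
public:
    // Take ownership of an object that should go away, but do not delete it yet.
    void retire(std::unique_ptr<Wavetable> w) { pending_.push_back(std::move(w)); }

    // Free everything the audio thread has released. Returns how many were freed.
    std::size_t collect() {
        std::size_t freed = 0;
        for (auto it = pending_.begin(); it != pending_.end();) {
            if (!(*it)->inUse.load(std::memory_order_acquire)) {
                it = pending_.erase(it);  // unique_ptr deletes the object here
                ++freed;
            } else {
                ++it;
            }
        }
        return freed;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::vector<std::unique_ptr<Wavetable>> pending_;
};
```

Because the delete happens in `collect`, on the GUI side, the audio thread never calls into the memory manager.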

Although a mutex solves the problem, I admit it comes at a heavy price. It is not suited for this particular purpose because the whole audio path gets serialized: only one audio thread can execute at a time.

Huh? It would only do that if you had a static/global mutex, which wouldn't make much sense anyway.

The mutex should be tied to the synth/effect instance data.

That's a waste of all the extra CPUs you may have. It also adds the cost of switching from user mode to kernel mode.

Critical sections on Windows don't cause a mode switch. They use one of the atomic lock instructions.

Not sure about mutexes though.

A better solution is a read/write lock: multiple threads are allowed to read the data simultaneously, but only one thread at a time is allowed to write to it. I don't believe Windows offers a standard read/write lock in user mode. Delphi has a TMultiReadExclusiveWriteSynchronizer class. I have never used it, so I will look into that; otherwise I will sketch a lock myself for this purpose.

I'm not sure how that would help with a 2-thread audio/GUI situation. Really there's only one reader and one writer anyway, so I don't see how multiple readers would help.

Resplendence · Post by **Resplendence** » Tue Oct 09, 2007 2:23 pm

You don't need to synchronize every single object that can be simultaneously accessed ...
But in my experience the vast majority of parameters don't need to be locked when they are being changed. At least in terms of audio stuff.

You just cannot concurrently read from and write to data in a multithreaded fashion without a proper lock. Now you are running under a fault tolerant environment in which reading incorrect data now and then may be acceptable, but this holds only if you exactly know what you are doing and can assure you are not causing any data corruption. I just want to play by the rules.

Huh? It would only do that if you had a static/global mutex, which wouldn't make much sense anyway.
The mutex should be tied to the synth / effect, instance data.

It does in the example I gave which was not very well thought out. I could rewrite the sample so it only locks while accessing the data and does not hold the lock while calculating the audio but still a read/write lock would be better.

Critical sections on Windows don't cause a mode switch. They use one of the atomic lock instructions.

That is true, but only as long as no contention takes place. I would not use a critical section because it suffers from the same problem as the mutex and it's limited to one process only. I am not sure if there is some fancy host out there which uses a separate process to deal with the audio, or even if this is at all possible.

I'm not sure how that would help with a 2-thread audio/GUI situation. Really there's only one reader and one writer anyway, so I don't see how multiple readers would help.

I am completely new to audio programming and the VST subsystem, so I don't know what a 2-thread audio/GUI situation means. But what I am observing is that every VST host spawns a bunch of realtime priority threads which get woken from their sleep state when necessary. And all these threads are running concurrently in the process path (is it called render?). Assuming this path only reads and does not write to global data, a multi-read exclusive-write lock would really help here.

I wonder if I am missing out on any good documentation because the VST SDK doc is very limited.

nollock · Post by **nollock** » Tue Oct 09, 2007 3:41 pm

Resplendence wrote:You just cannot concurrently read from and write to data in a multithreaded fashion without a proper lock.

Of course you can. Single memory reads/writes are atomic. You use a lock when you are reading / writing multiple values which as a collection need to be atomic.

Now you are running under a fault tolerant environment in which reading incorrect data now and then may be acceptable, but this holds only if you exactly know what you are doing and can assure you are not causing any data corruption. I just want to play by the rules.

Nobody is reading or writing incorrect data. If you have a single floating point value that determines the pitch of an oscillator, what would be the point of locking it while reading and writing to it? It would add nothing, as it is an atomic operation by default.

I am not sure if there is some fancy host out there which uses a separate process to deal with the audio, or even know if this is at all possible.

I am completely new to audio programming and the VST subsystem so I don't know what means a 2 thread audio/gui situation.

Generally the audio processing is done in a separate thread from the GUI, with the audio thread being much higher priority.

In plugin world that's all you have, one thread for the GUI and one for the audio. Although the host may spawn more threads, your plugin will only ever see two of them.

So as a plugin writer you only need to synchronize between GUI and audio.

But what I am observing is that every VST host spawns a bunch of realtime priority threads which get woken from their sleep state when necessary. And all these threads are running concurrently in the process path (is it called render ?). Assuming this path only reads and does not write to global data, a multi read exclusive write lock would really help here.

Maybe on the host side that would be useful. But from a plugins point of view only 2 threads exist, (unless it creates some extra threads itself.)

Resplendence · Post by **Resplendence** » Tue Oct 09, 2007 4:21 pm

Of course you can. Single memory reads/writes are atomic. You use a lock when you are reading / writing multiple values which as a collection need to be atomic.

Possibly in one instruction, but not atomic in a multiprocessor environment. Nowadays we have multiple processors which really are executing at the same time, so this is what interlocked functions and lock instructions are for.

Nobody is reading or writing incorrect data. If you have a single floating point value that determines the pitch of an oscillator, what would be the point of locking it while reading and writing to it? It would add nothing, as it is an atomic operation by default.

You may see no point because it is acceptable to have an exception thrown only now and then, which is handled and ignored by the host application anyway. But the rules of multithreaded programming in a multiprocessor environment really do apply here.

Generally the audio processing is done in a separate thread from the GUI, with the audio thread being much higher priority.

In plugin world that's all you have, one thread for the GUI and one for the audio. Although the host may spawn more threads, your plugin will only ever see two of them. So as a plugin writer you only need to synchronize between GUI and audio. Maybe on the host side that would be useful. But from a plugin's point of view only 2 threads exist (unless it creates some extra threads itself).

Again, we are really running in a multithreaded multiprocessor environment. And our process functions are not called one at a time, one after the other. They are processed simultaneously by multiple threads on multiple processors. The SDK doc says: "Threading issues. In general, processEvents(), startProcess(), stopProcess(), process(), processReplacing() and processDoubleReplacing() are called from a time-critical high priority thread (except for offline processing)". They should change this to "multiple threads" because this is really what's happening. If you don't believe me, start up Process Explorer and your favorite VST host application and see for yourself.

Resplendence · Post by **Resplendence** » Tue Oct 09, 2007 5:17 pm

Again, we are really running in a multithreaded multiprocessor environment. And our process functions are not called one at a time, one after the other. They are processed simultaneously by multiple threads on multiple processors. The SDK doc says: "Threading issues. In general, processEvents(), startProcess(), stopProcess(), process(), processReplacing() and processDoubleReplacing() are called from a time-critical high priority thread (except for offline processing)". They should change this to "multiple threads" because this is really what's happening. If you don't believe me, start up Process Explorer and your favorite VST host application and see for yourself.

Ok, let me take these last words back. I cannot make conclusions from what I see with Process Explorer because there is no way to know if these threads were really running at the same time particularly because realtime priority threads are running. Possibly I am being too paranoid. Also it seems audio process callback is happening for all channels at once so that means the data is dependent on each other so they logically must be serialized. This would imply a mutex is ok to use and no read/write lock would be needed. What is not clear to me is why every VST host starts up 12 rt priority threads if there is only one running at a time.

Despite this, multithreaded programming practices must still be applied. If you update your oscillator from a GUI thread and read from it in your process function without a lock you do have a problem.

/Daniel

nollock · Post by **nollock** » Tue Oct 09, 2007 5:17 pm

Resplendence wrote: Possibly in one instruction, but not atomic in a multiprocessor environment. Nowadays we have multiple processors which really are executing at the same time, so this is what interlocked functions and lock instructions are for.

Yes, you use those if you want to ensure exclusive access to a specific memory address. And from that you can build larger synchronization objects.

But that is not the issue.

If you write a 32 bit integer to memory, the whole of it is written or it is not. You will not get a corrupted value if you read that address from a thread on another CPU. You may get a previous value before it has passed from the cache into main memory.

And as I've explained, in a large number of cases that doesn't matter. The worst that happens is the code continues with an old value for a few samples. It stays at 12 dB or 410 Hz a few microseconds longer.
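In C++ terms, the "single value, no lock" case can be sketched with `std::atomic<float>` and relaxed ordering. The `Oscillator` struct and its member names are hypothetical. Relaxed loads and stores guarantee the value is never torn, but say nothing about when the audio thread sees the new value, which matches the "old but still valid value for a while" behaviour described here. Whether `std::atomic<float>` is lock-free depends on the platform, although it usually is on mainstream desktop CPUs.

```cpp
#include <atomic>

// Hypothetical oscillator with one independently updated parameter.
struct Oscillator {
    std::atomic<float> frequencyHz{440.0f};

    // GUI thread: a single untorn store, no lock taken.
    void setFrequency(float hz) {
        frequencyHz.store(hz, std::memory_order_relaxed);
    }

    // Audio thread: read once per block. The result may be the previous
    // value for a short while, but never a half-written one.
    float cyclesPerSample(float sampleRate) const {
        float hz = frequencyHz.load(std::memory_order_relaxed);
        return hz / sampleRate;
    }
};
```

This only covers parameters that are meaningful on their own. Values that must change together, such as a wavetable pointer and its length, still need a lock or a single-pointer swap.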

You may see no point because it is acceptable to have an exception thrown just only now and then which is anyway handled and ignored by the host application. But the rules of multithreaded programming in an multi processor environment are really applying here.

Because what I'm talking about does not cause exceptions. As long as the memory writes are atomic in themselves, and they do not need to be atomic with respect to other parameters, then the worst that can happen is some code working with an old but still valid value for a while.

Again, we are really running in a multithreaded multiprocessor environment. And our process functions are not called one at a time, one after the other. They are processed simultaneously by multiple threads on multiple processors. The SDK doc says: "Threading issues. In general, processEvents(), startProcess(), stopProcess(), process(), processReplacing() and processDoubleReplacing() are called from a time-critical high priority thread (except for offline processing)". They should change this to "multiple threads" because this is really what's happening. If you don't believe me, start up Process Explorer and your favorite VST host application and see for yourself.

Of course a *Host* has lots of threads... I said as much in my previous post. But that does not mean a plugin will be called concurrently from all of those.

In fact 'Process' and the like cannot be called concurrently anyway, because future processing may, and almost always does, depend on the results of previous processing. You cannot process sample[x] until you have the result of sample[x-1].

So any given plugin instance will only ever be entered by one audio thread at a time.

Of course multiple instances of the same plugin could be running concurrently in different threads, in which case shared mutable data between instances needs to be synchronized.

Christian Budde · Post by **Christian Budde** » Tue Oct 09, 2007 6:10 pm

I haven't read all of what you wrote, but calling the next process() while the previous one is still running doesn't make any sense. Think of filters. They depend on continuous audio streaming. So calling the next process() while another one is still not finished will lead to ugly audio glitches in any case.
I believe hosts are clever enough to avoid this, even on multithreaded CPUs.

Christian

Resplendence · Post by **Resplendence** » Tue Oct 09, 2007 6:27 pm

Christian Budde wrote: I haven't read all of what you wrote, but calling the next process() while the previous one is still running doesn't make any sense. Think of filters. They depend on continuous audio streaming. So calling the next process() while another one is still not finished will lead to ugly audio glitches in any case. I believe hosts are clever enough to avoid this, even on multithreaded CPUs.

I stand corrected. On a side note I will look for a good way to add synchronization to your samples with as little overhead as possible and mail you the files later. I decided I want to build a very advanced synthesizer and possibly I am going to make use of your template.

/Daniel

Delphi ASIO & VST sourceforge project