KVR Audio

hibrasil · Post by **hibrasil** » Tue Jan 12, 2010 10:43 am

I have been trying to decide between using iPlug or the VST 2.4 SDK to develop some new plugins. I noticed iPlug has mutex locks all over the place and I am wondering how important this is in a multithreaded host. When does it become an issue? Is there an easy cross-platform way to implement this functionality if I don't use iPlug?

thanks,

oli

nollock · Post by **nollock** » Thu Jan 14, 2010 8:11 am

hibrasil wrote: I have been trying to decide between using iPlug or the VST 2.4 SDK to develop some new plugins. I noticed iPlug has mutex locks all over the place and I am wondering how important this is in a multithreaded host. When does it become an issue? Is there an easy cross-platform way to implement this functionality if I don't use iPlug?

Your plugins need to be thread safe.

You can go the uber safe way of locking any use of data that is used by both the audio and gui code.

Or you can learn what needs to be locked and what doesn't. For example, a simple gain plugin where the only data used by both threads is a float variable that holds the gain shouldn't need to be locked. But if it was a vector/array then it likely would need to be. The key here is whether the data can be updated atomically (in one step, without any other thread or CPU seeing a half-written state).
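As a rough illustration of the gain case above, here is a minimal sketch that assumes a C++11 compiler and uses `std::atomic<float>` rather than a plain float, so the "one step" guarantee comes from the language instead of from the CPU and compiler. The names `GainParam` and `process_block` are made up for this example.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical gain parameter: one float shared by the GUI and audio threads.
class GainParam {
public:
    // Called from the GUI thread when the user moves a knob.
    void set(float g) { gain_.store(g, std::memory_order_relaxed); }

    // Called from the audio thread at the start of each block.
    float get() const { return gain_.load(std::memory_order_relaxed); }

private:
    std::atomic<float> gain_{1.0f};
};

// Audio-thread side: read the gain once per block, then apply it.
inline void process_block(const GainParam& p, const float* in, float* out, int n) {
    assert(n >= 0);
    const float g = p.get();  // single atomic read, no lock taken
    for (int i = 0; i < n; ++i) {
        out[i] = in[i] * g;
    }
}
```

Relaxed ordering is enough here only because the gain is the sole piece of shared state; as soon as several values have to change together, this no longer works and you are back to locks or messages.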

Or you can go the modern funky way and use lock free queues (they really should be called non-blocking queues) to send messages around. So for the above example the GUI would post a message to the audio thread that tells it to update the gain variable.

A cross platform solution would be pthreads.

Probably a simpler solution would be just to write your own wrapper around the mutex primitives provided by each OS.
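Such a wrapper could look roughly like the sketch below. It assumes `CRITICAL_SECTION` on Windows and `pthread_mutex_t` everywhere else; the class names `Mutex` and `ScopedLock` are invented for the example and are not taken from iPlug or the VST SDK.

```cpp
#include <cassert>
#ifdef _WIN32
  #include <windows.h>
#else
  #include <pthread.h>
#endif

// Hypothetical minimal cross-platform mutex.
class Mutex {
public:
    Mutex(const Mutex&) = delete;
    Mutex& operator=(const Mutex&) = delete;
#ifdef _WIN32
    Mutex()  { InitializeCriticalSection(&cs_); }
    ~Mutex() { DeleteCriticalSection(&cs_); }
    void lock()   { EnterCriticalSection(&cs_); }
    void unlock() { LeaveCriticalSection(&cs_); }
private:
    CRITICAL_SECTION cs_;
#else
    Mutex() {
        const int rc = pthread_mutex_init(&m_, nullptr);
        assert(rc == 0);
        (void)rc;  // silence the unused warning when asserts are compiled out
    }
    ~Mutex() { pthread_mutex_destroy(&m_); }
    void lock()   { pthread_mutex_lock(&m_); }
    void unlock() { pthread_mutex_unlock(&m_); }
private:
    pthread_mutex_t m_;
#endif
};

// RAII guard: locks in the constructor, unlocks when it goes out of scope,
// so an early return cannot leave the mutex held.
class ScopedLock {
public:
    explicit ScopedLock(Mutex& m) : m_(m) { m_.lock(); }
    ~ScopedLock() { m_.unlock(); }
    ScopedLock(const ScopedLock&) = delete;
    ScopedLock& operator=(const ScopedLock&) = delete;
private:
    Mutex& m_;
};
```

One difference to keep in mind: a `CRITICAL_SECTION` may be re-entered by the thread that already holds it, while a default pthread mutex may not, so code that locks recursively can behave differently on the two platforms.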

Christian Schüler · Post by **Christian Schüler** » Tue Jan 19, 2010 2:54 am

An easy way to make things thread-safe is to ensure each thread only operates on data that it owns. If there is no shared data, then there is no problem. You then need some kind of inter-thread communication, such as a command ring buffer (aka a pipe). The pipe itself of course needs to be locked, or at least be non-blocking and thread-safe, which can be done.
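A command pipe of this kind can be sketched as a single-producer/single-consumer ring buffer. The version below is only an illustration, assuming C++11 atomics, exactly one thread calling `push()` (say, the GUI) and exactly one calling `pop()` (say, the audio thread). `Command` and `CommandPipe` are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical message: "set parameter paramId to value".
struct Command {
    int   paramId;
    float value;
};

// Single-producer / single-consumer ring buffer holding up to N commands.
template <std::size_t N>
class CommandPipe {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");
public:
    // Producer side. Returns false (without blocking) if the pipe is full.
    bool push(const Command& c) {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t r = read_.load(std::memory_order_acquire);
        if (w - r == N) return false;                     // full
        buf_[w & (N - 1)] = c;
        write_.store(w + 1, std::memory_order_release);   // publish the slot
        return true;
    }

    // Consumer side. Returns false (without blocking) if the pipe is empty.
    bool pop(Command& out) {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        const std::size_t w = write_.load(std::memory_order_acquire);
        if (r == w) return false;                         // empty
        out = buf_[r & (N - 1)];
        read_.store(r + 1, std::memory_order_release);    // hand the slot back
        return true;
    }

private:
    Command buf_[N];
    std::atomic<std::size_t> write_{0};  // total commands ever pushed
    std::atomic<std::size_t> read_{0};   // total commands ever popped
};
```

The audio thread would typically drain the pipe at the top of each processing call and apply the commands to state that only it touches. With more than one producer or consumer this sketch is no longer safe.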

tony tony chopper · Post by **tony tony chopper** » Tue Jan 19, 2010 5:18 am

I would avoid trying to do lock-free stuff unless you're a really experienced programmer, or you've found a widely known & tested library designed for this. The problem with thread safety is that it has to work in your head, not just in practice. Lots of programmers go by trial and error, but this is not something that can be tested: code that isn't thread safe isn't very likely to crash immediately.

When in doubt, better lock. Better to hurt performance than to be unsafe.
Also, try to avoid passing double-precision data, that won't be atomic (=will be done in 2 steps) in a 32bit app (but see Interlocked_XXX functions in Windows).
When you lock too much you also risk deadlocks, so it's not easy. IMHO multithreaded programming should be the #1 interest of starting programmers who want to do VSTs.

nollock · Post by **nollock** » Tue Jan 19, 2010 7:19 am

tony tony chopper wrote: Also, try to avoid passing double-precision data, that won't be atomic (=will be done in 2 steps) in a 32bit app (but see Interlocked_XXX functions in Windows).

That was true on the 486 but since Pentium all writes/reads up to 64 bits are guaranteed to be atomic if they are naturally aligned. Words to 2 bytes, DWORDs to 4, QWORDs to 8.

With the newer x64 processors, all reads/writes are atomic as long as they don't cross a cache line boundary.

It's in the Intel tech docs somewhere.

tony tony chopper · Post by **tony tony chopper** » Tue Jan 19, 2010 8:42 am

nollock wrote: That was true on the 486 but since Pentium all writes/reads up to 64 bits are guaranteed to be atomic if they are naturally aligned. Words to 2 bytes, DWORDs to 4, QWORDs to 8.

How could this be true? Depending on the compiler, a QWORD read/write will involve 2 32bit accesses, how could this be atomic? There are operators to work on 80bit (FPU) or 128bit (SSE), and frankly I wouldn't know if those can atomically read/write, but that's not the problem as the compiler is more likely to do QWORD integer operations using 2 32bit operations, maybe using SSE2 only if you tell it to.

Besides, memory alignment is also very compiler-dependent, so it's safer to assume that swapping 64bit stuff would better be done using Interlocked functions.

Ninjan · Post by **Ninjan** » Tue Jan 19, 2010 1:08 pm

Usually processors have a single atomic instruction.
What most of them do is swap the contents of a memory address with a register.

ARM has an instruction called swp ... swp rd,rm,[r0]
Intel's instruction is called xchg ... xchg dest,src

Both exchange the contents of the two addresses (or one address and a register)
in one instruction.

Intel has more of those atomic instructions.
CMPXCHG is another one ... cmpxchg dest,src: compare and, if equal, move.
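From C++ you would normally not write these instructions by hand. As a hedged sketch, assuming C++11, `std::atomic` exposes the same two operations as `exchange` and `compare_exchange_weak`, and compilers targeting x86 typically turn them into XCHG and LOCK CMPXCHG. The function names below are invented for the example.

```cpp
#include <atomic>

// XCHG-style swap: store a new value and get the old one back in one step.
inline int swap_in(std::atomic<int>& slot, int fresh) {
    return slot.exchange(fresh);
}

// CMPXCHG-style loop: atomically raise `peak` to `sample` if `sample` is larger.
inline void update_peak(std::atomic<int>& peak, int sample) {
    int seen = peak.load(std::memory_order_relaxed);
    // compare_exchange_weak stores `sample` only if `peak` still equals `seen`.
    // On failure it refreshes `seen` with the current value, so the loop
    // re-checks the condition and retries.
    while (sample > seen &&
           !peak.compare_exchange_weak(seen, sample, std::memory_order_relaxed)) {
    }
}
```

The retry loop is the usual shape of compare-and-swap code: read, compute, try to commit, and start over if another thread got in first.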

I have around 10 years of experience in multi thread/processor coding.

It is not that hard.
You just NEED to know how everything should work in the end.

So there is no reason to make a multithreaded app just for the sake of it.
Sure, do it to test and play. That is the way to learn it.

When your application is done, only then you'll have the full picture on how you should implement your multiprocessing.

If you need some ideas, i'll gladly help you out with some hints for your threading.

Still, 25 years of assembler coding and you learn some tricks.

mystran · Post by **mystran** » Tue Jan 19, 2010 1:24 pm

nollock wrote: That was true on the 486 but since Pentium all writes/reads up to 64 bits are guaranteed to be atomic if they are naturally aligned. Words to 2 bytes, DWORDs to 4, QWORDs to 8.

tony tony chopper wrote: How could this be true? Depending on the compiler, a QWORD read/write will involve 2 32bit accesses, how could this be atomic?

Indeed, the situation is that a single load/store involving aligned access (or within a cache line, but it's easier to just align) will be atomic. If you do a 16-byte load/store with SSE on an aligned address, it's just as atomic (ok, this is from memory, I didn't double check it, as I don't immediately recall needing more than 32 bits moved from thread to thread on the fly). If the compiler splits a 16-byte load/store into four 32-bit loads/stores, it's now four loads/stores and not atomic in any way. I think nollock is thinking at the opcode level.

What this means: yes, you can move double (and 16-byte) data around with atomic loads and stores, but the part about avoiding it still holds, unless one knows for sure how a particular compiler compiles different things. Aligned 32-bit load/store at least will always be atomic (for the current generation of compilers and processors; if they ever remove 32-bit access from the ISA, then a 32-bit store will probably become load+modify+store on 64-bit data which is no longer atomic; I find that rather unlikely though).
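One way to avoid guessing what the compiler does with a double is to say what you want explicitly. The sketch below assumes a C++11 compiler, which postdates this discussion: `std::atomic<double>` forces whole-value loads and stores, and `is_lock_free()` reports whether the target can do that without an internal lock. `SharedDouble` is a hypothetical name.

```cpp
#include <atomic>

// Hypothetical double-precision parameter shared between two threads.
struct SharedDouble {
    std::atomic<double> value{0.0};

    // True when loads/stores compile to plain atomic instructions;
    // false when the library has to fall back to a hidden lock.
    bool lock_free() const { return value.is_lock_free(); }

    void   set(double v) { value.store(v, std::memory_order_relaxed); }
    double get() const   { return value.load(std::memory_order_relaxed); }
};
```

If `lock_free()` returns false on some target, the code still behaves correctly, but the audio thread may then block on that hidden lock, which is exactly what a real-time thread wants to avoid.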

Your other point is very true though: even for a plugin programmer who's comfortable with manual memory management and that sort of stuff (which is typically not easy either; definitely much harder than learning the C++ syntax and how to solve problems), the multi-threading stuff is almost certainly the hardest part. It's a lot harder than any DSP specific stuff. And on top of that, you would theoretically want to be able to guarantee the real-time threads do not block, which makes everything another order of magnitude harder.

One essentially has to prove, for every situation where threads interact, that no matter which order individual operations (on assembler level for lock-less programming) are done, the final result is always well-defined (that is, it either works, or has a well-defined failure mode which can be handled gracefully).

As for practical advice to newbies: like everyone I suggest minimizing shared data as much as possible (or completely if possible), but beyond that, my favorite method is communicating with some form of message passing. This doesn't work in every situation (sometimes critical sections are necessary, such as when you actually need to synchronize two threads; in this case the messaging overhead is just waste of time) but if 99% of code doesn't share anything, and simply sends and receives messages, it's a lot easier to track the 1% of code that has to be proven to be correct. Message-passing also avoids most problems like priority inversions and the like as long as the messaging itself is safe.

mystran · Post by **mystran** » Tue Jan 19, 2010 1:29 pm

Ninjan wrote: When your application is done, only then you'll have the full picture on how you should implement your multiprocessing.

Welcome to the world of VST plugins: every plugin needs to be able to deal with at least two threads: GUI thread running at interactive priority, and audio-processing thread running at real-time priority.

In other words, your advice doesn't apply to VST plugins: every single one of them has to be designed for a multi-threaded environment, because the GUI thread and audio thread can (and will) run concurrently.

We're not even talking about general multi-processing. We are in an environment where interactive and real-time threads interact, which means we are doing two hard things combined: real-time programming and multi-threading. Whether a plugin wants to do multi-processing isn't even relevant. You'll be in a multi-threaded environment, and have to be thread-safe whether you like it or not.

tony tony chopper · Post by **tony tony chopper** » Tue Jan 19, 2010 2:16 pm

mystran wrote: Welcome to the world of VST plugins: every plugin needs to be able to deal with at least two threads: GUI thread running at interactive priority, and audio-processing thread running at real-time priority.

It's even safe to say (these days): 1 GUI thread and several processing threads, & the thread for 1 processing chunk may not be the same as the one that will process the next chunk of the same plugin.

hibrasil · Post by **hibrasil** » Tue Jan 19, 2010 2:26 pm

Thanks for all the info. So it sounds like the fact that iPlug is already cross-platform and thread-safe might be a big deal, if it works.

Does anyone have a simple example or tips on how to do a cross-platform, thread-safe VST using VST2.4/VSTGUI?

oli

mystran · Post by **mystran** » Tue Jan 19, 2010 4:20 pm

mystran wrote: Welcome to the world of VST plugins: every plugin needs to be able to deal with at least two threads: GUI thread running at interactive priority, and audio-processing thread running at real-time priority.

tony tony chopper wrote: It's even safe to say (these days): 1 GUI thread and several processing threads, & the thread for 1 processing chunk may not be the same as the one that will process the next chunk of the same plugin.

nollock · Post by **nollock** » Tue Jan 19, 2010 9:59 pm

tony tony chopper wrote: How could this be true? Depending on the compiler, a QWORD read/write will involve 2 32bit accesses, how could this be atomic? There are operators to work on 80bit (FPU) or 128bit (SSE), and frankly I wouldn't know if those can atomically read/write, but that's not the problem as the compiler is more likely to do QWORD integer operations using 2 32bit operations, maybe using SSE2 only if you tell it to.

Besides, memory alignment is also very compiler-dependent, so it's safer to assume that swapping 64bit stuff would better be done using Interlocked functions.

64-bit reads/writes can be done with MMX, FPU or SSE. And they are all atomic if naturally aligned.

But you're right, if you don't know what your compiler is doing then you shouldn't rely on it.

tony tony chopper · Post by **tony tony chopper** » Tue Jan 19, 2010 11:02 pm

It's worse than I thought, I was assuming that at least INC & DEC were atomic, apparently they aren't.

nollock wrote: But you're right, if you dont know what your compiler is doing then you shouldn't rely on it.

I would in no way rely on a compiler for this, things can change even between versions of the same compiler. So whether you know what your compiler is doing or not.. better write it in asm, or use the Interlocked functions.

mystran · Post by **mystran** » Tue Jan 19, 2010 11:58 pm

tony tony chopper wrote:It's worse than I thought, I was assuming that at least INC & DEC were atomic, apparently they aren't.

INC & DEC are not atomic by default. You can make them atomic by prefixing them with LOCK. IIRC the only instruction which doesn't require an explicit LOCK is XCHG. For anyone interested: The details can be currently found in IA-32 manuals Vol.3A, section 8.
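In C++ terms, a plain `++x` on a shared int is the non-atomic INC case, while `fetch_add` on a `std::atomic<int>` is the atomic read-modify-write (compilers targeting x86 typically emit a LOCK-prefixed instruction for it). A minimal sketch, assuming C++11 and a made-up `EditCounter` class:

```cpp
#include <atomic>

// Hypothetical counter of pending edits that several threads may bump.
class EditCounter {
public:
    // Atomic increment; returns the value the counter had before the call.
    int increment() { return count_.fetch_add(1, std::memory_order_relaxed); }

    // Atomic decrement; returns the value the counter had before the call.
    int decrement() { return count_.fetch_sub(1, std::memory_order_relaxed); }

    int value() const { return count_.load(std::memory_order_relaxed); }

private:
    std::atomic<int> count_{0};
};
```

Because each call is one indivisible step, two threads incrementing at the same moment can never lose an update, which is what can happen with an unlocked INC.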

nollock wrote: But you're right, if you dont know what your compiler is doing then you shouldn't rely on it.

tony tony chopper wrote: I would in no way rely on a compiler for this, things can change even between versions of the same compiler. So whether you know what your compiler is doing or not.. better write it in asm, or use the Interlocked functions.

Well, things could even vary depending on build settings, so yeah, I have to agree (unless you wrote the compiler, in which case you know what it does). The one thing you can generally rely on, though, is that loads/stores of a properly aligned variable (of 32 bits) tagged with "volatile" are predictable in the sense that loads/stores are always done exactly once from/to memory (i.e., not cached in registers by the optimizer), and in order with respect to each other [edit: including loads/stores to other "volatile" variables] (though not necessarily other [non-volatile] loads/stores; you need a compiler memory fence for that) even when the optimizer would normally avoid the loads/stores.

So if one needs a load/store (without a full load-modify-store) on a variable otherwise accessed with interlocked instructions, it's generally safe to do a normal C assignment, as long as the variable is tagged "volatile" and there is no issue with compiler memory fences.
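The volatile-plus-fence idiom above depends on what a particular compiler promises; ISO C++ itself gives "volatile" no inter-thread ordering guarantees. As a swapped-in alternative rather than the technique described in this post, here is a sketch using C++11 `std::atomic` with release/acquire ordering, which states the required fences in the source. It assumes exactly one writer thread and one reader thread; `Coeffs` and `CoeffMailbox` are hypothetical names.

```cpp
#include <atomic>

// Hypothetical filter coefficients computed on the GUI thread.
struct Coeffs {
    float b0, b1, a1;
};

// One-slot mailbox: a single writer hands a Coeffs to a single reader.
class CoeffMailbox {
public:
    // Writer. Returns false if the previous value has not been taken yet.
    bool publish(const Coeffs& c) {
        if (ready_.load(std::memory_order_acquire)) return false;
        data_ = c;                                      // plain writes first
        ready_.store(true, std::memory_order_release);  // then raise the flag
        return true;
    }

    // Reader. The acquire load pairs with the release store in publish(),
    // so once the flag reads true the data written before it is visible.
    bool try_take(Coeffs& out) {
        if (!ready_.load(std::memory_order_acquire)) return false;
        out = data_;
        ready_.store(false, std::memory_order_release); // slot is free again
        return true;
    }

private:
    Coeffs data_{};
    std::atomic<bool> ready_{false};
};
```

The flag plays the role of the "volatile" variable and the release/acquire pair plays the role of the memory fences: the plain `data_` accesses cannot be moved across them by either the compiler or the CPU.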

edit: regarding LOCK prefix, it works with pretty much all standard integer operations; that's how you implement the Interlocked* functions on x86, though the MSVC versions also give you various memory fences (which is why there are three versions for those).
