KVR Audio

Tzarls · Post by **Tzarls** » Sat Feb 25, 2012 7:54 am

I'm working on this graphical - modular programming app, the kind of app in which you have small boxes that apply some processing to the signal going thru them.... just like Reaktor, Synthmaker, Synthedit (here's a very simple demo, in case you're curious: http://www.youtube.com/watch?v=0aLopXd2 ... OyBVWFA5oW).

I'm working on the realtime part now. It's all working ok, except for one little problem:

In the class that manages the realtime processing tasks (executing the process function in every box in the project) there's this code (yes, right now I'm doing per-sample processing here):

Code: Select all

int blockManager::process(void* output, void* input, unsigned int frames, int outs, int ins)
{
	float* inBuffer = (float*)input;
	float* outBuffer = (float*)output;

	// Por ahora solo nos ocupamos del buffer de salida.

	for (unsigned int i = 0; i<frames && !skipProcess; i++)
	{
		for(unsigned int t=0; t<tree.size() && !skipProcess; t++)
			tree[t]->process();


		for(int o=0; o<outs && audioOutBlock; o++)
			*(outBuffer + (frames * o) ) = (float) (audioOutBlock->getInputValue(o) );

		outBuffer++;
	}

	return 0;
}

"tree" is a vector that holds pointers to every box in the project. This vector is modified every time a box is created, deleted, connected or disconnected to another box. Obviously, right now the vector can be modified (via the GUI thread, for example by deleting a box) in the middle of the process(...) function, leading to disaster.

I was thinking about using a Fast Mutex (using the ZThread lib) to protect access to the "tree" vector. My idea is to have 2 swappable vectors, say treeA and treeB, and a pointer, say pTree. Initially pTree would point to treeA, and when the GUI tries to update the tree, it would do all of its work using the treeB vector. Then , once the tree is built, a lock is acquired and the pTree pointer is set to point to treeB. Next time the GUI wants to rebuild the tree, it works on treeA, and at the end of the operation, the lock is acquired and pTree is set to point to treeA... you get the idea.

In the process(...) function (which would now be using pTree to access the current valid vector), the lock would be acquired just at the beginning of the function, so if the GUI wants to update the tree, it has to wait (worst-case-scenario) for the block to be processed before it can set the pointer. So the process(...) functon is always working using a valid vector.

If the process(...) function tries to acquire the lock while the other function (the one that updates the tree)has it, then the maximum it would have to wait would be the time it takes to set a pointer, so I guess the possibility of having dropouts would be very very low, right?

Would having a lock aquired per buffer mean much overhead?

I'd like to know your opinions and/or ideas on this subject.

Thanks.

aciddose · Post by **aciddose** » Sat Feb 25, 2012 8:04 am

build a secondary tree from your gui thread. lock, set a "new tree" pointer. read garbage pointer. unlock. this operation should take very little time and the lock can be process-local - you can use a critical section, you don't need a slower mutex. (actually, you can just declare the pointer volatile and not use a lock..)

delete the garbage object if it's valid.

in the audio thread before processing, check to see if the "new tree" pointer is valid. if so, load the new pointer and move the old one into "garbage".

this isn't the most intelligent way to handle the problem. you can use flags instead and do the delete before you build the new tree. actually it sometimes is possible to re-use the spare for your new tree instead especially if it's filled with vector-like pre-allocated blocks rather than just links.

try thinking this way though instead of worrying about how expensive a mutex is - you don't even need one.

you need a mutex when: a shared object is accessed BI-DIRECTIONALLY by MULTIPLE processes
you need a critical section when: a shared object is accessed BI-DIRECTIONALLY by MULTIPLE threads
you need a volatile flag when: a shared object is accessed UNI-DIRECTIONALLY by MULTIPLE threads

kuniklo · Post by **kuniklo** » Sat Feb 25, 2012 11:31 am

It seems like the drawback of this kind of approach is that you have to maintain two copies of all your shared datastructures, right? Or am I missing something?

Xenakios · Post by **Xenakios** » Sat Feb 25, 2012 5:15 pm

aciddose wrote:
try thinking this way though instead of worrying about how expensive a mutex is - you don't even need one.

Some toolkits/APIs call a syncro object "mutex" and it doesn't mean a heavy cross process syncro object, like it does in the Windows API. For example in Qt, the cross process syncro object is called QSystemSemaphore. QMutex is the Windows Critical Section equivalent.

thevinn · Post by **thevinn** » Sat Feb 25, 2012 6:21 pm

kuniklo wrote:It seems like the drawback of this kind of approach is that you have to maintain two copies of all your shared datastructures, right? Or am I missing something?

Rather than a drawback, I see this as a feature. It is how I implement most all of my concurrent systems.

Xenakios · Post by **Xenakios** » Sat Feb 25, 2012 6:29 pm

thevinn wrote:
kuniklo wrote:It seems like the drawback of this kind of approach is that you have to maintain two copies of all your shared datastructures, right? Or am I missing something?
Rather than a drawback, I see this as a feature. It is how I implement most all of my concurrent systems.

I wonder if the original poster is worrying about premature optimization...

Rather than worry about keeping duplicates of data, it might be useful to measure what's the actual cost and have that in scale to whatever else is happening in the program/system. Like, if the data structure in question costs 1 megabyte without the duplicate, and 2 megabytes then with the duplicate, but then there's something else that's consuming 20 megabytes anyway. It might pay off more to optimize the thing with the 20mb memory consumption...

thevinn · Post by **thevinn** » Sat Feb 25, 2012 6:35 pm

Xenakios wrote:I wonder if the original poster is worrying about premature optimization... Rather than worry about keeping duplicates of data, it might be useful to measure what's the actual cost

I find that giving each thread its own complete read-only copy of shared data is simply easier to implement, performance considerations aside. It sidesteps all messy issues of locking and data races.

To update the shared data, a message passing queue is used. A thread's copy of the data is only changed at a single, well defined point of execution. This allows strong statements to be made about the correctness of a concurrent system.

It is often easier to construct and verify proper concurrent behavior using this technique.

Tzarls · Post by **Tzarls** » Sat Feb 25, 2012 9:21 pm

Right now I'm not worried about duplicate data if it serves a purpose. BTW, I've already done a first pass of optimization, basically because the CPU usage was extremely high. I learned the hard way that it's not a good idea to allocate resources (in other words, use the "new" operator) in the audio thread (newbies take note!). With that out of the way, I need to sort this "tree" thing....

The problem here is that I don't think duplicates won't solve the problem. In the end there must be only 1 valid - active tree vector. And that vector must become valid before the function that calculates the vector returns. Why? Imagine that I delete an object. In its destructor, the box un-registers itself from the object manager. The manager, in turn, re-calculates the tree based on this event. The tree is recalculated and the function returns to the deconstructor of the object. The object dissappears. And if the process thread doesn't get the updated tree before the box is destroyed, it could end up trying to call the now defunct object - crash!

That's why I thought of blocking the function that calculates the tree until we are sure that the processing function has taken note of the new updated tree.

I just thought of some other way: The calculateTree function calculates the tree and sets a flag to tell the audio thread to take note of the new vector. The it waits until the flag is reset and only then it returns. The audio thread checks if the flag is set at the end of every block, takes note of the new pointer and resets the flag. The audio thread never gets blocked. The GUI thread can get blocked for - worst case scenario - the time it takes to complete a block of samples.

What do you think?

thevinn · Post by **thevinn** » Sat Feb 25, 2012 9:26 pm

Tzarls wrote:The tree is recalculated and the function returns to the deconstructor of the object. The object dissappears. And if the process thread doesn't get the updated tree before the box is destroyed, it could end up trying to call the now defunct object - crash!

Typically when using duplicate structures to implement safe concurrency, objects in the system are reference counted - no explicit calls to 'delete'. The object is only destroyed when the last reference goes away.

Objects can gain references in several ways. They could exist in one or more data structures (directed graphs in the original post). A reference might persist as a local variable in a function's scope. But most importantly, in the process of posting a packaged asynchronous function call to another thread, a functor may hold a reference to the object.

One useful technique is to design the reference counted object class so that rather than deleting the object immediately when its last reference goes away, it is instead shipped off via thread queue to another utility thread whose purpose is solely to delete objects. This off-loads the actual deletion of rich data structures (like directed graphs) to a thread outside the critical path.

Tzarls · Post by **Tzarls** » Sat Feb 25, 2012 9:38 pm

Nice... Reference counting could do it, yes... unless someone suggests something that doesn't imply loads of code shuffling!

thevinn · Post by **thevinn** » Sat Feb 25, 2012 9:42 pm

Tzarls wrote:Nice... Reference counting could do it, yes... unless someone suggests something that doesn't imply loads of code shuffling!

Keep in mind you are trying to solve a very tough problem. Concurrent systems are notoriously difficult to implement and maintain.

Tzarls · Post by **Tzarls** » Sat Feb 25, 2012 10:03 pm

Well, so far it's been enough fun to keep going...

aciddose · Post by **aciddose** » Sat Feb 25, 2012 10:05 pm

you never need to actually use duplicate data.

where does anyone get that idea?

all you need is the old data which is about to be thrown out, and the new data which is about to be used.

so yes you do temporarily need twice as much space. the alternative is that you'd block the audio thread or output silence.

doesn't make sense when we're talking about a 20kb list.

Tzarls · Post by **Tzarls** » Sat Feb 25, 2012 10:22 pm

aciddose wrote:you never need to actually use duplicate data.

where does anyone get that idea?

all you need is the old data which is about to be thrown out, and the new data which is about to be used.

Exactly! The problem lies in how (and when) to exchange old data with new data.

DavenH · Post by **DavenH** » Sat Feb 25, 2012 11:44 pm

you might want to look at Microsoft's parallel containers:

from http://msdn.microsoft.com/en-us/library/dd504906.aspx:

"The concurrent_vector class enables concurrency-safe append and element access operations. Append operations do not invalidate existing pointers or iterators. Iterator access and traversal operations are also concurrency-safe."

Reading - modifying vector from different threads.