fitting an impulse response - prony algorithm

DSP, Plugin and Host development discussion.

Post

At the same time, people keep asking for multicore support, so..

Luckily in FL I only had to process generators in their own thread, so it's sync. I still need to set/wait for events, but everything has to return for the processing block to end.


But I see no other solution. In fact, none of the supposedly realtime impulse reverb processors (like wizooverb) work well in FL. Because FL processes in very short blocks, they all have a huge CPU usage. So I put FL's impulse reverb processor inside our audio editor, for offline processing only.

Post

Fruity uses short blocks, right, and we had trouble with that :D (beta testers reported it immediately, especially people with old computers), but we solved it with multi-threading, as you suggested before. Things are not perfectly tuned yet :( , for example we still have trouble in Ableton.
But that is probably just a bug. On slow processors the high CPU usage can show up as drop-outs.

And you are right, multicore support was a pain: the processors are not synchronized with each other, so if you are computing RDTSC you have to do it on one of them (always choosing the same one), which is time-costly because of the synchronization step :roll:
But new duos are REALLY fast...

Post

so if you are computing RDTSC
why did you need to, btw?

Post

because we discovered that QueryPerformanceCounter is not reliable on some old chipsets (like VIA), due to a hardware design flaw. And we discovered that a lot of developers (not DSP developers) use it when they need precision.

http://support.microsoft.com/kb/274323
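
One common way to guard against a high-resolution counter that jumps is to cross-check it against a second, coarser clock. A minimal sketch of that idea follows; it is not what we actually shipped. `time.perf_counter` and `time.monotonic` are only stand-ins for the two clocks, and `leaped`, `CheckedTimer` and the 0.1 s tolerance are hypothetical names and values chosen for illustration.

```python
import time


def leaped(hires_delta_s, coarse_delta_s, tolerance_s=0.1):
    """True when the two clocks disagree by more than the tolerance."""
    return abs(hires_delta_s - coarse_delta_s) > tolerance_s


class CheckedTimer:
    """Elapsed-time helper that falls back to the coarse clock on a leap."""

    def __init__(self):
        self._hires_start = time.perf_counter()   # precise, possibly unreliable
        self._coarse_start = time.monotonic()     # coarse reference clock

    def elapsed(self):
        hires = time.perf_counter() - self._hires_start
        coarse = time.monotonic() - self._coarse_start
        # Trust the precise reading unless it disagrees with the reference.
        return coarse if leaped(hires, coarse) else hires


timer = CheckedTimer()
print(timer.elapsed() >= 0.0)
```

The price is a second clock read on every measurement, and the fallback value is only as precise as the coarse clock.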

Post

I meant, why do you need the clock to sync your threads?

Post

The trouble was calculating the exact CPU load with high precision, so that we could sleep for the right amount of time.
As I said before, a lot of hosts measure performance simply by looking at the time spent in the process call. So you should hold the call for a constant, "right" amount of time (you know the CPU load and distribute it proportionally over the process calls). But what is the right amount? You could measure it with a tick counter or the system clock, but their precision is around a millisecond, which, as you know, is not fine enough for a 128-sample buffer @ 96 kHz (which itself lasts only about a millisecond).
IMHO the multi-threading method is not simple at all for a DSP plug or instrument.
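
To put numbers on that, here is a small sketch of the arithmetic, assuming the 128 refers to sample frames and the timer resolution is exactly 1 ms. The function names are made up for the example.

```python
def block_duration_ms(frames, sample_rate_hz):
    """Wall-clock time covered by one audio block, in milliseconds."""
    return 1000.0 * frames / sample_rate_hz


def sleep_error_percent(frames, sample_rate_hz, timer_resolution_ms):
    """Timer resolution expressed as a percentage of one block's duration."""
    return 100.0 * timer_resolution_ms / block_duration_ms(frames, sample_rate_hz)


print(f"{block_duration_ms(128, 96_000):.3f} ms per block")
print(f"{sleep_error_percent(128, 96_000, 1.0):.0f}% of a block per timer tick")
```

With these assumptions one block lasts about 1.33 ms, so a single 1 ms tick is already 75% of the block.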

Post

yes but you're not hanging the processing/using CPU on purpose just for the CPU meter of the host to look more accurate(?)

Post

When we started, it was for the host meter.

But we discovered another reason for doing it:
if you don't sleep, sometimes the process call returns too quickly. I mean, if you take 30% in one time slice and 0% in the others, you get not only a jumping meter (in Live, for example) but also inconsistent behaviour from the host's engine. In Live this works very badly. I don't know exactly why.

But if you have any ideas, just tell me... I'm here to learn too :D

Post

I'm not sure I understand it correctly, but if I do, you're in fact spending 2x the CPU usage, once in a thread for the FFT, and once in a sleeping thread? Even though the sleeping thread doesn't really eat the CPU (it sleeps), it's still CPU that the host would have been able to use, especially in a multicore system.

I never implemented anything like this, but I'd have thought of a simple system like:
You get input that you buffer, once a buffer is ready, you release a thread that will process it. You also check if the thread has finished processing the last buffer, and you use the results.

I would think that this would lag for 1 more FFT chunk, or in fact, for the 'max block size' (reported by the host) divided by the FFT chunk length.

I'm quite sure it indeed ends up much more complex, so I don't know..
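
The simple system described above could be sketched roughly like this. It is only an illustration under assumptions: `BackgroundChunkProcessor` and `heavy_fn` are hypothetical names, plain lists stand in for audio buffers, and the result is collected at the next chunk boundary so the output does not depend on thread timing.

```python
import threading
from collections import deque


class BackgroundChunkProcessor:
    def __init__(self, chunk_size, heavy_fn):
        self.chunk_size = chunk_size
        self.heavy_fn = heavy_fn      # the expensive per-chunk work (e.g. the FFT)
        self.pending = []             # input samples waiting for a full chunk
        self.ready = deque()          # processed samples waiting to be output
        self.worker = None
        self.result = None

    def _run(self, chunk):
        self.result = self.heavy_fn(chunk)

    def process(self, block):
        """Called once per host block; returns len(block) output samples."""
        self.pending.extend(block)
        if len(self.pending) >= self.chunk_size:
            # Use the previous chunk's result before releasing a new thread.
            if self.worker is not None:
                self.worker.join()    # only blocks if the worker is running late
                self.ready.extend(self.result)
            chunk = self.pending[:self.chunk_size]
            del self.pending[:self.chunk_size]
            self.worker = threading.Thread(target=self._run, args=(chunk,))
            self.worker.start()
        # Output processed samples when available, silence otherwise.
        return [self.ready.popleft() if self.ready else 0.0 for _ in block]


proc = BackgroundChunkProcessor(4, lambda chunk: [2 * x for x in chunk])
for block in ([1, 2], [3, 4], [5, 6], [7, 8]):
    print(proc.process(block))
```

In this sketch the first processed samples appear on the fourth call, so the extra lag is one chunk on top of the buffering itself. The `join()` is where a late worker would still stall the host's call.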

Post

tony tony chopper wrote:I'm not sure I understand it correctly, but if I do, you're in fact spending 2x the CPU usage, once in a thread for the FFT, and once in a sleeping thread? Even though the sleeping thread doesn't really eat the CPU (it sleeps), it's still CPU that the host would have been able to use, especially in a multicore system.

I never implemented anything like this, but I'd have thought of a simple system like:
You get input that you buffer, once a buffer is ready, you release a thread that will process it. You also check if the thread has finished processing the last buffer, and you use the results.

I would think that this would lag for 1 more FFT chunk, or in fact, for the 'max block size' (reported by the host) divided by the FFT chunk length.

I'm quite sure it indeed ends up much more complex, so I don't know..

Exactly as you have described...
About "spending 2x": a sleep doesn't use CPU.

The trouble (let me try to explain it better):
imagine you have an 8192-sample block for your FFT (we are talking about FFT here, but it could be any other process)
1 ) you receive 1024 and you do nothing (you just buffer it)
2 ) you receive 1024 and you do nothing (you just buffer it)
...
8 ) you receive 1024 and the thread is ready for processing.

If you receive 1024 and return immediately, you get 2 side effects:
1) the host "thinks" you are fast, which is not true (the thread is doing heavy processing in the meantime), so the CPU meter is wrong
2) the host sometimes gets a fast answer and sometimes (on the last call) it could get a slow one, because the thread may not have finished the previous calculation. Some hosts don't handle this very well: as the CPU meter jumps around randomly, they pop and make other small glitches. I say "some hosts" because, for example, Cubase behaves very differently from Live or Tracktion. It depends on how the host calls the plug and fills the ASIO buffers.
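
A toy model of the two load patterns, just to show the difference the host sees. The 40 ms chunk cost and the 44.1 kHz sample rate are made-up numbers, and the function names are hypothetical.

```python
def per_call_cost_ms(chunk_frames, block_frames, chunk_cost_ms, spread):
    """Time the host sees in each process call over one full chunk."""
    calls = chunk_frames // block_frames
    if spread:
        # The known load is distributed proportionally over every call.
        return [chunk_cost_ms / calls] * calls
    # Otherwise all calls but the last return immediately.
    return [0.0] * (calls - 1) + [chunk_cost_ms]


def peak_load(costs_ms, block_frames, sample_rate_hz):
    """Worst call time as a fraction of the time one block lasts."""
    block_ms = 1000.0 * block_frames / sample_rate_hz
    return max(costs_ms) / block_ms


bursty = per_call_cost_ms(8192, 1024, 40.0, spread=False)
even = per_call_cost_ms(8192, 1024, 40.0, spread=True)
print(bursty, peak_load(bursty, 1024, 44_100))
print(even, peak_load(even, 1024, 44_100))
```

With these numbers the bursty pattern has one call that takes longer than the block itself lasts, while the evenly spread pattern stays below a quarter of the block time on every call.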

Post

a sleep doesn't use cpu
No, but you still hold the host's mixer thread, thus you eat CPU if there's nothing else to use that CPU (other processes, or the GUI). In a single core, it will be used by your background FFT thread, but in a dual core, it can be kinda wasted.

It may not technically eat CPU, but may cause underruns. I don't quite understand point 2), and I trust you on that, but I think the host's CPU usage meter not showing the exact value isn't a critical problem.

Post

tony tony chopper wrote:
a sleep doesn't use cpu
In a single core, it will be used by your background FFT thread, but in a dual core, it can be kinda wasted.

We are developing on dual cores and we have not seen this behaviour. Probably we should wait for hosts optimized for dual cores... In any case, old Xeons were perfectly synchronized (we have a Xeon too)
