Question about non-uniform partitioned convolution algorithms

DSP, Plugin and Host development discussion.

Post

Hello everybody!

In a previous topic, I asked for a few tips about uniform partitioned convolution (UPC) algorithms, and I was able to code one, which has since been included in the JUCE SDK:

viewtopic.php?f=33&t=440314

Currently I'm working on a non-uniform (NUPC) variant, and I have some questions I would like to ask the KVR DSP community. I have been able to design a "naive" single-threaded one, which uses two instances of my UPC algorithm:

* The first one processes only the first "headSize" samples of the IR (the head) with no latency (ZLUPC). It takes a maximum audio buffer size as a parameter, which is used to set the FFT size and is exactly the size of the input accumulation buffer. To have no latency, it needs to return output samples on every call to the process function. That means it does the convolution processing all the time, even when the input accumulation buffer is not full yet, and of course at the exact moment it becomes full, before its content is reset.

* The second one processes the remaining samples (the tail of the IR) with "headSize" samples of latency (UPCAL). In this case, the maximum audio buffer size is also the value of the latency: the convolution processing happens only when the input accumulation buffer is full, and the result is sent to the output only after that. So I set that audio buffer size to "headSize".

By summing their outputs, since the second convolver has "headSize" samples of latency, I can perfectly reconstruct the signal I would get with a single UPC algorithm. However, this approach is flawed: every time the tail's accumulation buffer fills up, it forces the main processing function to do the whole UPCAL work in one single callback, causing potential dropouts if the maximum audio buffer size is low.
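To make the structure explicit, here is a rough sketch of that naive single-threaded version (just the summing part; HeadConvolver and TailConvolver are placeholder names for my two existing UPC engines, not real JUCE classes):

```cpp
#include <vector>

// HeadConvolver / TailConvolver stand for the two existing UPC engines:
// the zero-latency head convolver and the headSize-latency tail convolver.
// Both are assumed to expose a process (input, output, numSamples) call.
template <typename HeadConvolver, typename TailConvolver>
struct NaiveNUPC
{
    void process (const float* input, float* output, int numSamples)
    {
        // Zero-latency head: produces valid output on every call.
        head.process (input, output, numSamples);

        // Tail engine: internally delayed by headSize samples, so it only
        // does real work when its accumulation buffer fills up (which is
        // exactly the problem described above).
        tail.process (input, tailBuffer.data(), numSamples);

        // Because the tail output is already delayed by headSize samples,
        // the two signals line up and can simply be summed.
        for (int i = 0; i < numSamples; ++i)
            output[i] += tailBuffer[i];
    }

    HeadConvolver head;   // covers IR samples [0, headSize)
    TailConvolver tail;   // covers IR samples [headSize, end)
    std::vector<float> tailBuffer = std::vector<float> (4096, 0.0f); // assumed >= max block size
};
```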

Now I want to put the tail processing in a background thread, so I don't have to do it all inside a single, potentially short, callback, but can take my time instead. However, I can't just copy and paste my previous approach, since even with a separate thread I still don't want to start the processing at the very last moment the result is needed. Instead, I want to start that processing earlier.

What is the simplest but still effective approach to do so? For example, I was thinking I could run my ZLUPC engine on the tail part every "headSize" / 2 samples, which means processing each input block twice: once half full and once full. Let's say "headSize" = 1024 samples. At n = 0 samples, I would do nothing on the tail part. At n = 512 samples, I would start processing the first 512 samples with it, storing the result in a temporary buffer, during the time between n = 512 and n = 1024. Then at n = 1024 samples, I would start returning the first 512 output samples I just computed, and use the time between n = 1024 and n = 1536 to process the first 1024 input samples together, etc.
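To make the timing explicit, here is how that schedule would look for headSize = 1024, with a trivial counter-based trigger (enqueueTailJob is a hypothetical hook that posts the work to the background thread; nothing here is an existing API):

```cpp
// Timing of the proposed tail schedule for headSize = 1024
// (samplesSeen = total input samples received so far):
//
//   samplesSeen in [0, 512) : nothing to do for the tail yet.
//   samplesSeen = 512       : hand input[0..511] to the background thread;
//                             it has until samplesSeen = 1024 to finish.
//   samplesSeen = 1024      : start returning the tail output for [0..511];
//                             hand input[0..1023] to the background thread, etc.

struct TailScheduler
{
    void push (int numSamples)
    {
        samplesSeen += numSamples;

        // Fire a background job every headSize / 2 samples.
        while (samplesSeen - lastTrigger >= headSize / 2)
        {
            lastTrigger += headSize / 2;
            enqueueTailJob (lastTrigger);   // process input up to lastTrigger
        }
    }

    void enqueueTailJob (int upToSample);   // hypothetical: posts the job, returns immediately

    int headSize     = 1024;
    int samplesSeen  = 0;
    int lastTrigger  = 0;
};
```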

What do you think of this? Thanks in advance!
Last edited by Ivan_C on Mon May 25, 2020 1:48 pm, edited 2 times in total.

Post

Basically, let the direct (i.e. zero-latency) part cover twice the blocksize you need for the next partition, so it can send one block off for background processing while mixing the previously processed block with the directly processed result. If you want more partitions, do the same with the next one and so on (although the scheduling quickly becomes somewhat tricky).

Post

I'm sorry mystran, this topic and the related articles are already confusing me a lot, and I don't get what you mean in your message :) Is it similar to my approach where I process the same input block twice, once half full and the second time full (with the first half being the same as previously), so I can already output some samples during the parallel processing?

I also did a "time distributed" variant, using only the audio thread, which follows the same pattern as the multi-threaded one, but where I need to find a rule to distribute the processing (the 2 FFTs and all the multiplications) over several processSamples callbacks. It's probably going to work very poorly in FL Studio :lol:
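For what it's worth, the bookkeeping I have in mind for spreading the tail work over the callbacks looks roughly like this (doFFT, multiplyPartition and doIFFT are placeholders for the actual FFT and spectrum-multiply routines, and each is naively counted as one work unit, even though the FFTs obviously cost more than a single multiplication):

```cpp
#include <algorithm>

struct TimeDistributedTail
{
    // Called once per audio callback while the next input block is collected.
    void doSliceOfWork()
    {
        // Work for one tail block: 1 forward FFT + numPartitions complex
        // spectrum multiplications + 1 inverse FFT.
        const int totalUnits    = numPartitions + 2;
        const int callbacksLeft = std::max (1, callbacksPerBlock - callbackIndex);
        const int unitsToDo     = (totalUnits - unitsDone + callbacksLeft - 1) / callbacksLeft;

        for (int i = 0; i < unitsToDo && unitsDone < totalUnits; ++i, ++unitsDone)
        {
            if (unitsDone == 0)                   doFFT();                            // forward FFT
            else if (unitsDone <= numPartitions)  multiplyPartition (unitsDone - 1);  // one multiply-accumulate
            else                                  doIFFT();                           // inverse FFT
        }

        if (++callbackIndex == callbacksPerBlock) // block boundary: start over
        {
            callbackIndex = 0;
            unitsDone     = 0;
        }
    }

    // Placeholders for the real FFT / spectrum-multiply routines.
    void doFFT();
    void multiplyPartition (int partitionIndex);
    void doIFFT();

    int numPartitions     = 32;  // tail partitions
    int callbacksPerBlock = 8;   // e.g. headSize / blockSize = 1024 / 128
    int callbackIndex     = 0;
    int unitsDone         = 0;
};
```

The obvious weakness is the fixed callbacksPerBlock assumption, which a host with varying buffer sizes (like FL Studio) immediately breaks.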

Post

Ivan_C wrote: Mon May 25, 2020 1:52 pm I'm sorry mystran, this topic and the related articles are already confusing me a lot, and I don't get what you mean in your message :) Is it similar to my approach where I process the same input block twice, once half full and the second time full (with the first half being the same as previously), so I can already output some samples during the parallel processing?
Let's say you have 2 partitions and the bigger one uses a blocksize of 512 samples. Now you want the shorter one to process the first 1024 samples of the IR, so you have 512 samples of latency to collect one block and then another 512 samples of latency for the background thread to process it.

There's no need to ever process "half blocks" at any point.
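Concretely, the handoff can be as simple as double-buffering one block. A sketch only: std::async stands in for a proper audio-safe worker thread, and tailConvolve is a hypothetical function computing the tail contribution of one block, not a real JUCE API.

```cpp
#include <array>
#include <future>
#include <vector>

constexpr int tailBlockSize = 512;

// Hypothetical: computes the tail contribution of one 512-sample input block.
std::vector<float> tailConvolve (std::array<float, tailBlockSize> inputBlock);

struct TwoPartitionScheduler
{
    // Called each time another 512 input samples have been collected.
    // Hands the fresh block to the worker and returns the result of the
    // block handed over one call earlier, i.e. with 2 * 512 = 1024 samples
    // of total tail latency -- exactly what the zero-latency head covers.
    std::vector<float> onBlockCollected (const std::array<float, tailBlockSize>& block)
    {
        std::vector<float> ready;

        // The previously launched job has had a full block (512 samples of
        // audio time) to finish. A real implementation would poll instead of
        // blocking on the audio thread, but the sketch just calls get().
        if (pending.valid())
            ready = pending.get();

        // Launch processing of the freshly collected block in the background.
        pending = std::async (std::launch::async, tailConvolve, block);

        return ready;   // empty on the very first call
    }

    std::future<std::vector<float>> pending;
};
```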

Post

OK, I got it now, thanks :wink: Actually it's one of the other strategies I was thinking about, but I wonder which one, between the one I described earlier and this one, is going to be more efficient...

