Real-Time Convolution with FFT within a Plug-In

DSP, Plugin and Host development discussion.
RELATED
PRODUCTS

Post

Hey folks!

I try to program a little audio unit plug-in that convolves the audio track with an impulse response given by the plug-in. i know that i need an fft algorithm that transfers the audio file's real signals to their spectra,multiplies the two spectra of the audio files and then returns them to their real signals. studying the "Audio Programming Book" i found an fft-based convolution algorithm that covers this operation.

unfortunately this algorithm does not cover REAL-TIME convolution, which is of course absolutely necessary for a plug-in (is it?). nevertheless the book says:

"This function will not serve for real-time purposes. However, it can be modified for this kind of application, by processing one FFT block at a time and leaving the overlapp-add to be performed by the invoking code."

(i'll post the code at the end of my post!)

alright..concerning this i have a couple of questions that i just can't figure out myself, since i'm pretty new to this programming-audio-units-thing.


1. what exactly do i need to change within the code in order to make this function work within a real-time plug-in? maybe somebody can rewrite it or add the stuff needed if it isn't much work, because now i'm not making any progress. if not, maybe someone can show me the place in the code that needs to be changed and tell me how it needs to be changed.

2. how and at which location within the audio unit code do i load the impulse response? i couldn't figure that out, because i didn't find any examples that showed how to load a sound and where to load it within the code.

3. is there someone that knows audio units well or at least has some basic knowledge of them that can answer me further questions concerning that topic? that would help me A LOT! i read all i could find about them but some little things still aren't clear to me and i don't want to post a list of 20 lil questions in here. (maybe via email or personal message? would be just AWESOME)

since this is concerning my bachelor thesis, any help would be marvellous :)


tl;dr

anyone familiar with real-time convolution and/or audio units? holla at me :)

thank you guys very much in advance and greetings from germany!

patrick

Code: Select all

void convol(float* impulse, float* input, float* output, 
       int impulse_size, int input_size){
	
float *impspec, *inspec, *outspec; // spectral vectors
float *insig, *outsig, *overlap;  // time-domain vectors		  
int fftsize=1, convsize; // transform and convolution sizes
int overlap_size;  // overlap size		  	
int count, i, j;   // counter and loop variables
	  
overlap_size= impulse_size - 1;
convsize = impulse_size + overlap_size;

while(fftsize < convsize) fftsize *= 2;
      
impspec = new float[fftsize]; // allocate memory for
inspec = new float[fftsize];   // spectral vectors
outspec = new float[fftsize];

insig = new float[fftsize];
outsig = new float[fftsize];
overlap = new float[overlap_size];

// get the impulse into the FFT input vector
// pad with zeros
for(i = 0; i < fftsize; i++){
      if(i < impulse_size) insig[i] = impulse[i];
      else insig[i] = 0.f;
  }


// Take the DFT of impulse
fft(insig, impspec, fftsize); 

// processing loop
for(i = count = 0; i < input_size+convsize; i++, count++){

   // if an input block is ready
    if(count == impulse_size && i < (input_size+impulse_size)){

	// copy overlapping block 
      for(j = 0; j < overlap_size ; j++)
              overlap[j] = outsig[j+impulse_size];

      // pad input signal with zeros 
      for(j = impulse_size; j < fftsize; j++) 
	           insig[j] = 0.f;

    
	// Take the DFT of input signal block   
      fft(insig, inspec, fftsize);

	// complex multiplication
      // first pair is re[0Hz] and re[Nyquist]      
      outspec[0] = inspec[0]*impspec[0];
      outspec[1] = inspec[1]*impspec[1];

      // (a+ib)*(c+id) = (ac - bd) + (ad + bc)i
      for(j = 2; j < fftsize; j+=2){
       outspec[j] = inspec[j]*impspec[j]
                -   inspec[j+1]*impspec[j+1];
       outspec[j+1] = inspec[j]*impspec[j+1]
			  +  inspec[j+1]*impspec[j];
      }

      // IDFT of the spectral product
      ifft(outspec, outsig, fftsize);

      // zero the sample counter
	count = 0;

    }

   // get the input signal
   // stop when the input is finished
 if(i < input_size)
   insig[count]  = input[i];
   else insig[count] = 0.f;

   // overlap-add output starts only
   // after the first convolution operation
  if(i >= impulse_size)
      output[i-impulse_size] = outsig[count] +
       (count < overlap_size ? overlap[count] : 0.f);
}

// de-allocate memory
delete[] overlap;
delete[] outsig;
delete[] insig;
delete[] outspec;
delete[] inspec;
delete[] impspec;
    
}
[/code]

Post

what you want is partitioned convolution. here are a few pointers:

http://www.cs.ust.hk/mjg_lib/bibs/DPSu/ ... s/Ga95.PDF
http://www.acoustics.net/objects/pdf/ar ... rina04.pdf
http://www.eecs.berkeley.edu/~ericb/sch ... Fx2011.pdf

the algorithm is conceptually simple but a nightmare bookkeeping-wise, so please don't ask me to debug any code.
My website: rs-met.com, My presences on: YouTube, GitHub, Facebook

Post

maybe this could be helpful, although not zero latency as it seems:

http://www.ludd.luth.se/~torger/brutefir.html
My website: rs-met.com, My presences on: YouTube, GitHub, Facebook

Post

thanks robin for your links and your pm.

the point is, that my programming skills are not very advanced, i can read and understand all this, but changing an algorithm/dsp-code or adding stuff to it would be very tricky for me.

naw...i don't want delays as long as the impulse response, although i can't imagine how that would sound :D

in order to understand that right, if i use my posted function, it would work, but with delays as long as the impulse response?


nevertheless is there any known partitioned convolution code around? the imagination of writing one myself from scratch kinda scares me :shock:

Post

bountyhunter2010 wrote: unfortunately this algorithm does not cover REAL-TIME convolution, which is of course absolutely necessary for a plug-in (is it?).
FFT needs a bit of latency IE you delay the signal say 256 samples. If you report that to the host most hosts will compensate for that and everything's hunky dory. Well almost,it delays everything 256 but your plugin hence they will all be in sync. It'll result in 256 samples of additional latency but that ain't too bad.

Or you could just do straight convolution but that is really inefficient (heavy on the CPU). Has some upsides though,no latency and the algorithm is quite easy to understand conceptually.

http://www.dspguide.com/ch6.htm

Even comes with some pseudo or possibly BASIC code.

Post

bountyhunter2010 wrote:nevertheless is there any known partitioned convolution code around? the imagination of writing one myself from scratch kinda scares me :shock:
i thought, the BruteFIR library link above is just that. not?

however, question to all: are there any others?
My website: rs-met.com, My presences on: YouTube, GitHub, Facebook

Post

WDL contains a realtime convolution engine, low (zero?) latency version included.

Post

jupiter8 wrote:Or you could just do straight convolution but that is really inefficient (heavy on the CPU). Has some upsides though,no latency and the algorithm is quite easy to understand conceptually.
not 100% true. whether or not it has latency depends upon what you're convolving. for minimum phase or most types of reverb or other recorded impulses it will be zero latency, but for anything with negative time components you'll have that much latency added even with a naive convolution.
Free plug-ins for Windows, MacOS and Linux. Xhip Synthesizer v8.0 and Xhip Effects Bundle v6.7.
The coder's credo: We believe our work is neither clever nor difficult; it is done because we thought it would be easy.
Work less; get more done.

Post

aciddose wrote:
jupiter8 wrote:Or you could just do straight convolution but that is really inefficient (heavy on the CPU). Has some upsides though,no latency and the algorithm is quite easy to understand conceptually.
not 100% true. whether or not it has latency depends upon what you're convolving. for minimum phase or most types of reverb or other recorded impulses it will be zero latency, but for anything with negative time components you'll have that much latency added even with a naive convolution.
Yeah i know. With a sinc impulse you get latency that is half the length of the impulse,right? That annoys the crap out of me since odd numbered lengths is so much easier to build (they're simply more "right" to me) than even ones but results in X + half a sample latency.

Post

That's how I had done it some time ago.
It may not be the most optimized overlap-add algorithm around, but basically that's what you should be doing:

Code: Select all

////// THESE ARE THE MEMBER VARIABLES
		/** input buffer */
		float* m_TempIn; 

		/** output buffer */
		float* m_TempOut;

		/** tail buffer */
		std::vector<float*> m_TempTail;

		/** Previous index*/
		long m_prevIndex;

		/** Kernel lengths for each channel */
		std::vector<int> m_KernelLengths;

//////FIRST ALLOC SOME MEMORY (INIT PART)
	const int cc = getOutputChannelCount();
	const unsigned long N = getOutputFramesPerChannel();

	//When an N sample signal is convolved with an M sample filter kernel, 
	//the output signal is N + M - 1 samples long
	m_TempTail.resize(cc); // R
	for (int chn = 0; chn < cc; chn++)
	{
		m_TempTail[chn] = NULL;
	}
	int maxb = 1;
	for (int chn = 0; chn < cc; chn++)
	{
		m_KernelLengths[chn] = getKernelLength(chn);
		const int M = m_KernelLengths[chn]; // kernel length for the channel
		const int B = N + M - 1;			// output buffer length for the channel
		maxb = std::max(maxb,B);
	}
	m_TempIn = Malloc_32f(N);
	if (!m_TempIn)
		return false;
	for (unsigned long i = 0; i < N; i++)
	{
		m_TempIn[i] = 0.0f;
	}

	m_TempOut = Malloc_32f(maxb+1);
	if (!m_TempOut)
		return false;
	for (int i = 0; i <= maxb; i++)
	{
		m_TempOut[i] = 0.0f;
	}

	for (int chn = 0; chn < cc; chn++)
	{
		const int M = m_KernelLengths[chn]; // kernel length for the channel
		const int R = M-1;				  // remainder buffer length for the channel

		m_TempTail[chn] = Malloc_32f(R);
		if (!m_TempTail[chn])
			return false;
		// Initially m_TempTail needs to be clean:
		for (int j = 0; j < R; j++)
		{
			m_TempTail[chn][j] = 0.0f; 
		}
	}
	m_prevIndex=0;
	return true;

//////////////////////////////////////////////////////////////////////////////////////////////////////
// PROCESSING PART
bool RunAt(unsigned long index)
{
	//When an N sample signal is convolved with an M sample filter kernel, 
	//the output signal is N + M - 1 samples long
	AudioBufferT<float> * pinput  = InConnector().GetBuffer(index);
	AudioBufferT<float> * poutput = OutConnector().GetBuffer(index);

	assert(pinput != poutput);
	assert ( (1+m_prevIndex) == (long)index);

	const int cc = getOutputChannelCount();
	const unsigned long N = getOutputFramesPerChannel();

	if (pinput->GetChannelCount() != cc)
		return false;

	if (pinput->GetFramesPerChannel() != N)
		return false;

	for (int chn = 0; chn < cc; chn++)
	{
		const int M = m_KernelLengths[chn];
		const int R = M - 1;	 // remainder
		const int B = N + R;	 // output buffer

		// get input (copy N frames to m_TempIn from channel chn)
		pinput->ChannelCopyTo(chn, m_TempIn);
		for (unsigned long j = 0; j< N;j++)
		{
			if (IS_ALMOST_DENORMAL(m_TempIn[j]))
				m_TempIn[j]=0.0f;
		}

		// convolve (your convolution routine goes here)
		if (!OnConvolve(m_TempIn, N, chn, M, m_TempOut))
			return false;

		for (int j = 0; j< B;j++)
		{
			if (IS_ALMOST_DENORMAL(m_TempOut[j]))
				m_TempOut[j]=0.0f;
		}

		// set output by adding the tail to the current buffer
		poutput->GetChannelDataFromMixOfTheseTwoVectors(chn,
			m_TempOut,
			m_TempTail[chn],
			R);

		// set new convolution remainder to m_TempTail:
		if (R > (int)N)
		{
			// long tail case
			for (int i = 0; i < R; i++)
			{
				int j = N+i;
				m_TempTail[chn][i] = j < R ? m_TempTail[chn][j] : 0.0f;
			}

			for (int i = 0; i < R; i++)
			{
				int j = N+i;
				m_TempTail[chn][i] += j<B ? m_TempOut[j]:0.0f;
			}
		}
		else
		{
			memcpy(m_TempTail[chn],
				&(m_TempOut[N]),
				sizeof(float)*R);
		}
	}

	m_prevIndex = (long)index;

	return true;
}
Last edited by stratum on Thu May 31, 2012 4:10 pm, edited 4 times in total.
~stratum~

Post

Please code in

Code: Select all

 blocks.. preserves intendation and all that stuff.

Post

mystran wrote:Please code in

Code: Select all

 blocks.. preserves intendation and all that stuff.[/quote]

Fixed...

The code is suboptimal because I have made the mistake of using interleaved data format for multichannel data (because that's what portaudio uses).
~stratum~

Post

thank you guys very much!

@ stratum: do i need some kind of library to use this code? some fourier library or somethin like that? and what kind of plug-in did you program with this code?

Post

@ stratum: do i need some kind of library to use this code? some fourier library or somethin like that? and what kind of plug-in did you program with this code?
I've extracted it from an audio processing library I've written.
The code itself is not open source or free software, and I don't intend to release it as open source as yet but overlap-add is actually a simple thing so I don't mind about posting it.

This means, yes it depends on external things, but the code should look simple enough so that you can easily guess what these do.

Note: OnConvolve calls an external convolution routine from intel integrated performance primitives library. It may be doing FFT based convolution or straightforward simple arithmetic depending on the arguments.
~stratum~

Post

bountyhunter2010 wrote:thank you guys very much!
How is it coming, bountyhunter, you've picked a topic with some serious depth. Good for you. If you pull it off your understanding of DSP will be much deeper.

What you are attempting is extremely useful for applying arbitrary linear transformations. Something I really need. Easy to find in VST, not so easy to find in AU.

Post Reply

Return to “DSP and Plugin Development”