Cepstral Smoothing in C++ (and Max/MSP)

DSP, Plugin and Host development discussion.

Post

Hi, I've had a search but can't find what I'm looking for...

I'm working on a 'formant shifter' model that will be an MSP object. My object currently receives frequency-domain data from Max/MSP (via the pfft~ object). However, I'm struggling to accurately extract the spectral envelope of voiced sounds. I think cepstral smoothing will help with this, by letting me 'lifter' the harmonic fine structure out of the signal so that only the envelope is left for analysis. This is where I'm getting lost!

I have done a lot of searching, but I can't work out how to do this transform and 'liftering' process within my object. I have limited maths knowledge (which is why I don't understand much of the information available online), and am really after a simple explanation of how this could be implemented in C++.


Thanks in advance for any help...


PS: This is for an undergraduate university project so any help will be cited etc.

Post

Cepstral smoothing is a good way to get the spectral envelope. JOS (Julius O. Smith III) provides some Matlab example code:

https://ccrma.stanford.edu/~jos/sasp/Sp ... owing.html
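In symbols, the steps in that example come down to roughly the following, where x is one frame zero-padded to the FFT length N, w is the analysis window, and n_c is a lifter cutoff you choose (the notation here is mine, not JOS's):

```latex
X[k] = \sum_{n=0}^{N-1} w[n]\,x[n]\,e^{-j2\pi kn/N}, \qquad
c[n] = \frac{1}{N}\sum_{k=0}^{N-1} \log\lvert X[k]\rvert\, e^{j2\pi kn/N}

\ell[n] =
\begin{cases}
1, & 0 \le n < n_c \ \text{ or } \ N - n_c < n \le N-1,\\
0, & \text{otherwise,}
\end{cases}
\qquad
\log\lvert \hat{H}[k]\rvert = \sum_{n=0}^{N-1} \ell[n]\,c[n]\,e^{-j2\pi kn/N}
```

Here c is the real cepstrum, \ell is a rectangular lifter that keeps only the low-quefrency coefficients (from both ends, because c is symmetric), and \hat{H} is the smoothed envelope.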

LPC also works well, and he gives an example for that too.

https://ccrma.stanford.edu/~jos/sasp/Sp ... ction.html

If you need help translating the Matlab code (essentially DSP pseudo-code here) into C++, I can assist.
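Since LPC came up too, here is a rough C++ sketch of the usual autocorrelation method with the Levinson-Durbin recursion, for one already-windowed frame. This is my own translation of the standard textbook method, not JOS's code; autocorr, levinson and lpcEnvelope are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Autocorrelation r[0..order] of one (already windowed) frame.
std::vector<double> autocorr(const std::vector<double>& x, std::size_t order)
{
    std::vector<double> r(order + 1, 0.0);
    for (std::size_t lag = 0; lag <= order; ++lag)
        for (std::size_t n = lag; n < x.size(); ++n)
            r[lag] += x[n] * x[n - lag];
    return r;
}

// Levinson-Durbin recursion. Returns a[0..p] with a[0] = 1, so that
// A(z) = sum a[i] z^-i is the prediction-error filter; err receives the
// final prediction-error energy.
std::vector<double> levinson(const std::vector<double>& r, double& err)
{
    const std::size_t p = r.size() - 1;
    std::vector<double> a(p + 1, 0.0);
    a[0] = 1.0;
    err = r[0];
    assert(err > 0.0);  // an all-zero frame has no meaningful LPC fit
    for (std::size_t i = 1; i <= p; ++i)
    {
        double acc = r[i];
        for (std::size_t j = 1; j < i; ++j)
            acc += a[j] * r[i - j];
        const double k = -acc / err;  // reflection coefficient
        const std::vector<double> prev(a);
        for (std::size_t j = 1; j < i; ++j)
            a[j] = prev[j] + k * prev[i - j];
        a[i] = k;
        err *= (1.0 - k * k);
    }
    return a;
}

// LPC envelope magnitude sqrt(err) / |A(e^jw)| at bins 0..nbins-1 of an nfft-point grid.
std::vector<double> lpcEnvelope(const std::vector<double>& a, double err,
                                std::size_t nfft, std::size_t nbins)
{
    const double PI = 3.14159265358979323846;
    std::vector<double> env(nbins);
    for (std::size_t k = 0; k < nbins; ++k)
    {
        std::complex<double> A(0.0, 0.0);
        for (std::size_t i = 0; i < a.size(); ++i)
            A += a[i] * std::polar(1.0, -2.0 * PI * k * i / nfft);
        env[k] = std::sqrt(err) / std::abs(A);
    }
    return env;
}
```

Usage would be levinson(autocorr(frame, order), err) followed by lpcEnvelope(a, err, Nfft, Nfft/2 + 1). The order is a tuning parameter: too low and nearby formants merge, too high and the envelope starts following individual harmonics.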

Post

Hi, thanks for the link and your help. I came across this earlier, but to be honest the code and functions don't make much sense to me. I have a rough understanding of the process and concept, but I'm not sure what libraries and/or equivalent functions to use; the Matlab code seems to use some functions that I don't have at the moment...

Thanks again - any assistance is much appreciated!

Adib

Post

Well, you kind of have to treat the Matlab as DSP pseudocode and implement a lot of the functions by hand, like so (the following C++ is only a sketch: the fft and ifft calls have to come from an FFT library):

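```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Placeholders for your FFT library (e.g. kissfft); these prototypes are
// hypothetical, not any real library's API. fft() zero-pads the real input
// to nfft points and returns the complex spectrum; ifft() returns the
// inverse transform, including the 1/N scaling.
std::vector<std::complex<double>> fft(const std::vector<double>& x, std::size_t nfft);
std::vector<std::complex<double>> ifft(const std::vector<double>& X);

// Real cepstrum of the first frame of audioSignal, following JOS's Matlab steps.
std::vector<double> realCepstrum(const std::vector<double>& audioSignal)
{
    const double PI = 3.14159265358979323846;
    const std::size_t Nframe = 2048;      // frame size
    const std::size_t Nfft = 4 * Nframe;  // factor of 4 zero padding

    // Hamming window, see http://en.wikipedia.org/wiki/Window_function#Hamming_window
    // Window a copy of the frame rather than the input buffer itself.
    std::vector<double> frame(Nframe, 0.0);
    for (std::size_t n = 0; n < Nframe && n < audioSignal.size(); n++)
    {
        const double w = 0.54 - 0.46 * std::cos(2.0 * PI * n / (Nframe - 1));
        frame[n] = w * audioSignal[n];
    }

    // Complex spectrum of the zero-padded frame (FFT library call)
    const std::vector<std::complex<double>> sspec = fft(frame, Nfft);

    // Log-magnitude spectrum in dB: 20*log10|X| (log10, not the natural log),
    // with a tiny floor so silent bins don't give log(0)
    std::vector<double> dbsspecfull(Nfft);
    for (std::size_t n = 0; n < Nfft; n++)
        dbsspecfull[n] = 20.0 * std::log10(std::abs(sspec[n]) + 1e-12);

    // Real cepstrum: real part of the inverse FFT of the log spectrum
    const std::vector<std::complex<double>> c = ifft(dbsspecfull);
    std::vector<double> rcep(Nfft);
    for (std::size_t n = 0; n < Nfft; n++)
        rcep[n] = c[n].real();
    return rcep;
}
```
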
And so on... let me know if there are other steps you don't understand later in the code. Hopefully that gives you an idea though.
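The 'and so on' part is the liftering itself: keep only the low-quefrency end of the real cepstrum and transform back. A minimal sketch, assuming rcep is a real cepstrum of length N held in a vector; cepstralEnvelope and the cutoff nc are hypothetical names, the lifter is a plain rectangular one, and a slow direct cosine sum stands in for the FFT so the example is self-contained (the cepstrum of a real signal is real and even, so the real part of the forward DFT reduces to this cosine sum).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Lifter the real cepstrum rcep (length N) and transform back, returning the
// smoothed log-magnitude spectrum for bins 0..N/2.
// nc: low-quefrency coefficients kept (c[0..nc-1] plus the mirrored c[N-nc+1..N-1]).
std::vector<double> cepstralEnvelope(const std::vector<double>& rcep, std::size_t nc)
{
    const double PI = 3.14159265358979323846;
    const std::size_t N = rcep.size();
    assert(nc >= 1 && nc < N / 2);

    // Rectangular low-quefrency lifter, applied symmetrically
    std::vector<double> lift(N, 0.0);
    lift[0] = rcep[0];
    for (std::size_t n = 1; n < nc; ++n)
    {
        lift[n] = rcep[n];
        lift[N - n] = rcep[N - n];
    }

    // Real part of the forward DFT of the liftered cepstrum (O(N^2) for clarity;
    // use your FFT library in real code)
    std::vector<double> env(N / 2 + 1, 0.0);
    for (std::size_t k = 0; k <= N / 2; ++k)
        for (std::size_t n = 0; n < N; ++n)
            env[k] += lift[n] * std::cos(2.0 * PI * k * n / N);
    return env;
}
```

The result is in the same units as the log spectrum you started from (dB if you used 20*log10). The pitch harmonics show up in rcep as a peak near n = fs/f0 samples, so nc needs to stay below that pitch period for the harmonic ripple to be removed.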

Post

Hi, thanks. It's making a bit more sense now. I've found what looks like a good FFT library (?) here (http://courseware.ee.calpoly.edu/~jbreiten/C/). I've been using a moving average across the frame to smooth the bin amplitudes. This also gives quite a smooth, accurate spectrum, although it takes a bit of experimenting to find the fine line between a smooth spectrum and missing data points! Thanks again,
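For comparison with the cepstral approach, the kind of moving average described here could be written as a centred average over an odd number of bins. This is a sketch: whether the original averages linear magnitudes or dB values, and how it treats the edges, isn't stated, so those are assumptions (here the window is simply truncated at the ends); smoothBins is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Centred moving average over 'width' bins (width must be odd, e.g. 3).
// Near the edges the window is truncated and only the existing bins are averaged.
std::vector<double> smoothBins(const std::vector<double>& mag, std::size_t width)
{
    assert(width % 2 == 1);
    const std::size_t half = width / 2;
    std::vector<double> out(mag.size(), 0.0);
    for (std::size_t k = 0; k < mag.size(); ++k)
    {
        const std::size_t lo = (k >= half) ? k - half : 0;
        const std::size_t hi = std::min(mag.size() - 1, k + half);
        double sum = 0.0;
        for (std::size_t j = lo; j <= hi; ++j)
            sum += mag[j];
        out[k] = sum / static_cast<double>(hi - lo + 1);
    }
    return out;
}
```

A wider window smooths more but, as discussed below, can also pull peaks sideways when neighbouring bins are unequal.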

Adib

Post

kissfft is normally my library of choice when I want something reasonably fast and easy to integrate with.

Are you doing a running average on the bin amplitudes? That should work OK but the smoothed cepstrum or LPC will give a better spectral envelope.

Post

Cool, I'll have a look at that library.
noisetteaudio wrote: Are you doing a running average on the bin amplitudes? That should work OK but the smoothed cepstrum or LPC will give a better spectral envelope.
Yes, it works quite well. The main problem is that if the averaging width (n) is too large, the peaks can be 'shifted sideways' slightly (although, looking at Julius O. Smith III's example results, the cepstrum seems to do this too, to the extent that a peak in the smoothed cepstral envelope lands right between two real peaks). An average over 3 bins works best with my data. I will test this against the cepstrum and try to establish which works better for my scenario.
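One way to measure that sideways shift is to locate each peak to a fraction of a bin with parabolic interpolation, then compare peak positions from the raw spectrum, the moving average and the cepstral envelope. A sketch, assuming the input is a (smoothed) magnitude or dB envelope; findPeaks and Peak are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Peak
{
    double bin;    // fractional bin index of the interpolated maximum
    double value;  // interpolated height at that position
};

// Local maxima of an envelope, refined by fitting a parabola through each
// maximum and its two neighbours.
std::vector<Peak> findPeaks(const std::vector<double>& env)
{
    std::vector<Peak> peaks;
    for (std::size_t k = 1; k + 1 < env.size(); ++k)
    {
        const double a = env[k - 1], b = env[k], c = env[k + 1];
        if (b > a && b >= c)
        {
            const double denom = a - 2.0 * b + c;  // always < 0 at such a maximum
            const double p = 0.5 * (a - c) / denom;  // offset in [-0.5, 0.5] bins
            peaks.push_back({k + p, b - 0.25 * (a - c) * p});
        }
    }
    return peaks;
}
```

In practice you would also ignore peaks below some threshold, since a bumpy envelope produces many small local maxima.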

Thanks,

Post

I have a question. Why smooth in the cepstral domain? Why not just use a smaller window in the time domain? Okay that was two questions.

Post

MackTuesday wrote:I have a question. Why smooth in the cepstral domain?
My current smoothing is in the frequency domain. The cepstral method seems to produce quite a 'bumpy' envelope when a rectangular lifter window is used, so smoothing with a more suitable window seems to remove those bumps and should (hopefully) leave only the main peaks. (I think so, anyway: I'm new to this myself and may have completely misunderstood, so apologies if so.)
MackTuesday wrote:Why not just use a smaller window in the time domain?
Because I want a large FFT size so that the bin spacing is as small as possible. I'm smoothing in the frequency domain because I want all the benefits of small bins when modifying the signal, but also want a smooth spectral envelope for analysis and peak detection.
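For reference, the bin spacing of an N-point FFT is simply the sample rate divided by N (44.1 kHz below is just an example rate):

```latex
\Delta f = \frac{f_s}{N}, \qquad
f_s = 44100\ \text{Hz}:\quad
N = 2048 \Rightarrow \Delta f \approx 21.5\ \text{Hz}, \qquad
N = 8192 \Rightarrow \Delta f \approx 5.4\ \text{Hz}
```

Zero-padding a frame of M samples up to N > M points shrinks this spacing, but the ability to separate two nearby components is still set by M: a Hamming window's main lobe is about 4 f_s / M wide.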

Post

dibble wrote:
MackTuesday wrote:I have a question. Why smooth in the cepstral domain?
My current smoothing is in the frequency domain.
Right. I should have said, Why smooth *using* the cepstral domain.
MackTuesday wrote:Why not just use a smaller window in the time domain?
Because I want a large fft size for the bin size to be as low as possible.
Have you tried zero-padding the input to your fft? In other words, take your windowed signal, then add a bunch of zeros to the vector, and take the fft of that? Hm, maybe you don't want that because you want as many actual data points as possible in your vector? I wonder how my simpler approach would affect the fidelity of your results. (I hope it's clear that I'm simply interested in the theory involved with your work.)
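In code, this suggestion is just a copy into a longer, zero-filled buffer before the FFT. A minimal sketch; zeroPad and dftBin are hypothetical helper names, and dftBin is a slow direct DFT used only to show what the padding does to the spectrum.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Copy a windowed frame into an nfft-point buffer and fill the rest with zeros.
std::vector<double> zeroPad(const std::vector<double>& frame, std::size_t nfft)
{
    assert(nfft >= frame.size());
    std::vector<double> out(nfft, 0.0);
    for (std::size_t n = 0; n < frame.size(); ++n)
        out[n] = frame[n];
    return out;
}

// Direct DFT of a single bin k of x (O(N) per bin, for illustration only).
std::complex<double> dftBin(const std::vector<double>& x, std::size_t k)
{
    const double PI = 3.14159265358979323846;
    std::complex<double> sum(0.0, 0.0);
    for (std::size_t n = 0; n < x.size(); ++n)
        sum += x[n] * std::polar(1.0, -2.0 * PI * k * n / x.size());
    return sum;
}
```

With 4x padding, every 4th bin of the padded spectrum equals the corresponding bin of the unpadded one; the bins in between are interpolated values of the same underlying spectrum, so padding gives a denser frequency grid but no extra resolving power.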

Post

MackTuesday wrote:Have you tried zero-padding the input to your fft? In other words, take your windowed signal, then add a bunch of zeros to the vector, and take the fft of that? Hm, maybe you don't want that because you want as many actual data points as possible in your vector?


Hi, apologies for the delay, I've been very busy with work! Zero-padding may work, but there are two main issues. As you mentioned, it will reduce the number of actual data points per frame (and still be just as expensive). The other issue is that I'm currently doing my FFT in Max/MSP (with pfft~), and the focus of this project is building an MSP external. Eventually I will port this to a standalone app, but for the moment I will be keeping it within the Max API. I think I could implement your idea using fft~ in place of pfft~, but I will have to look into that. I'll let you know how it goes when I've tried it.

Thanks!

Adib
