Hi, I've had a search but can't find what I'm looking for...

I'm working on a 'formant shifter' model that will be an MSP object. I currently receive data in the frequency domain from Max/MSP (using the pfft~ object), into my object. However, I'm struggling to accurately extract the spectral envelope of the voiced sounds. I think that cepstral smoothing will help with this by enabling me to 'lifter' the non-harmonic content out of the signal for analysis. This is where I'm getting lost!

I have done a lot of searching but I can't work out how I can do this transform and 'liftering' process within my object. I have limited maths knowledge (hence why I don't understand much of the info available online), and am really after a simple explanation of how this could be implemented in C++.

Thanks in advance for any help...

PS: This is for an undergraduate university project so any help will be cited etc.

- KVRer
- 27 posts since 16 Feb, 2011

Cepstral smoothing is a good way to get the spectral envelope. Julius O. Smith (JOS) provides some Matlab example code:

https://ccrma.stanford.edu/~jos/sasp/Sp ... owing.html

LPC also works well, and he gives an example for that too.

https://ccrma.stanford.edu/~jos/sasp/Sp ... ction.html

If you need help translating the Matlab code (essentially DSP pseudo-code here) into C++, I can assist.
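
For the LPC route, a minimal sketch of the autocorrelation method with the Levinson-Durbin recursion could look like the following. This is not a translation of the linked Matlab example; the names `lpcCoefficients` and `lpcEnvelope` and their parameters are made up for illustration, and in practice the input would be a windowed frame.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Autocorrelation-method LPC via the Levinson-Durbin recursion.
// Returns a[0..order] with a[0] = 1, so the prediction-error filter is
// A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
// errorOut (optional) receives the final prediction-error energy.
std::vector<double> lpcCoefficients(const std::vector<double>& x, int order,
                                    double* errorOut = nullptr)
{
    assert(order >= 1 && x.size() > static_cast<std::size_t>(order));

    // Autocorrelation r[k] = sum over n of x[n] * x[n - k]
    std::vector<double> r(order + 1, 0.0);
    for (int k = 0; k <= order; ++k)
        for (std::size_t n = k; n < x.size(); ++n)
            r[k] += x[n] * x[n - k];

    std::vector<double> a(order + 1, 0.0);
    a[0] = 1.0;
    double err = r[0];

    for (int i = 1; i <= order && err > 0.0; ++i) {
        double acc = r[i];
        for (int j = 1; j < i; ++j)
            acc += a[j] * r[i - j];
        const double k = -acc / err;          // reflection coefficient

        const std::vector<double> prev = a;   // coefficients from order i-1
        for (int j = 1; j < i; ++j)
            a[j] = prev[j] + k * prev[i - j];
        a[i] = k;
        err *= (1.0 - k * k);
    }
    if (errorOut) *errorOut = err;
    return a;
}

// Envelope magnitude sqrt(err) / |A(e^{jw})| at frequency w in radians/sample.
double lpcEnvelope(const std::vector<double>& a, double err, double w)
{
    std::complex<double> A(0.0, 0.0);
    for (std::size_t k = 0; k < a.size(); ++k)
        A += a[k] * std::polar(1.0, -w * static_cast<double>(k));
    return std::sqrt(err) / std::abs(A);
}
```

Evaluating `lpcEnvelope` at each bin frequency (w = 2*pi*bin/Nfft) gives a smooth envelope; the order controls how many peaks it can follow, so it probably needs some experimenting.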

- KVRer
- 6 posts since 6 Apr, 2010

Hi, thanks for the link and your help. I came across this earlier but to be honest the code and functions don't make much sense to me. I have a rough understanding of the process and concept, however I'm not sure what libraries and/or equivalent functions to use - it looks like the matlab code uses some functions that I don't have at the moment...

Thanks again - any assistance is much appreciated!

Adib

- KVRer
- 27 posts since 16 Feb, 2011

Well, you kind of have to treat the Matlab as DSP pseudocode and implement a lot of the functions by hand, like so (the following C++ code is only a sketch and won't build as-is; you need to supply the FFT routines):

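```cpp
#include <cmath>
#include <complex>
#include <vector>

const double PI = 3.1415926535897932384626433832795;

// Placeholders: supply these from whichever FFT library you use.
void fft(const std::vector<float>& in, std::vector<std::complex<float>>& out);
void ifft(const std::vector<float>& in, std::vector<float>& out);

// frame: one frame of your audio signal, Nframe samples long (e.g. 2048).
std::vector<float> realCepstrum(std::vector<float> frame)
{
    const int Nframe = static_cast<int>(frame.size());

    // Apply a Hamming window, see http://en.wikipedia.org/wiki/Window_function#Hamming_window
    for (int n = 0; n < Nframe; n++)
        frame[n] *= static_cast<float>(0.54 - 0.46 * std::cos(2.0 * PI * n / (Nframe - 1)));

    // Factor of 4 zero padding
    const int Nfft = 4 * Nframe;
    frame.resize(Nfft, 0.0f);

    // You'll have to use an FFT library here. The (complex) FFT result should be stored in sspec.
    std::vector<std::complex<float>> sspec(Nfft);
    fft(frame, sspec);

    // Take log spectrum in dB (log10, with a small offset to avoid log of zero)
    std::vector<float> dbsspecfull(Nfft);
    for (int n = 0; n < Nfft; n++)
        dbsspecfull[n] = 20.0f * std::log10(std::abs(sspec[n]) + 1e-12f);

    // Real cepstrum, again you'll need an FFT library for this
    std::vector<float> rcep(Nfft);
    ifft(dbsspecfull, rcep);
    return rcep;
}
```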

And so on... let me know if there are other steps you don't understand later in the code. Hopefully that gives you an idea though.
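
To give a rough idea of the 'liftering' step itself, here is a self-contained sketch that takes a log-magnitude spectrum, computes the real cepstrum, zeroes everything above a cutoff quefrency, and transforms back. The function name `cepstralSmooth` and the `cutoff` parameter are invented for this example, and the slow cosine sums only stand in for the FFT/IFFT calls so that it runs without a library. It assumes the full-length, symmetric spectrum you get from a real input frame.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

const double kPi = 3.14159265358979323846;

// Smooth a log-magnitude spectrum by low-pass liftering its real cepstrum.
// logMag: all N bins of the log-magnitude spectrum (symmetric about N/2).
// cutoff: number of low-quefrency cepstral coefficients to keep.
std::vector<double> cepstralSmooth(const std::vector<double>& logMag, int cutoff)
{
    const int N = static_cast<int>(logMag.size());
    assert(N > 0 && cutoff >= 1 && 2 * cutoff <= N);

    // Real cepstrum: inverse DFT of the log spectrum (real part only).
    // O(N^2) here for clarity; a real implementation would call an IFFT.
    std::vector<double> cep(N, 0.0);
    for (int n = 0; n < N; ++n) {
        for (int k = 0; k < N; ++k) {
            const long long idx = (static_cast<long long>(k) * n) % N;
            cep[n] += logMag[k] * std::cos(2.0 * kPi * idx / N);
        }
        cep[n] /= N;
    }

    // Lifter: keep coefficients 0..cutoff-1 and their mirror images at the
    // end of the buffer, zero everything in between.
    for (int n = cutoff; n <= N - cutoff; ++n)
        cep[n] = 0.0;

    // Forward DFT (real part) of the liftered cepstrum = smoothed log spectrum.
    std::vector<double> envelope(N, 0.0);
    for (int k = 0; k < N; ++k) {
        for (int n = 0; n < N; ++n) {
            const long long idx = (static_cast<long long>(k) * n) % N;
            envelope[k] += cep[n] * std::cos(2.0 * kPi * idx / N);
        }
    }
    return envelope;
}
```

The cutoff is usually chosen below the pitch period in samples, so that the ripple caused by the harmonics is removed and only the slowly varying envelope remains. The exact value is something to tune by ear and by eye.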

- KVRer
- 6 posts since 6 Apr, 2010

Hi, thanks. It's making a bit more sense now. I've found what looks like a good fft library (?) here: http://courseware.ee.calpoly.edu/~jbreiten/C/ . I've been using a moving average on the frame to smooth the bin amplitudes - this also gives quite a smooth, accurate spectrum, although it takes a bit of experimenting to find the fine line between a smooth spectrum and missing data points! Thanks again,

Adib

- KVRer
- 6 posts since 6 Apr, 2010

Cool, I'll have a look at that library.

noisetteaudio wrote: Are you doing a running average on the bin amplitudes? That should work OK but the smoothed cepstrum or LPC will give a better spectral envelope.

Yes - it works quite well. The main problem is that if the averaging width (n) is too large, the 'peaks' can be 'shifted sideways' slightly (although, looking at the example results of Julius O. Smith III, the cepstrum also seems to do this, to the extent that the 'smoothed' cepstrum peak is right between two real peaks). An average of 3 works best with my data. I will test this against the cepstrum and try to establish the benefits for my scenario.

Thanks,
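
For reference, the running average on the bin amplitudes is roughly the sketch below. It is a simplified, centred version written for this post rather than the exact code in my object; `movingAverage` and `width` are just illustrative names, and the window is shortened at the edges of the frame.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Centred moving average over bin magnitudes.
// width must be odd (3 means the bin itself plus one neighbour each side).
std::vector<double> movingAverage(const std::vector<double>& mags, int width)
{
    assert(width >= 1 && width % 2 == 1);

    const int N = static_cast<int>(mags.size());
    const int half = width / 2;
    std::vector<double> out(N, 0.0);

    for (int k = 0; k < N; ++k) {
        // Clamp the window to the valid bin range at the frame edges.
        const int lo = std::max(0, k - half);
        const int hi = std::min(N - 1, k + half);

        double sum = 0.0;
        for (int j = lo; j <= hi; ++j)
            sum += mags[j];
        out[k] = sum / (hi - lo + 1);
    }
    return out;
}
```

Because the window is centred, a single isolated peak stays on its bin; once the width gets close to the spacing between two peaks, they start to merge into one broader bump.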

- KVRer
- 6 posts since 6 Apr, 2010

MackTuesday wrote: I have a question. Why smooth in the cepstral domain?

My current smoothing is in the frequency domain. The cepstrum transform seems to generate quite a 'bumpy' envelope (with a rectangular window), so smoothing with a more suitable window seems to remove these and would therefore (hopefully) leave only the main peaks (I think - I'm new to this myself and may have completely misunderstood - apologies if so).

MackTuesday wrote: Why not just use a smaller window in the time domain?

Because I want a large fft size for the bin size to be as low as possible. I'm smoothing in the frequency domain because I want all the small bin size benefits for modifying the signal, yet also want a smooth spectral envelope for analysis and peak detection.

- KVRist
- 471 posts since 11 Jul, 2004, from Southern California, USA

MackTuesday wrote: I have a question. Why smooth in the cepstral domain?

dibble wrote: My current smoothing is in the frequency domain.

Right. I should have said, Why smooth *using* the cepstral domain.

MackTuesday wrote: Why not just use a smaller window in the time domain?

dibble wrote: Because I want a large fft size for the bin size to be as low as possible.

Have you tried zero-padding the input to your fft? In other words, take your windowed signal, then add a bunch of zeros to the vector, and take the fft of that? Hm, maybe you don't want that because you want as many actual data points as possible in your vector? I wonder how my simpler approach would affect the fidelity of your results. (I hope it's clear that I'm simply interested in the theory involved with your work.)
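
To make the idea concrete, here is a minimal sketch of what I mean. The helper names `zeroPad` and `binSpacingHz` are made up for this post, and the FFT call itself is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Append zeros so the frame is `factor` times its original length.
// The frame should already be windowed before padding.
std::vector<double> zeroPad(const std::vector<double>& frame, int factor)
{
    assert(factor >= 1);
    std::vector<double> padded(frame);
    padded.resize(frame.size() * static_cast<std::size_t>(factor), 0.0);
    return padded;
}

// Spacing between adjacent FFT bins in Hz for a given FFT length.
double binSpacingHz(double sampleRate, std::size_t fftSize)
{
    assert(fftSize > 0);
    return sampleRate / static_cast<double>(fftSize);
}
```

With a 2048-sample frame at 44100 Hz, padding by a factor of 4 takes the bin spacing from about 21.5 Hz to about 5.4 Hz. Keep in mind that this only interpolates the spectrum of the short frame; the ability to separate two close partials is still set by the 2048 real samples and the window.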

- KVRer
- 6 posts since 6 Apr, 2010

MackTuesday wrote: Have you tried zero-padding the input to your fft? In other words, take your windowed signal, then add a bunch of zeros to the vector, and take the fft of that? Hm, maybe you don't want that because you want as many actual data points as possible in your vector? I wonder how my simpler approach would affect the fidelity of your results. (I hope it's clear that I'm simply interested in the theory involved with your work.)

Hi, apologies for the delay - I've been very busy with work! The zero padding may work but there are two main issues. As you mentioned, it will reduce the number of actual data points per frame (and still be just as expensive). The other issue is that I'm currently doing my fft using Max/MSP (with pfft~), and the focus of this project is building an MSP external. Eventually I will port this to a standalone app, but for the moment I will be keeping it within the Max API. I think I could implement your idea using fft~ in place of pfft~, but I will have to look into that. I'll let you know how it goes when I've tried it.

Thanks!

Adib