|
|||
Hi, I've had a search but can't find what I'm looking for...
I'm working on a 'formant shifter' model that will be an MSP object. I currently receive data in the frequency domain from Max/MSP (using the pfft~ object), into my object. However, I'm struggling to accurately extract the spectral envelope of the voiced sounds. I think that cepstral smoothing will help with this by enabling me to 'lifter' the non-harmonic content out of the signal for analysis. This is where I'm getting lost! I have done a lot of searching but I can't work out how I can do this transform and 'liftering' process within my object. I have limited maths knowledge (hence why I don't understand much of the info available online), and am really after a simple explanation of how this could be implemented in C++. Thanks in advance for any help... PS: This is for an undergraduate university project so any help will be cited etc. |
|||
| ^ | Joined: 06 Apr 2010 Member: #229253 | ||
|
|||
Cepstral smoothing is a good way to get the spectral envelope. JOS provides some Matlab example code:
https://ccrma.stanford.edu/~jos/sasp/Spectral_Envelope_Cepstral_Windowing.html
LPC also works well, and he gives an example for that too:
https://ccrma.stanford.edu/~jos/sasp/Spectral_Envelope_Linear_Prediction.html
If you need help translating the Matlab code (essentially DSP pseudo-code here) into C++, I can assist. |
|||
| ^ | Joined: 16 Feb 2011 Member: #250612 | ||
|
|||
Hi, thanks for the link and your help. I came across this earlier, but to be honest the code and functions don't make much sense to me. I have a rough understanding of the process and concept, but I'm not sure what libraries and/or equivalent functions to use. It looks like the Matlab code uses some functions that I don't have at the moment...
Thanks again - any assistance is much appreciated! Adib |
|||
| ^ | Joined: 06 Apr 2010 Member: #229253 | ||
|
|||
Well, you kind of have to look at the Matlab like DSP pseudocode, and implement a lot of the functions by hand, like so (the following C++ code will probably not compile):
#include <cmath>
#include <complex>
#include <vector>

const double PI = 3.1415926535897932384626433832795;

// Your audio signal; it must hold at least Nframe samples.
std::vector<float> audioSignal;

// Frame size
const int Nframe = 2048;

// Factor of 4 zero padding
const int Nfft = 4*Nframe;

// Hamming window, see http://en.wikipedia.org/wiki/Window_function#Hamming_window
// Window one frame and copy it into a buffer of length Nfft; the rest stays zero.
std::vector<float> frame(Nfft, 0.0f);
for (int n = 0; n < Nframe; n++) {
    float w = 0.54 - 0.46*cos( 2.0*PI*n/(Nframe - 1) );
    frame[n] = w*audioSignal[n];
}

// You'll have to use an FFT library here; fft() is only a placeholder.
// The complex FFT result should be stored in sspec.
std::vector< std::complex<float> > sspec(Nfft);
fft(frame, Nfft, sspec);

// Take log magnitude spectrum in dB (log10, with a small offset to avoid log of zero)
std::vector<float> dbsspecfull(Nfft);
for (int n = 0; n < Nfft; n++) {
    dbsspecfull[n] = 20*log10(std::abs(sspec[n]) + 1e-12f);
}

// Real cepstrum, again you'll need an FFT library for this; ifft() is a placeholder too.
std::vector<float> rcep(Nfft);
ifft(dbsspecfull, Nfft, rcep);

And so on... let me know if there are other steps you don't understand later in the code. Hopefully that gives you an idea though. |
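For the remaining steps (liftering the cepstrum and transforming back to get the smoothed envelope), here is a minimal self-contained sketch. It is not JOS's code: the names `naiveDft` and `cepstralEnvelopeDb` are made up for illustration, a slow direct DFT stands in for a real FFT library, and the lifter is a plain rectangular one (keep the first `nKeep` coefficients, zero the rest), which is the simplest choice rather than necessarily the best one. It also assumes you have the magnitudes of all N bins; if your object only receives the bins up to Nyquist, you would need to mirror them first.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> cplx;

// Naive O(N^2) DFT so the sketch needs no FFT library.
// sign = -1: forward transform; sign = +1: inverse transform without the 1/N scale.
static std::vector<cplx> naiveDft(const std::vector<cplx>& x, int sign)
{
    const std::size_t N = x.size();
    const double twoPi = 6.283185307179586;
    std::vector<cplx> X(N);
    for (std::size_t k = 0; k < N; ++k) {
        cplx sum(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n) {
            // Reduce k*n modulo N to keep the phase argument small.
            const double angle = sign * twoPi * double((k * n) % N) / double(N);
            sum += x[n] * cplx(std::cos(angle), std::sin(angle));
        }
        X[k] = sum;
    }
    return X;
}

// Hypothetical helper: mag holds the magnitudes of all N bins of a full
// (conjugate-symmetric) spectrum. Returns the smoothed envelope in dB.
// nKeep is the number of low-quefrency coefficients kept besides index 0.
std::vector<double> cepstralEnvelopeDb(const std::vector<double>& mag, std::size_t nKeep)
{
    const std::size_t N = mag.size();
    assert(N > 0 && 2 * nKeep < N);

    // 1. Log-magnitude spectrum in dB; the small offset avoids log10(0).
    std::vector<cplx> spec(N);
    for (std::size_t k = 0; k < N; ++k)
        spec[k] = cplx(20.0 * std::log10(mag[k] + 1e-12), 0.0);

    // 2. Real cepstrum: inverse DFT of the log spectrum.
    std::vector<cplx> cep = naiveDft(spec, +1);
    for (std::size_t n = 0; n < N; ++n)
        cep[n] /= double(N);

    // 3. Lifter: keep indices 0..nKeep and their mirror images N-nKeep..N-1,
    //    zero everything in between (the fast ripple across the bins).
    for (std::size_t n = nKeep + 1; n < N - nKeep; ++n)
        cep[n] = cplx(0.0, 0.0);

    // 4. Forward DFT of the liftered cepstrum gives the smoothed log spectrum.
    const std::vector<cplx> smooth = naiveDft(cep, -1);
    std::vector<double> env(N);
    for (std::size_t k = 0; k < N; ++k)
        env[k] = smooth[k].real();
    return env;
}
```

A larger `nKeep` follows the spectrum more closely and a smaller one smooths harder, so it plays the same role as the window length in a moving average. In a real object you would swap `naiveDft` for a proper FFT, since the direct DFT is far too slow for a 2048-point frame per audio block.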
|||
| ^ | Joined: 16 Feb 2011 Member: #250612 | ||
|
|||
Hi, thanks. It's making a bit more sense now. I've found what looks like a good FFT library (?) here: http://courseware.ee.calpoly.edu/~jbreiten/C/ . I've been using a moving average on the frame to smooth the bin amplitudes. This also produces quite a smooth, accurate spectrum, although it takes a bit of experimenting to find the fine line between a smooth spectrum and missing data points! Thanks again,
Adib |
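For reference, a moving average over the bin amplitudes can be sketched as below. This is only an illustration of the idea described above, not the poster's actual code: the function name `movingAverage` and the `halfWidth` parameter are assumptions, and the window is simply shortened at the two ends of the spectrum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: replaces each bin amplitude by the mean of the bins
// within halfWidth positions on either side. Near the edges the window is
// truncated, so fewer bins are averaged there.
std::vector<double> movingAverage(const std::vector<double>& mag, std::size_t halfWidth)
{
    assert(!mag.empty());
    const std::size_t N = mag.size();
    std::vector<double> out(N, 0.0);
    for (std::size_t k = 0; k < N; ++k) {
        const std::size_t lo = (k > halfWidth) ? k - halfWidth : 0;
        const std::size_t hi = std::min(N - 1, k + halfWidth);
        double sum = 0.0;
        for (std::size_t j = lo; j <= hi; ++j)
            sum += mag[j];
        out[k] = sum / double(hi - lo + 1);
    }
    return out;
}
```

The "fine line" mentioned above is the choice of `halfWidth`: too small and the harmonics still show through, too large and neighbouring formant peaks get merged.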
|||
| ^ | Joined: 06 Apr 2010 Member: #229253 | ||
|
|||
kissfft is normally my library of choice when I want something reasonably fast and easy to integrate with.
Are you doing a running average on the bin amplitudes? That should work OK but the smoothed cepstrum or LPC will give a better spectral envelope. |
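Since LPC keeps coming up as the alternative, here is a rough sketch of the autocorrelation method solved with the Levinson-Durbin recursion, working on a time-domain frame. The names `LpcModel`, `lpcFit` and `lpcEnvelopeAt` are invented for this example, there is no windowing or pre-emphasis, and degenerate frames (silence, or a perfectly predictable signal) are not handled beyond an assert.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical result type: a[0] is always 1, error is the residual energy.
struct LpcModel {
    std::vector<double> a;
    double error;
};

// Autocorrelation-method LPC solved with the Levinson-Durbin recursion.
LpcModel lpcFit(const std::vector<double>& x, std::size_t order)
{
    assert(order < x.size());

    // Autocorrelation for lags 0..order.
    std::vector<double> r(order + 1, 0.0);
    for (std::size_t lag = 0; lag <= order; ++lag)
        for (std::size_t n = 0; n + lag < x.size(); ++n)
            r[lag] += x[n] * x[n + lag];
    assert(r[0] > 0.0);  // an all-zero frame has no envelope to model

    LpcModel m;
    m.a.assign(order + 1, 0.0);
    m.a[0] = 1.0;
    m.error = r[0];

    for (std::size_t i = 1; i <= order; ++i) {
        double acc = r[i];
        for (std::size_t j = 1; j < i; ++j)
            acc += m.a[j] * r[i - j];
        const double k = -acc / m.error;  // reflection coefficient

        const std::vector<double> prev = m.a;
        for (std::size_t j = 1; j < i; ++j)
            m.a[j] = prev[j] + k * prev[i - j];
        m.a[i] = k;
        m.error *= (1.0 - k * k);
    }
    return m;
}

// Envelope magnitude at normalised angular frequency omega (radians/sample):
// sqrt(error) / |A(e^{j omega})| with A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.
double lpcEnvelopeAt(const LpcModel& m, double omega)
{
    std::complex<double> A(0.0, 0.0);
    for (std::size_t j = 0; j < m.a.size(); ++j)
        A += m.a[j] * std::complex<double>(std::cos(omega * j), -std::sin(omega * j));
    return std::sqrt(m.error) / std::abs(A);
}
```

To compare against FFT bin k of an N-point transform, evaluate `lpcEnvelopeAt` at `omega = 2*pi*k/N`. The model order controls the smoothness here in the same way the number of kept cepstral coefficients does for cepstral smoothing.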
|||
| ^ | Joined: 16 Feb 2011 Member: #250612 | ||
|
|||
Cool, I'll have a look at that library.
noisetteaudio wrote: Are you doing a running average on the bin amplitudes? That should work OK but the smoothed cepstrum or LPC will give a better spectral envelope.
Thanks, |
|||
| ^ | Joined: 06 Apr 2010 Member: #229253 | ||
|
|||
I have a question. Why smooth in the cepstral domain? Why not just use a smaller window in the time domain? Okay that was two questions. |
|||
| ^ | Joined: 11 Jul 2004 Member: #32838 Location: Southern California, USA | ||
|
|||
MackTuesday wrote: I have a question. Why smooth in the cepstral domain?
My current smoothing is in the frequency domain. The cepstrum transform seems to generate quite a 'bumpy' envelope (with a rectangular window), so smoothing with a more suitable window seems to remove these bumps and should therefore (hopefully) leave only the main peaks (I think - I'm new to this myself and may have completely misunderstood - apologies if so).
MackTuesday wrote: Why not just use a smaller window in the time domain?
Because I want a large FFT size so that the bin size is as small as possible. I'm smoothing in the frequency domain because I want all the benefits of a small bin size for modifying the signal, yet also want a smooth spectral envelope for analysis and peak detection. |
|||
| ^ | Joined: 06 Apr 2010 Member: #229253 | ||
|
|||
dibble wrote: MackTuesday wrote: I have a question. Why smooth in the cepstral domain? My current smoothing is in the frequency domain.
Right. I should have said, why smooth *using* the cepstral domain?
Quote: MackTuesday wrote: Why not just use a smaller window in the time domain?
Because I want a large fft size for the bin size to be as low as possible.
Have you tried zero-padding the input to your FFT? In other words, take your windowed signal, then add a bunch of zeros to the vector, and take the FFT of that? Hm, maybe you don't want that because you want as many actual data points as possible in your vector? I wonder how my simpler approach would affect the fidelity of your results. (I hope it's clear that I'm simply interested in the theory involved with your work.) |
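To make the zero-padding idea concrete, here is a small sketch. The helper names (`binSpacingHz`, `zeroPad`, `dftBin`) are made up for illustration, and the 44100 Hz sample rate used in the note afterwards is just an assumed example value. The point it tries to show is that padding narrows the bin spacing but only interpolates between the original bins: every `factor`-th bin of the padded transform equals a bin of the unpadded one.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Spacing between adjacent FFT bins in Hz.
double binSpacingHz(double sampleRate, std::size_t fftSize)
{
    assert(fftSize > 0);
    return sampleRate / double(fftSize);
}

// Append zeros so the (already windowed) frame becomes factor times as long.
std::vector<double> zeroPad(const std::vector<double>& frame, std::size_t factor)
{
    assert(factor >= 1);
    std::vector<double> padded(frame);
    padded.resize(frame.size() * factor, 0.0);
    return padded;
}

// One DFT bin computed directly (slow, but needs no FFT library).
std::complex<double> dftBin(const std::vector<double>& x, std::size_t k)
{
    const double twoPi = 6.283185307179586;
    std::complex<double> sum(0.0, 0.0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        const double angle = -twoPi * double(k) * double(n) / double(x.size());
        sum += x[n] * std::complex<double>(std::cos(angle), std::sin(angle));
    }
    return sum;
}
```

With the assumed 44100 Hz rate, a 2048-point frame gives bins about 21.5 Hz apart, and padding it by a factor of 4 to 8192 points gives about 5.4 Hz. The frequency resolution (the ability to separate two close partials) is still set by the 2048 real samples and the window, and the FFT itself costs as much as any other 8192-point FFT.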
|||
| ^ | Joined: 11 Jul 2004 Member: #32838 Location: Southern California, USA | ||
|
|||
Quote: Have you tried zero-padding the input to your fft? In other words, take your windowed signal, then add a bunch of zeros to the vector, and take the fft of that? Hm, maybe you don't want that because you want as many actual data points as possible in your vector? I wonder how my simpler approach would affect the fidelity of your results. (I hope it's clear that I'm simply interested in the theory involved with your work.)
Hi, apologies for the delay - I've been very busy with work! The zero padding may work, but there are two main issues. As you mentioned, it will reduce the number of actual data points per frame (and still be just as expensive). The other issue is that I'm currently doing my FFT in Max/MSP (with pfft~), and the focus of this project is building an MSP external. Eventually I will port this to a standalone app, but for the moment I will be keeping it within the Max API. I think I could implement your idea using fft~ in place of pfft~, but I will have to look into that. I'll let you know how it goes when I've tried it. Thanks! Adib |
|||
| ^ | Joined: 06 Apr 2010 Member: #229253 |
| KVR Forum Index » DSP and Plug-in Development | All times are GMT - 8 Hours |
|