Using FFT to double length of audio clip
-
- KVRer
- Topic Starter
- 3 posts since 25 May, 2015
Hi, I hope this isn't off topic. I've seen FFT questions here before. I want to double the size of a short, like 1/10 second, audio clip while preserving the frequency spectrum using:
1. Perform Fast Fourier Transform.
2. Double the size of the result, using a simple resample function:
f[new] = f[old/2];
Do this for both real and imaginary parts
3. Perform inverse FFT.
It works, sort of. The wave forms are preserved, but the amplitudes scale from correct at the beginning, to zero in the middle to correct at the end of the now doubled clip. I can correct these amplitudes using a scaling function, but I'm wondering if there's something else I'm doing wrong like the way I resample my fft data.
Honestly, the FFT is a great black box to me. I know it has something to do with moving back and forth between amplitude and frequency domains, but how and why it works is not something I understand. If anybody has any good links on this subject, I'd appreciate it.
Thanks.
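The three steps above can be sketched in NumPy to see where the amplitude dip comes from. This is only a sketch under assumptions: the function names are made up, and it assumes "double the size" means every complex bin is written twice (`X2[m] = X[m // 2]`), which is how `f[new] = f[old/2]` reads. Under that assumption the inverse FFT works out to the original clip played twice and multiplied by `0.5 * (1 + exp(j*pi*n/N))`, whose magnitude is `|cos(pi*n/(2N))|`: 1 at the start, 0 in the middle, back to roughly 1 at the end.

```python
import numpy as np

def stretch_by_bin_repeat(x):
    """Steps 1-3 from the post: FFT, write every bin twice, inverse FFT."""
    X = np.fft.fft(x)
    X2 = np.repeat(X, 2)        # X2[m] = X[m // 2], real and imaginary parts alike
    return np.fft.ifft(X2)      # complex signal of length 2 * len(x)

def predicted_output(x):
    """Closed form of the same result: the clip twice, times a slow complex factor."""
    N = len(x)
    n = np.arange(2 * N)
    return 0.5 * np.tile(x, 2) * (1.0 + np.exp(1j * np.pi * n / N))

def amplitude_envelope(N):
    """Magnitude of the factor 0.5 * (1 + exp(j*pi*n/N)) for n = 0 .. 2N-1."""
    n = np.arange(2 * N)
    return np.abs(np.cos(np.pi * n / (2 * N)))

# Example: a 1/10 second, 440 Hz clip at 44100 Hz (values chosen for illustration).
sample_rate = 44100
t = np.arange(sample_rate // 10) / sample_rate
clip = np.sin(2 * np.pi * 440 * t)
doubled = stretch_by_bin_repeat(clip)
```

So the fade to zero is not a numerical accident. Repeating bins is a convolution in the frequency domain, which shows up as a multiplication by that envelope in the time domain.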
- KVRAF
- 12555 posts since 7 Dec, 2004
The DFT of a finite length sampled signal is only valid for that exact length of signal. If you change it, the DFT must also change.
This is similar to interpolation... How is the DFT to know how to fill in the missing data? You have to tell it what to do.
In other words, FT is not a solution to this problem. You end up with the same issues you had in the time domain, how to interpolate across known sections of the clip to fill in the unknown sections.
One of the simplest solutions is to cross-blend, which is very similar to what the FT with "amplitude correction" (correcting the frequency domain data in some way) would do.
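A cross blend along those lines might look like the sketch below. It assumes NumPy, and `loop_with_crossfade` and its parameters are hypothetical names, not anything from a particular library. Each copy of the clip fades in over the tail of the previous copy, using linear ramps that sum to 1 across the overlap.

```python
import numpy as np

def loop_with_crossfade(x, repeats, fade):
    """Repeat clip `x` `repeats` times, overlapping neighbours by `fade` samples."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    if not 0 < fade <= n // 2:
        raise ValueError("fade must be between 1 and len(x) // 2 samples")

    hop = n - fade                                    # distance between copy starts
    out = np.zeros(hop * (repeats - 1) + n)
    ramp = np.linspace(0.0, 1.0, fade, endpoint=False)

    for i in range(repeats):
        seg = x.copy()
        if i > 0:
            seg[:fade] *= ramp                        # fade in over the previous tail
        if i < repeats - 1:
            seg[-fade:] *= 1.0 - ramp                 # fade out under the next head
        out[i * hop : i * hop + n] += seg
    return out
```

Linear ramps are the simplest choice. Whether the seam is audible still depends on how well the end of the clip lines up with its start.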
Free plug-ins for Windows, MacOS and Linux. Xhip Synthesizer v8.0 and Xhip Effects Bundle v6.7.
The coder's credo: We believe our work is neither clever nor difficult; it is done because we thought it would be easy.
Work less; get more done.
-
- Banned
- 12368 posts since 30 Apr, 2002 from i might peeramid
i believe there will be similar issues, though i'd anticipate them being less severe: double window length for ifft, multiply phase x2. share results?
you come and go, you come and go. amitabha neither a follower nor a leader be tagore "where roads are made i lose my way" where there is certainty, consideration is absent.
-
- KVRer
- Topic Starter
- 3 posts since 25 May, 2015
Thanks for your responses. Aciddose, your explanation is fascinating. Kind of an information theory concept. It's interesting that the FFT seems to "know" that it doesn't have enough information to give complete amplitudes, in successive repeating cycles. To me it's a very deep subject that I don't understand.
From a wave pattern generation point of view, I did look and see what generating very close frequencies looks like. They tend to cancel each other out periodically. This is one second of 440 and 441 with 44100 sample rate.
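The cancellation can be reproduced with a few lines of NumPy. This sketch assumes the two tones are equal-amplitude sines that are simply summed, which the post does not spell out. The sum-to-product identity then gives a 440.5 Hz tone under a slow envelope `2 * |cos(pi * t)|`, which reaches zero at t = 0.5 s.

```python
import numpy as np

sample_rate = 44100
t = np.arange(sample_rate) / sample_rate          # one second of sample times

# Two sines one hertz apart, summed.
mix = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 441 * t)

# sin(a) + sin(b) = 2 * sin((a + b) / 2) * cos((a - b) / 2)
product = 2 * np.sin(2 * np.pi * 440.5 * t) * np.cos(2 * np.pi * 0.5 * t)

# Slow beat envelope: full level at t = 0 and t = 1, a null half way through.
beat_envelope = 2 * np.abs(np.cos(np.pi * t))
```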
So my filling algorithm would be expected to generate such patterns. I suppose there's something about doubling that makes the 50% cycling pattern with the FFT.
Wow pattern, clicks, and clunks are basically my issue. I want to replicate short, simple sound clips to arbitrary lengths without getting these features. I wonder if there's a mathematical test for clicks and pops. I've looked around and there are some explanations like sudden variations in amplitude, but it's nowhere near that simple.
Thanks.
-
- KVRer
- Topic Starter
- 3 posts since 25 May, 2015
xoxos, I went to do what you're suggesting, but I don't know what you mean by "multiply phase x2." Do you mean instead of
f[new] = f[old/2];
something like:
f[new] = f[old/2+1]; or some sort of alternating offset?
Thanks.
- KVRian
- 799 posts since 25 Apr, 2011
I don't think you're really going to gain anything by using FFT - I reckon you'd be better off cutting your input up into small overlapping snippets and then crossfading between them, repeating snippets as required to stretch the overall sound to the length you want. This is essentially what granular synthesis does.
You can also play back the snippets at different sample rates, so you effectively have separate control over pitch and time.
If you want to get fancy once it's basically working, you could try to adjust the overlaps when crossfading, in order to minimise any out-of-phase cancellation effects.
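As a rough sketch of the snippet-and-crossfade idea (assuming NumPy; `granular_stretch`, `grain` and `stretch` are hypothetical names picked for illustration), grains are read from the input at a slow hop and written to the output at a fixed hop of half a grain. A periodic Hann window is used because two copies half a grain apart sum to exactly 1, so the crossfade itself adds no amplitude ripple.

```python
import numpy as np

def granular_stretch(x, stretch, grain=1024):
    """Time-stretch `x` by `stretch` (>= 1) by overlap-adding windowed grains."""
    x = np.asarray(x, dtype=float)
    if grain % 2 != 0:
        raise ValueError("grain must be even")
    if len(x) < grain:
        raise ValueError("clip must be at least one grain long")
    if stretch < 1:
        raise ValueError("this sketch only lengthens, so stretch must be >= 1")

    hop_out = grain // 2                              # output hop: 50% overlap
    # Periodic Hann window: w[n] + w[n + grain/2] == 1 for every n.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(grain) / grain)

    n_grains = max(1, int((len(x) * stretch - grain) // hop_out) + 1)
    out = np.zeros((n_grains - 1) * hop_out + grain)
    last_start = len(x) - grain

    for g in range(n_grains):
        # Read position advances more slowly than the write position,
        # so neighbouring grains reuse overlapping parts of the input.
        start = min(int(round(g * hop_out / stretch)), last_start)
        out[g * hop_out : g * hop_out + grain] += window * x[start : start + grain]
    return out
```

This version does nothing about phase alignment between grains, so pitched material can still show the cancellation mentioned above. Nudging each `start` to the best-matching offset would be the "fancy" step.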
-
- Banned
- 12368 posts since 30 Apr, 2002 from i might peeramid
i know it difficult to get used to because not used to it. xoxos mean what xoxos say.
here's pseudocode:
the_new_phase_of_a_thing = the_old_phase_of_a_thing * 2;
-
- Banned
- 12368 posts since 30 Apr, 2002 from i might peeramid
xoxos also say, remember first part of first thing me say too.