Techniques to increase FFT time resolution without sacrificing frequency resolution?

DSP, Plugin and Host development discussion.

Post

Basic FFT/spectral processing presents a catch-22: time resolution and frequency resolution are inversely related, i.e., more frequency resolution (via a larger FFT frame size) means less time resolution.

So, for audio processing purposes, how can we get closer to having BOTH?

I understand one technique is to split the signal into several FFTs of different sizes. That may work but may also introduce side effects of its own (e.g., discontinuities in the time response at FFT boundaries).

What about FFT frame overlap? For example, does a 75% overlap produce better time resolution than 50% - or does it produce more smearing?

How are spectral dynamics processors, for example, able to have fast response time yet process down to 20Hz? What techniques are being used to get this high time resolution AND frequency resolution?

Post

I'm relatively new to FFT, so I would also love to see sound answers to some of the questions above.

And it's not as if simply adding frequency resolution solves everything either. Because the bins are linearly spaced, most of the resolution gets consumed by the higher frequencies while the low end stays sparse. You'd need far more than a few hundred taps/points to get something usable down there.

Things get even more demanding and complicated when you want to do FFT -> process -> IFFT.
www.solostuff.net
Advice is heavy. So don’t send it like a mountain.

Post

S0lo wrote: Fri Dec 02, 2022 6:34 pm Things get even more demanding and complicated when you want to do FFT -> process -> IFFT.
That's my use for it - audio processing (FFT > Process > iFFT). I'm not concerned with optimizing spectrum analyzers (FFT > Process > Display) although there may be commonalities.

Post

Fender19 wrote: Fri Dec 02, 2022 3:59 pm Basic FFT/spectral processing presents a catch-22 where the time resolution and frequency resolution are inversely related, i.e., more frequency resolution (via larger FFT frame size) produces less time resolution.
This is not specific to the FFT; it is generally known as the uncertainty principle. Yes, we're talking about the very same Heisenberg uncertainty principle that is fundamental to quantum physics.

A signal simply cannot be localized in both time and frequency at the same time. You are free to choose different tradeoffs at different frequency ranges (e.g. wavelet transforms), but for any given frequency you simply cannot beat the Gabor limit. Compact support in both time and frequency at the same time is also impossible.
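
If you want to see the limit numerically, here's a quick NumPy sketch (my own illustration, nothing official): it estimates the RMS time spread times the RMS frequency spread of a window's energy. The Gaussian is the unique shape that actually reaches the bound of 1/(4*pi); everything else lands above it.

Code: Select all

import numpy as np

def time_bandwidth_product(w, fs=1.0):
    # RMS spread of the window's energy in time...
    n = len(w)
    t = np.arange(n) / fs
    p_t = w**2 / np.sum(w**2)
    t0 = np.sum(t * p_t)
    sigma_t = np.sqrt(np.sum((t - t0)**2 * p_t))
    # ...and in frequency (zero-padded only to sample the spectrum smoothly)
    nfft = 16 * n
    W = np.fft.fft(w, nfft)
    f = np.fft.fftfreq(nfft, d=1/fs)
    p_f = np.abs(W)**2 / np.sum(np.abs(W)**2)
    f0 = np.sum(f * p_f)
    sigma_f = np.sqrt(np.sum((f - f0)**2 * p_f))
    return sigma_t * sigma_f

n = 4096
gauss = np.exp(-0.5 * ((np.arange(n) - n/2) / (n/10))**2)
print(time_bandwidth_product(gauss))          # ~0.0796 = 1/(4*pi): the Gabor limit
print(time_bandwidth_product(np.hanning(n)))  # somewhat above the limit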

Post

mystran wrote: Fri Dec 02, 2022 11:38 pm A signal simply cannot be localized in both time and frequency at the same time.
There are many FFT-based "spectral" processing plugins on the market these days - compressors, de-essers, resonance suppressors, etc. - that respond quickly yet also have detailed frequency resolution. They are doing something with their FFTs that provides both high frequency resolution AND fast response time. I'm trying to understand what that could be.

Maybe these are industry secrets but I thought I'd ask here for what known/common techniques may exist.

For example, what is the effect of using more overlap of FFT frames? Does 75% overlap produce better speed/time accuracy of processed signals vs. 50% overlap or does it have some other effect? I would think that the closer we get to 100% overlap the more accurate the processed result would be in time, yes/no? Will test it out.

Post

Fender19 wrote: Sat Dec 03, 2022 2:04 am For example, what is the effect of using more overlap of FFT frames? Does 75% overlap produce better speed/time accuracy of processed signals vs. 50% overlap or does it have some other effect? I would think that the closer we get to 100% overlap the more accurate the processed result would be in time, yes/no? Will test it out.
I just tested a 75% overlap version of an FFT-based dynamics processor vs. a 50% overlap version - both using Hanning windows with proper scale factors.

Unless I did something wrong, a null test shows that both versions produce the exact same output from the same music (non-periodic) input. I did not expect that result - I expected better dynamic response from the 75% overlap processing. So I guess more overlap is not a technique for improving time response. (So what does more overlap do besides increase CPU load?)
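
For reference, the round trip I'm testing boils down to something like this (a minimal NumPy sketch of the weighted overlap-add structure, not my actual plugin code; the per-bin dynamics would go where the comment is):

Code: Select all

import numpy as np

def stft_roundtrip(x, n=2048, hop=1024):
    w = np.hanning(n)
    y = np.zeros(len(x) + n)
    norm = np.zeros(len(x) + n)
    for start in range(0, len(x) - n + 1, hop):
        spec = np.fft.rfft(x[start:start+n] * w)
        # ...per-bin gains would be applied to spec here...
        y[start:start+n] += np.fft.irfft(spec) * w
        norm[start:start+n] += w * w
    norm[norm < 1e-12] = 1.0        # avoid division by zero at the edges
    return (y / norm)[:len(x)]

x = np.random.randn(48000)
y50 = stft_roundtrip(x, hop=1024)   # 50% overlap
y75 = stft_roundtrip(x, hop=512)    # 75% overlap
print(np.max(np.abs(y50[4096:-4096] - y75[4096:-4096])))  # ~1e-15: a null

With a pure pass-through, both overlaps reconstruct the input exactly, which would explain the null I measured.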

Post

Fender19 wrote: Fri Dec 02, 2022 3:59 pm What about FFT frame overlap? For example, does a 75% overlap produce better time resolution than 50% - or does it produce more smearing?
It will produce the exact same smearing - just more densely sampled. The smearing is a result of the length of the window, which I assume to be the same in both cases. Longer windows -> more time-domain smearing. Shorter windows -> more frequency-domain smearing.

I think increasing the overlap in the time domain can be thought of as dual to zero-padding before taking the FFT - where I mean "dual" in the sense that both (artificially) increase the resolution. Zero-padding leads to a more densely sampled frequency spectrum - but the underlying spectrum, as a continuous function of frequency, is already smeared out on the continuous frequency axis due to the windowing, and no amount of zero-padding will undo that. The higher frequency resolution is artificial - kinda like upscaling a pixel image, but only vertically. Likewise, more overlap artificially increases your time resolution without really giving you any more information - the horizontal counterpart, so to speak.

Nevertheless, some amount of extra overlap and zero-padding may be beneficial for further, subsequent analysis steps - but don't expect wonders. I say "extra" overlap because I assume that there is some kind of minimum overlap (like 50% for Hann and triangular windows) that you will always want, and the "extra" is the amount above that minimum.
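
To make the zero-padding half of that concrete, here's a tiny NumPy experiment (my own, with made-up numbers): two tones 20 Hz apart under a 1024-sample Hann window at 48 kHz stay merged into a single peak no matter how heavily you zero-pad, because the window's mainlobe is roughly 4*fs/N ~ 187 Hz wide:

Code: Select all

import numpy as np

fs, n = 48000, 1024
t = np.arange(n) / fs
x = np.hanning(n) * (np.sin(2*np.pi*1000*t) + np.sin(2*np.pi*1020*t))

mag = np.abs(np.fft.rfft(x, 64 * n))     # heavy zero-padding
f = np.fft.rfftfreq(64 * n, 1/fs)
# count local maxima around the two tones:
peaks = np.flatnonzero((mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])) + 1
print(len(peaks[(f[peaks] > 950) & (f[peaks] < 1070)]))  # 1 - a single merged peak

You get one smeared bump, just sampled on a very fine frequency grid.
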
Fender19 wrote: There are many FFT-based "spectral" processing plugins ... that respond quickly yet also have detailed frequency resolution. They are doing something with their FFTs that provides both high frequency resolution AND fast response time. I'm trying to understand what that could be.
That would be very interesting to me, too. One technique to improve the results of a spectrogram analysis by the STFT is time-frequency reassignment:

https://en.wikipedia.org/wiki/Reassignment_method

Could be that some spectrogram based processors make use of this. Another way to post-process a spectrogram is to use it as input to build a sinusoidal model of the underlying signal which is also known as phase-vocoder analysis:

https://en.wikipedia.org/wiki/Phase_vocoder

I think of this as a higher-level analysis step on top of the STFT/spectrogram analysis that lifts the signal representation to a form that is more easily digestible for humans and also more meaningfully editable from a musical perspective. Like: raw audio samples: low level, spectrogram: mid level, phase-vocoder data: high level.

To get the phase-vocoder data from a spectrogram, you will typically combine information from various STFT bins and frames - for example, to find a frequency with higher precision, you can do parabolic interpolation on the magnitude values of adjacent bins (in the same frame) and/or compare the phase data of adjacent frames (of the same bin). Then there's also the partial-tracking/peak-continuation algorithm that looks at adjacent frames to figure out which sinusoids are present, etc. In general terms, you aggregate information from various bins and frames using various heuristics in order to get a fuller and more precise picture of what is actually going on in the signal in terms of your (assumed) sinusoidal model.

Such a sinusoidal model, i.e. the assumption that your signal is composed of a sum of time-varying (!) sinusoids, may or may not be a good fit for the material at hand - but if it is, then retrieving the parameters of these sines can give you a representation of the signal that is in some sense better or more accurate than the rawer spectrogram representation. Some of the more advanced time-stretching/pitch-shifting techniques tend to use such representations a lot... I think.
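
The parabolic-interpolation step is easy to demo in isolation (standalone NumPy sketch, numbers made up): fit a parabola through the log magnitudes of the peak bin and its two neighbors, and read off the vertex.

Code: Select all

import numpy as np

fs, n = 48000, 2048
true_freq = 1003.7                     # deliberately between two bins
t = np.arange(n) / fs
x = np.hanning(n) * np.sin(2 * np.pi * true_freq * t)

mag = np.abs(np.fft.rfft(x))
k = int(np.argmax(mag))
a, b, c = np.log(mag[k-1]), np.log(mag[k]), np.log(mag[k+1])
delta = 0.5 * (a - c) / (a - 2*b + c)  # parabola vertex offset, in bins
print(fs / n)                          # bin spacing: ~23.4 Hz
print((k + delta) * fs / n)            # ~1003.7 Hz: far finer than one bin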

There are also multiresolution techniques like wavelet analysis that give you higher time-resolution (and lower frequency resolution) at higher frequencies (where you typically need it more).
My website: rs-met.com, My presences on: YouTube, GitHub, Facebook

Post

Fender19 wrote: Sat Dec 03, 2022 2:04 am Maybe these are industry secrets but I thought I'd ask here for what known/common techniques may exist.
https://en.wikipedia.org/wiki/Uncertain ... processing

Post

Thank you for the link, but I'm not sure you understand what I'm asking in this thread.

The real and imaginary data "bins" produced by the DFT/FFT are sums (averages) of all the sine/cosine basis signals correlated with the input signal over the length of the DFT/FFT frame. Amplitude-vs-time input points are converted to amplitude-vs-frequency output points. Time localization for the entire frame's worth of input audio samples is now represented by only one averaged point at the center of that FFT frame.
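
In code terms, what I mean is this (trivial NumPy illustration):

Code: Select all

import numpy as np

def dft_bin(frame, k):
    n = len(frame)
    basis = np.exp(-2j * np.pi * k * np.arange(n) / n)  # cos - j*sin at bin k
    return np.sum(frame * basis)   # one complex number summarizes all n samples

x = np.random.randn(2048)
print(dft_bin(x, 5))
print(np.fft.fft(x)[5])            # same value: the FFT is just a fast DFT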

Why is this a problem?
If we try to build something like an audio spectral dynamics processor, and perform the processing in the frequency domain, that one magnitude value per 2048 input samples is not good enough - it's too slow (and it's also a simple average rather than an RMS, etc.). If we apply frame-to-frame dynamic gain or attenuation to that magnitude value, those gain changes affect all 2048 samples as a block when re-synthesized back through the iFFT.

Overlapping 2048 sample FFT/iFFT frames, with the proper windows, can produce a smooth dynamics transition from one output frame to the next but 1024 samples between dynamic updates would make a sluggish dynamics processor. It may be OK for slow "leveling" of audio dynamics but terrible for fast "compressing", de-noising, de-essing, etc.

In order to get faster dynamic response we have to either use smaller FFT frame sizes (which we can't if we want to process all the way down to 20Hz) - or use some "tricks" to extract better time localization. Given a time-varying input like audio I assumed more FFT frame overlap (towards "granular" processing) would help but my simple test shows it doesn't (I still think it should and will re-test this).
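
To put rough numbers on it (assuming a 48 kHz sample rate):

Code: Select all

fs, n = 48000, 2048
hop = n // 2
print(1000 * hop / fs)   # ~21.3 ms between dynamics updates - sluggish
print(fs / n)            # ~23.4 Hz bin spacing - barely reaches down to 20 Hz
# a bin at 20 Hz needs n >= fs/20 = 2400, i.e. 4096 in practice,
# which doubles the update interval yet again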

I know it is possible to process all the way down to 20Hz with FFT means and also have fast response time because I own plugins that do this. My question in this thread is HOW are they doing it - what are the "tricks"?

If you're not familiar with what I'm talking about here are some examples:
https://oeksound.com/plugins/soothe2/
https://www.google.com/url?sa=t&rct=j&q ... 7DzXYoBoaL
https://www.meldaproduction.com/MSpectralDynamicsLE

Thanks everyone for your inputs and links. I thought techniques to achieve both speed AND high resolution from FFT processing might be common knowledge, but perhaps it's proprietary. I will give this some more study and run more tests.

Post

Fender19 wrote: Sat Dec 03, 2022 10:51 pm If we try to build something like an audio spectral dynamics processor ...

If you're not familiar with what I'm talking about here are some examples:
https://oeksound.com/plugins/soothe2/
I really like the audio examples of soothe. It seems to make everything sound better - more spectrally balanced. Of course, we can only guess what they are doing, but I may throw in a wild guess: maybe the plugin is built around an adaptive filtering algorithm rather than spectrogram processing? What it does somehow smells a bit like a "whitening filter" process of the kind that occurs in linear-prediction error filters. That's pure speculation, but it's an idea I might try if I wanted to build such a thing. The "Attack" parameter could control a learning rate and the "Release" parameter a forgetting rate in some sort of (N)LMS algorithm or one of its more sophisticated relatives (like recursive least squares (RLS), gradient adaptive lattice (GAL), least squares lattice (LSL), or fast transversal filters (FTF)).
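
In code, the skeleton of that idea might look like this (a bare-bones leaky NLMS prediction-error filter; all parameter values are pulled out of thin air, and again, this is pure speculation, not a claim about what soothe actually does):

Code: Select all

import numpy as np
from scipy.signal import lfilter

def nlms_whitener(x, order=32, mu=0.1, leak=1e-4, eps=1e-8):
    w = np.zeros(order)          # predictor coefficients
    buf = np.zeros(order)        # most recent `order` input samples
    e = np.empty(len(x))         # prediction error = "whitened" signal
    for i in range(len(x)):
        e[i] = x[i] - w @ buf
        # mu ~ learning/attack-ish rate, leak ~ forgetting/release-ish rate
        w = (1.0 - leak) * w + mu * e[i] * buf / (buf @ buf + eps)
        buf = np.roll(buf, 1)
        buf[0] = x[i]
    return e

# colour some noise with a strong resonance and watch it get flattened:
x = lfilter([1.0], [1.0, -1.8, 0.81], np.random.randn(48000))
flat = nlms_whitener(x)
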
My website: rs-met.com, My presences on: YouTube, GitHub, Facebook

Post

Fender19 wrote: Sat Dec 03, 2022 10:51 pm Thanks everyone for your inputs and links. I thought techniques to achieve both speed AND high resolution from FFT processing might be common knowledge but perhaps its proprietary. I will give this some more study and tests.
Perhaps they are not using FFT? That's not the only way to do spectral processing.

FFT will give you uniform time and frequency resolution, but like I said above, you're free to use different time/frequency tradeoffs at different frequencies by using other methods. This is the case with wavelet transforms or constant-Q filterbanks for example, where you get high frequency resolution at low frequencies (where you almost certainly don't need as much temporal resolution) and high temporal resolution at high frequencies (where you don't need as much frequency resolution). Another possibility is to split the spectrum into multiple bands and use different FFT sizes for different frequency ranges. There are likely many other things you can do that I just can't think of right now.

With constant-Q filterbanks (e.g. 3rd-octave or whatever) you don't even need any "frames", and if you choose minimum-phase filters for analysis, then the "attack" of an RMS-type detector is faster than the "decay", which is another potentially useful property for a dynamics processor - you could then use the detected envelopes to drive a matching bank of peaking EQs. This is probably what I'd try to do if I wanted to build a spectral dynamics processor.
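
As a rough sketch of that idea (NumPy/SciPy; third-octave Butterworth bandpasses standing in for the minimum-phase analysis filters, with arbitrary band edges and time constants):

Code: Select all

import numpy as np
from scipy.signal import butter, sosfilt

fs = 48000

def third_octave_bank(f_lo=20.0, f_hi=16000.0):
    # one 2nd-order Butterworth bandpass (as SOS) per third-octave band
    bank, fc = [], f_lo
    while fc < f_hi:
        edges = [fc / 2**(1/6), fc * 2**(1/6)]
        bank.append(butter(2, edges, btype='bandpass', fs=fs, output='sos'))
        fc *= 2**(1/3)
    return bank

def envelope(x, attack_ms=1.0, release_ms=80.0):
    # one-pole peak follower: fast attack, slow release
    ga = np.exp(-1000.0 / (attack_ms * fs))
    gr = np.exp(-1000.0 / (release_ms * fs))
    env, e = np.empty(len(x)), 0.0
    for i, v in enumerate(np.abs(x)):
        g = ga if v > e else gr
        e = g * e + (1.0 - g) * v
        env[i] = e
    return env

x = np.random.randn(fs)
envs = [envelope(sosfilt(sos, x)) for sos in third_octave_bank()]
# each band's envelope could then drive a matching peaking EQ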

I don't know how the plugins you mention actually work, but the point is that while you can't beat the Gabor limit at any given frequency, you can think outside the box.

Post

Fender19 wrote: Fri Dec 02, 2022 3:59 pm What about FFT frame overlap? For example, does a 75% overlap produce better time resolution than 50% - or does it produce more smearing?
The 50% overlapped FFT gives you all the information you'll need; overlapping any more won't yield more information. I think people use more overlap to compensate for the windowing artefacts, but I find it really just blurs the problems, which is the wrong approach.
If you alter the bins then you have to keep track of, and manipulate, the complex numbers for it to be coherent between frames.

Post

quikquak wrote: Sun Dec 04, 2022 3:41 pm I think people use more overlaps to compensate for the windowing artefacts
To get perfect reconstruction, your choice of window is determined by the overlap factor. So, one perspective is that using more overlap allows people to use smoother windows which give a cleaner spectral analysis.

But there are problems with resynthesis as well. A simple example is: take a pure DC input, and process it with FFT overlap-add, removing everything apart from bin 0 (the DC bin). Depending on your choice of overlap and window functions (for analysis and synthesis) you'll get a varying amount of energy in other frequencies in the output.
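
Here's that experiment in NumPy (my own sketch; I picked a Blackman window deliberately, because with analysis and synthesis windowing it does not overlap-add to a constant at 50%, which makes the effect easy to see):

Code: Select all

import numpy as np

def dc_only_ola(x, n=1024, hop=512):
    w = np.blackman(n)
    y = np.zeros(len(x) + n)
    for start in range(0, len(x) - n + 1, hop):
        spec = np.fft.rfft(x[start:start+n] * w)
        spec[1:] = 0.0                      # keep only the DC bin
        y[start:start+n] += np.fft.irfft(spec) * w
    return y[:len(x)]

x = np.ones(48000)                          # pure DC input
for hop in (512, 256):                      # 50% vs 75% overlap
    y = dc_only_ola(x, hop=hop)[8192:-8192]
    print(hop, np.std(y) / np.mean(y))      # relative non-DC energy

At 50% overlap the output ripples at the frame rate - that's the aliasing; at 75% the ripple all but vanishes.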

This is a form of aliasing, and is best understood from the "downsampled filter-bank" perspective on the STFT (where the window functions actually turn out to be acting as FIR filters for a downsample/upsample, and the overlap ratio determines how extreme the downsample is).

Post

Fender19:

It seems odd that your initial post appeared so close in the forum display to a post that attempts to address some of the same concerns, albeit from a different tack. I don't know whether you would need to translate the MATLAB code available at the adjacent post's website, or even whether it works as intended, but it does aim to accomplish the "impossible." The process is called restorative up-sampling (MATLAB code and description available at https://deserdi.xyz), and it appears on a webpage addressing holographic physics (à la David Bohm and string theory). Again, not sure if this addresses your concerns, or, to be honest, if it even works, but perhaps worth a go.

-Winkie

Post

Found a relatively minor error in the MATLAB code. Don't know if this made a difference in the outputs, but the sound files and MATLAB code were updated mid-afternoon on Monday, Dec. 5th (MST).

-Thanks, Winkie
