KVR Audio

FilterEverything · Post by **FilterEverything** » Wed Jun 11, 2025 6:53 pm

Everyone working with spectrum analyzers probably noticed a particular problem with them: depending on what window size you pick, it either smears low frequencies or the time axis as a whole.

Even though I noticed that some analyzers don't always do it because they use some magic called the phase reassignment method, this works worse and worse the closer the signal resembles noise.
So it looks like there are some talented people willing to put effort into this problem but I don't think I've ever seen an analyzer that uses a transform that doesn't space the frequency bins linearly.

I don't think I have the capacity to come up with anything "novel", surely some actual developers already thought about this and decided that the DFT/FFT is still the best transform to use for visualisation.

Is there someone willing to explain why a constant Q transform is not worth it over the FFT? Is it too wasteful to compute? Would it simply not solve the time domain-frequency domain tradeoff or at least balance it considerably better? Would a spectrogram using the constant Q transform just look bad?

JustinJ · Post by **JustinJ** » Wed Jun 11, 2025 10:06 pm

I hadn't come across constant Q transform before. After a brief rendezvous with GPT, it came up with this compelling reason not to do it in the general case of spectrum analysis:

"To get a musically useful Q (≈ 12) at 20 Hz with 44.1 kHz sampling you need a ~0.56 s window. Stage monitors and “fast” meters won’t accept half-a-second lag. Many real-time analysers simply live without precise sub-100 Hz resolution."

FilterEverything · Post by **FilterEverything** » Wed Jun 11, 2025 11:47 pm

Where do these numbers come from? Our hearing spans less than 10 octaves. If I wanted, let's say, a bin for every quarter tone, I would need ~256 (positive) frequency bins for that. Wouldn't that imply a window size of 512 samples is already quite reasonable?

hugoderwolf · Post by **hugoderwolf** » Thu Jun 12, 2025 6:00 am

Last time I checked a proper constant Q transform would just be quite heavy to compute, as there's no optimization possible comparable to FFT. So usually you'd be better off just doing a large enough FFT and then doing some smoothing on the result.

karrikuh · Post by **karrikuh** » Thu Jun 12, 2025 6:06 am

FilterEverything wrote: Wed Jun 11, 2025 11:47 pm Where do these numbers come from? Our hearing spans less than 10 octaves. If I wanted, let's say, a bin for every quarter tone, I would need ~256 (positive) frequency bins for that. Wouldn't that imply a window size of 512 samples is already quite reasonable?

That seems wrong. A quarter tone at the bottom end of the human hearing range (20 Hz) is about 0.6 Hz. So your FFT size would be more like 64k @ 44.1kHz.
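As a rough check on that estimate, here is a minimal Python sketch of the arithmetic. The function names are made up for illustration, and it assumes the FFT bin spacing is simply the sample rate divided by the FFT length.

```python
import math

FS_HZ = 44100.0  # sample rate used in the post above


def quarter_tone_step_hz(f_hz):
    """Distance in Hz from f_hz up to the next quarter tone (24 steps per octave)."""
    return f_hz * (2.0 ** (1.0 / 24.0) - 1.0)


def fft_length_for_spacing(spacing_hz, fs_hz):
    """FFT length (not rounded to a power of two) whose bin spacing fs/N equals spacing_hz."""
    return fs_hz / spacing_hz


step_hz = quarter_tone_step_hz(20.0)
needed_length = fft_length_for_spacing(step_hz, FS_HZ)
spacing_64k = FS_HZ / 65536  # bin spacing of a 64k FFT at 44.1 kHz

print(f"quarter tone above 20 Hz: {step_hz:.3f} Hz")
print(f"FFT length with that bin spacing: {needed_length:.0f} samples")
print(f"bin spacing of a 64k FFT: {spacing_64k:.3f} Hz")
```

The unrounded length lands between 64k and 128k, so a 64k FFT is in the right ballpark but slightly coarser than one quarter tone at 20 Hz.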

karrikuh · Post by **karrikuh** » Thu Jun 12, 2025 6:42 am

FilterEverything wrote: Wed Jun 11, 2025 6:53 pm Everyone working with spectrum analyzers probably noticed a particular problem with them: depending on what window size you pick, it either smears low frequencies or the time axis as a whole.

Even though I noticed that some analyzers don't always do it because they use some magic called the phase reassignment method, this works worse and worse the closer the signal resembles noise.
So it looks like there are some talented people willing to put effort into this problem but I don't think I've ever seen an analyzer that uses a transform that doesn't space the frequency bins linearly.

I don't think I have the capacity to come up with anything "novel", surely some actual developers already thought about this and decided that the DFT/FFT is still the best transform to use for visualisation.

Is there someone willing to explain why a constant Q transform is not worth it over the FFT? Is it too wasteful to compute? Would it simply not solve the time domain-frequency domain tradeoff or at least balance it considerably better? Would a spectrogram using the constant Q transform just look bad?

I got annoyed by the problem myself some while ago and did some state-of-the-art research, including a superficial look at the CQT. I ultimately settled on a simple bank of parallel IIR bandpass filters. For low frequency resolution (3rd octave in my case, which seems to be kind of standard for mixing applications), this can be implemented quite efficiently and competitively compared with FFT (using SIMD and maybe even multirate processing where the lower octaves would be filtered using a downsampled input). It does not scale well to significantly higher resolutions, though. Besides the optimal time-frequency resolution, another advantage is that by using minimum-phase filters, we can minimize the latency of each band (i.e. lower latency at higher frequencies). In FFT, you get a constant latency of half the window size throughout the whole spectrum.
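For readers who want to see what such a bank looks like in its simplest form, here is a minimal Python sketch. It is not karrikuh's implementation: it assumes one second-order band-pass per band using the well-known Audio EQ Cookbook (RBJ) formulas, a handful of third-octave centres around 1 kHz picked for illustration, and no SIMD or multirate tricks. All names are hypothetical.

```python
import cmath
import math

FS_HZ = 44100.0
# Q of a third-octave band: centre / (upper edge - lower edge)
THIRD_OCTAVE_Q = 1.0 / (2.0 ** (1.0 / 6.0) - 2.0 ** (-1.0 / 6.0))


def bandpass_biquad(f0_hz, q, fs_hz):
    """RBJ cookbook band-pass with 0 dB peak gain; returns normalised (b, a)."""
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def run_biquad(b, a, samples):
    """Direct form I, one sample at a time."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def gain_at(b, a, f_hz, fs_hz):
    """Magnitude of the biquad's frequency response at f_hz."""
    z1 = cmath.exp(-2j * math.pi * f_hz / fs_hz)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)


# A small illustrative bank: seven third-octave bands centred around 1 kHz.
centres = [1000.0 * 2.0 ** (k / 3.0) for k in range(-3, 4)]
bank = [bandpass_biquad(f, THIRD_OCTAVE_Q, FS_HZ) for f in centres]

# Feed a 1 kHz sine through the 1 kHz band and look at the settled peak.
sine = [math.sin(2.0 * math.pi * 1000.0 * n / FS_HZ) for n in range(int(FS_HZ))]
settled_peak = max(abs(v) for v in run_biquad(*bank[3], sine)[-4410:])

own_gain = gain_at(*bank[3], centres[3], FS_HZ)
neighbour_gain = gain_at(*bank[3], centres[4], FS_HZ)
print(f"gain at own centre: {own_gain:.3f}, at next centre up: {neighbour_gain:.3f}")
```

A single biquad per band leaks noticeably into its neighbours, which is the motivation for the higher-order designs discussed later in the thread.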

It should be noted that some of the more advanced FFT-based analysers offer spectrum smoothing (like 1/3, 1/6, 1/12th octave) which should give you similar results to the CQT but may be less efficient in terms of CPU cycles / sample.

FilterEverything · Post by **FilterEverything** » Thu Jun 12, 2025 10:09 am

hugoderwolf wrote: Thu Jun 12, 2025 6:00 am Last time I checked a proper constant Q transform would just be quite heavy to compute, as there's no optimization possible comparable to FFT. So usually you'd be better off just doing a large enough FFT and then doing some smoothing on the result.

Oh, so it cannot really be implemented using a modified FFT algorithm that somehow redistributes the bins in a way that's more convenient for audio?

karrikuh wrote: Thu Jun 12, 2025 6:06 am That seems wrong. A quarter tone at the bottom end of the human hearing range (20 Hz) is about 0.6 Hz. So your FFT size would be more like 64k @ 44.1kHz.

I understand that if I wanted to have quarter-tone resolution at low frequencies using a conventional FFT, I would need a lot of bins, because the bin next to 20Hz should sit at 20Hz times whatever the 24th root of 2 is, and so on. That would indeed mean a large window size for an FFT.

I thought that with the constant Q transform I get to have that kind of tight spacing at lower frequencies and much more sparse spacing at high frequencies. As an example, the bins around 1kHz would be placed at 971Hz, 1kHz, 1029Hz. So there would not be a need for so many frequency bins because they are distributed more sensibly. I assume that fewer bins directly translate to a smaller window size. Maybe the CQT and FFT can't be conflated like that, I thought the CQT is also an FFT based transform. If that's not the case, that would explain why it's not popular.
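A minimal Python sketch of the usual constant-Q bookkeeping may help with the "fewer bins means a smaller window" assumption. It assumes the textbook definitions, where Q = 1 / (2^(1/b) - 1) for b bins per octave and each bin k gets its own window of about Q * fs / f_k samples. The function name and the 20 Hz to 20 kHz range are illustrative choices.

```python
import math


def cqt_plan(f_min_hz, f_max_hz, bins_per_octave, fs_hz):
    """Return (Q, [(centre_hz, window_samples), ...]) for a geometric bin layout."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    n_bins = math.ceil(bins_per_octave * math.log2(f_max_hz / f_min_hz))
    plan = []
    for k in range(n_bins):
        f_k = f_min_hz * 2.0 ** (k / bins_per_octave)
        window = math.ceil(q * fs_hz / f_k)  # longer window for lower bins
        plan.append((f_k, window))
    return q, plan


q, plan = cqt_plan(20.0, 20000.0, 24, 44100.0)
print(f"Q = {q:.1f}, bins = {len(plan)}")
print(f"lowest bin:  {plan[0][0]:8.1f} Hz, window {plan[0][1]} samples")
print(f"highest bin: {plan[-1][0]:8.1f} Hz, window {plan[-1][1]} samples")
```

The bin count does stay small, but the window length is set per bin by the frequency resolution wanted at that bin, so the lowest bins still need tens of thousands of samples while the highest need well under a hundred.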

mystran · Post by **mystran** » Thu Jun 12, 2025 6:12 pm

The good old N-per-octave (eg. 3rd octave, or 1/6 or 1/12, etc) is effectively a constant-Q transform (not invertible, but usually for analysis only we don't need that). While the basic version is just a biquad per band, you can reduce how much a tone bleeds to adjacent bands by increasing the order (although for visual purposes I suggest something like a Bessel- rather than Butterworth-based design, so that a sliding note doesn't just jump from one band to the next). It gets a bit expensive at higher orders though (eg. order 8 bandpass filters for 12 per octave and 10 octaves is already 480 biquads).

Note that a high time-resolution at higher frequencies sounds great, until you realize that for most visual purposes it's actually way too quick and you'll end up putting a fairly long (eg. 200ms) RMS smoothing window on every band just so you get something you can meaningfully draw.

Another possibility is not to abandon FFT, but instead approximate constant-Q by multi-resolution FFT. Basically you pick some window-size, but then you only keep the top half of the full-rate transform. You downsample by a factor of two and then use the same window size for the half-rate data (so effectively it's twice as long in wallclock time), again keeping only the top-half of the transform and again downsampling to half (now quarter) rate for the next octave down. It's not a true constant-Q, but close enough in some cases.
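The layout this produces can be tabulated with a short Python sketch. It only computes which band each level covers and with what resolution; the actual FFT and the anti-alias filtering before each downsampling step are left out. The function name, the 1024-sample window and the six levels are assumptions for illustration.

```python
def multires_plan(fs_hz, window_size, n_levels):
    """Describe each level of a halve-the-rate, keep-the-top-half FFT scheme."""
    levels = []
    for level in range(n_levels):
        rate = fs_hz / (2 ** level)  # sample rate after `level` halvings
        levels.append({
            "rate_hz": rate,
            "band_hz": (rate / 4.0, rate / 2.0),   # top half of this level's spectrum
            "bin_spacing_hz": rate / window_size,  # same window size, finer bins
            "window_seconds": window_size / rate,  # twice as long per level
            "bins_kept": window_size // 4,         # half of the positive-frequency bins
        })
    return levels


plan = multires_plan(44100.0, 1024, 6)
for lv in plan:
    lo, hi = lv["band_hz"]
    print(f"{lo:9.1f} .. {hi:9.1f} Hz  spacing {lv['bin_spacing_hz']:7.3f} Hz  "
          f"window {lv['window_seconds'] * 1000.0:7.1f} ms")
```

Each level covers one octave with the same number of bins, so the resolution is constant per octave rather than per bin, which is why it only approximates a true constant-Q layout.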

Finally, it's possible to do N/octave analyzer with FFT at least approximately. Basically rather than designing an IIR filter for each band, you design a FIR and then you use convolution to compute the bands. The key observations to make this fast is to realize that (1) convolution in spectral domain is just multiplication and (2) if we only care about RMS amplitude as is usually the case, then FFT preserves the 2-norm (ie. you can compute RMS values in either time or frequency domain and in theory the result is the same, ignoring the impact of things like windows). You can combine this with the multi-resolution FFT approach to effectively use larger windows at lower octaves.
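The 2-norm observation can be checked numerically with a small Python sketch. It uses a deliberately naive DFT so it needs nothing beyond the standard library, and it stands in a brick-wall band weight for the magnitude response of a real FIR band filter. Both are simplifications, and all names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT, fine for a small demonstration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def rms_time(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def rms_freq(spectrum, weights=None):
    """RMS from DFT bins; weights play the role of a band filter's |H[k]|."""
    n = len(spectrum)
    if weights is None:
        weights = [1.0] * n
    power = sum((w * abs(c)) ** 2 for w, c in zip(weights, spectrum))
    return math.sqrt(power) / n  # Parseval: sum|x|^2 = (1/N) * sum|X|^2


N = 64
# Two sines sitting exactly on bins 4 and 20, amplitudes 1.0 and 0.5.
x = [math.sin(2 * math.pi * 4 * t / N) + 0.5 * math.sin(2 * math.pi * 20 * t / N)
     for t in range(N)]
X = dft(x)

# Brick-wall weights around bin 4 and its mirror image (illustrative stand-in).
band = [1.0 if (3 <= k <= 5 or N - 5 <= k <= N - 3) else 0.0 for k in range(N)]

full_time = rms_time(x)
full_freq = rms_freq(X)
band_level = rms_freq(X, band)
print(f"time-domain RMS {full_time:.4f}, frequency-domain RMS {full_freq:.4f}")
print(f"RMS inside the band around bin 4: {band_level:.4f}")
```

In a real analyser the weights would come from the FFT of each band's FIR, and a window on the input would change the absolute levels, as the post already notes.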

For what it's worth, my general-purpose analyzer does a combination of 3rd octave (with order 6 or 8 filters per band, forgot what it was exactly) and then it has FFT mode (with zoom and spectral slope-weighting and what not) separately. The 3rd octave is better for getting an idea of the spectral balance while the FFT is king for surgical analysis.

JustinJ · Post by **JustinJ** » Fri Jun 13, 2025 8:47 pm

mystran wrote: Thu Jun 12, 2025 6:12 pm For what it's worth, my general-purpose analyzer does a combination of 3rd octave (with order 6 or 8 filters per band, forgot what it was exactly) and then it has FFT mode (with zoom and spectral slope-weighting and what not) separately. The 3rd octave is better for getting an idea of the spectral balance while the FFT is king for surgical analysis.

At the risk of slightly derailing the topic, but it kinda being related because FFT and resolution...

I'm using an FFT with an impulse to render filter responses. The sort of thing that Serum does for its filter display. It's useful for a large number of different filter types where computing some kind of Z transform would be a royal PITA.

As you might expect though, it loses resolution in the lower bins and can look a bit lumpy. Is there a way of getting a smoother more even visual with this technique? I'm figuring you'd be the one to know

mystran · Post by **mystran** » Fri Jun 13, 2025 8:57 pm

JustinJ wrote: Fri Jun 13, 2025 8:47 pm As you might expect though, it loses resolution in the lower bins and can look a bit lumpy. Is there a way of getting a smoother more even visual with this technique? I'm figuring you'd be the one to know

No magic bullet here really. For good resolution at low frequencies, you need a longer FFT window.
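To put a number on that, a small Python sketch can count how many FFT bins fall into the lowest audible octave for a few FFT lengths. It assumes 44.1 kHz and a bin spacing of fs / N. The function name is made up.

```python
def bins_in_octave(f_low_hz, fft_size, fs_hz):
    """Approximate number of FFT bins between f_low_hz and 2 * f_low_hz."""
    spacing_hz = fs_hz / fft_size
    return f_low_hz / spacing_hz  # octave width (f_low_hz) divided by bin spacing


FS_HZ = 44100.0
counts = {n: bins_in_octave(20.0, n, FS_HZ) for n in (1024, 4096, 16384, 65536)}
for n, count in counts.items():
    print(f"N = {n:6d}: about {count:5.1f} bins between 20 and 40 Hz")
```

With only a couple of bins spread over a whole octave of a log-frequency plot, the curve has nothing to draw between them, and each doubling of the FFT length only doubles the count.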

JustinJ · Post by **JustinJ** » Fri Jun 13, 2025 9:05 pm

mystran wrote: Fri Jun 13, 2025 8:57 pm No magic bullet here really. For good resolution at low frequencies, you need a longer FFT window.

Ah well, that's what I thought. Worth an ask.

Mayae · Post by **Mayae** » Sun Jun 15, 2025 10:41 am

My analyser defaults to "constant-Q" mode using complex resonator banks, and as mentioned, that is actually reasonably fast compared to FFTs especially in the case of analysers where you have a large signal overlap from frame to frame.
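For anyone unfamiliar with the term, here is a minimal Python sketch of one way a complex resonator bank can work. It is not Mayae's implementation: it assumes a one-pole complex filter per band, with the pole radius derived from the band's bandwidth using the common approximation r = exp(-pi * B / fs). Q = 12 and the three band frequencies are arbitrary illustrative values.

```python
import cmath
import math


class ComplexResonator:
    """One-pole complex filter tuned to f_hz; |state| tracks the band's level."""

    def __init__(self, f_hz, bandwidth_hz, fs_hz):
        radius = math.exp(-math.pi * bandwidth_hz / fs_hz)  # approximate -3 dB width
        self.pole = radius * cmath.exp(2j * math.pi * f_hz / fs_hz)
        self.gain = 1.0 - radius  # unity gain for a complex tone at f_hz
        self.state = 0j

    def process(self, x):
        self.state = self.pole * self.state + self.gain * x
        return self.state

    def amplitude(self):
        # A real sine splits into two complex halves; only one is picked up.
        return 2.0 * abs(self.state)


FS_HZ = 44100.0
Q = 12.0  # constant Q, so bandwidth scales with centre frequency
bank = {f: ComplexResonator(f, f / Q, FS_HZ) for f in (500.0, 1000.0, 2000.0)}

# The bank runs sample by sample, so every new input sample updates all bands.
for n in range(8820):
    x = math.sin(2.0 * math.pi * 1000.0 * n / FS_HZ)
    for resonator in bank.values():
        resonator.process(x)

readings = {f: r.amplitude() for f, r in bank.items()}
for f, level in readings.items():
    print(f"{f:7.1f} Hz band reads {level:.3f}")
```

Because the state is updated recursively per sample, there is no per-frame recomputation of overlapping data, which is where the advantage over heavily overlapped FFT frames comes from.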

Three notes: You probably want to clamp the lower frequencies, otherwise the lowest frequency time resolution drops to zero and basically never moves. For high frequencies, where the bandwidth increases, the "update rate" becomes so fast you probably want to smooth it somehow (like mystran said). Lastly, unless some fixed bands per octave map straight to your display pixel grid you may want instead to adjust the Q based on that - to avoid scalloping losses or post processing.

So in a way, you're back to square one but some differences remain:
1. there are IMO useful properties in the middle of the spectrum
2. the sliding evaluation computational advantage can be beneficial
3. IIR decay and windowing shapes also behave differently from FIR; the first has constant velocity on a log log spectrum which you may or may not like.
4. You can map a bank pixel perfect to a screen

mystran · Post by **mystran** » Sun Jun 15, 2025 3:46 pm

Mayae wrote: Sun Jun 15, 2025 10:41 am My analyser defaults to "constant-Q" mode using complex resonator banks, and as mentioned, that is actually reasonably fast compared to FFTs especially in the case of analysers where you have a large signal overlap from frame to frame.

Actually what I said about my 3rd octave is wrong.

Looking at my source code, I currently run 31 bands (midband frequency 630Hz which puts the min/max frequencies at around 20Hz and 20kHz) with order 8 Chebychev type 2 prototype(!) low-pass (with -80dB stop-band ripple) which transforms into 8 biquads per band (so actual order is 16), so that's 248 (stereo) biquads total (apparently not even using my "SIMD SVF pipeline" .. that'd probably make it about 4 times faster, which should probably make it clear it's not that terrible of a CPU hog if I didn't even bother optimizing).
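The band layout described here is easy to reproduce in a few lines of Python. The function name is made up, and the sketch assumes exact base-2 third-octave spacing rather than the rounded nominal frequencies printed on hardware.

```python
def third_octave_centres(mid_hz=630.0, n_bands=31):
    """Centre frequencies spaced 1/3 octave apart, symmetric around mid_hz."""
    half = n_bands // 2  # n_bands is assumed to be odd
    return [mid_hz * 2.0 ** (k / 3.0) for k in range(-half, half + 1)]


centres = third_octave_centres()
biquads_per_band = 8
total_biquads = len(centres) * biquads_per_band

print(f"{len(centres)} bands from {centres[0]:.1f} Hz to {centres[-1]:.1f} Hz")
print(f"{total_biquads} biquads per channel set")
```

Fifteen bands on either side of 630 Hz is exactly five octaves each way, which is where the roughly 20 Hz and 20 kHz end points come from.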

Earlier I was using Bessel filters of similar order, either looks quite fine in practice, the Cheb2 is just a bit better on the skirts and I think it might actually even be a bit rounder on the top, which is actually a good thing.

Like I mentioned above, with steeper filters if you design them with Butterworth, you'll end up with a bank that's a bit "too good" visually: when you sweep a sine-wave slowly, if the band-edges are too sharp, the frequency jumps from one band to the next sort of abruptly. By using a design with a more rounded cutoff (Bessel, Chebychev type 2) you'll get a wider transition where a sine-wave somewhere between two center frequencies gives a weighted measurement in both bands depending on the exact frequency of the sine (similar to what happens when you use simple order 2 band-pass filters), yet the higher order filters still bring down the skirts.

Each band then has a simple exponential moving average RMS (ie. square, do one-pole lowpass with 200ms time constant and convert to decibels with the power formula), because otherwise the higher bands are visually way too fast and we need to measure the level somehow anyway and this gives us nice "attack/release" dynamics for the meters with no additional work.
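A minimal Python sketch of that per-band meter follows: square the sample, smooth with a one-pole lowpass, convert with 10 * log10. The class name, the -120 dB floor and the coefficient formula exp(-1 / (tau * fs)) are assumptions, since the post does not show the actual code.

```python
import math


class RmsMeter:
    """Exponential-moving-average mean-square meter for one band."""

    def __init__(self, fs_hz, time_constant_s=0.2):
        self.coeff = math.exp(-1.0 / (time_constant_s * fs_hz))
        self.mean_square = 0.0

    def process(self, x):
        # One-pole lowpass on the squared signal.
        self.mean_square = self.coeff * self.mean_square + (1.0 - self.coeff) * x * x
        return self.mean_square

    def level_db(self, floor_db=-120.0):
        # Power formula: the value is already squared, so 10 * log10, not 20.
        if self.mean_square <= 0.0:
            return floor_db
        return max(floor_db, 10.0 * math.log10(self.mean_square))


FS_HZ = 44100.0
meter = RmsMeter(FS_HZ)
for n in range(int(2.0 * FS_HZ)):  # two seconds of a full-scale 1 kHz sine
    meter.process(math.sin(2.0 * math.pi * 1000.0 * n / FS_HZ))

level = meter.level_db()
print(f"settled level: {level:.2f} dB")
```

The same smoothing that tames the fast high bands also gives the meter its attack and release behaviour, so no separate ballistics stage is needed.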

I run this thing (really all the analysis) in the GUI thread, so the load on audio threads is effectively zero: whenever my redraw timer says I should redraw, I go fetch the latest (raw input) data from the audio thread, then depending on what's open in the GUI either I'll do a new FFT frame or I'll feed the data to the filter bank, then redraw. In practice the cost is probably similar on average to the FFT analysis and both are negligible compared to how long the drawing code spends stroking an FFT plot.

Mayae · Post by **Mayae** » Mon Jun 16, 2025 11:04 am

OK that's interesting, you're effectively implementing a window function on your "spectrum" using steep bandpass filters?

I stumbled upon the fact that if a window function has a simple DFT response, you can simply sum such N parallel complex resonators (with cf spaced +/- bandwidth or I guess "bins") and get effectively a very nice "windowed" IIR response with the same advantages. I'm not really sure if anyone out there has explained why it works for IIR as well, but logically it sort of makes sense.

mystran wrote: Sun Jun 15, 2025 3:46 pm Like I mentioned above, with steeper filters if you design them with Butterworth, you'll end up with a bank that's a bit "too good" visually: when you sweep a sine-wave slowly, if the band-edges are too sharp, the frequency jumps from one band to the next sort of abruptly. By using a design with a more rounded cutoff (Bessel, Chebychev type 2) you'll get a wider transition where a sine-wave somewhere between two center frequencies gives a weighted measurement in both bands depending on the exact frequency of the sine (similar to what happens when you use simple order 2 band-pass filters), yet the higher order filters still bring down the skirts.

I found that adjusting the bandwidth to be the difference between the former "bins" yields a response without dips (or max of -3 dB like for a rectangular window). One can then use the above trick with flat-top window coefficients or similar to never have dips.

mystran wrote: Sun Jun 15, 2025 3:46 pm [...] so that's 248 (stereo) biquads total (apparently not even using my "SIMD SVF pipeline" .. that'd probably make it about 4 times faster, which should probably make it clear it's not that terrible of a CPU hog if I didn't even bother optimizing).

Yeah, for a 4k display I run 3840 * 7 (for nuttall window) complex resonators, ie. 54k complex multiplies per sample for stereo. That runs in 28% async on one core with AVX2.

mystran · Post by **mystran** » Mon Jun 16, 2025 2:11 pm

Mayae wrote: Mon Jun 16, 2025 11:04 am OK that's interesting, you're effectively implementing a window function on your "spectrum" using steep bandpass filters?

Well, not really window in any meaningful sense except as far as steeper filters prevent bleed into adjacent bins sort of similar to how a better window would work.

ps. Strictly speaking I'm not even really doing "constant Q" but rather "constant log-bandwidth" as in analog these two are the same, but Nyquist warping makes them two different with BLT. Fortunately it's relatively simple to just compute the two desired band-edges and build a bandpass that covers the range.
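The difference between the two can be made concrete with a small Python sketch. It assumes third-octave bands whose edges sit half a band either side of the centre on a log axis, and the standard bilinear-transform prewarping relation f_a = (fs / pi) * tan(pi * f / fs). The function names and the two example centre frequencies are illustrative.

```python
import math


def band_edges(centre_hz, bands_per_octave=3):
    """Lower and upper edge of a band that is 1/bands_per_octave octave wide."""
    half_step = 2.0 ** (1.0 / (2.0 * bands_per_octave))
    return centre_hz / half_step, centre_hz * half_step


def prewarp(f_hz, fs_hz):
    """Analog frequency (in Hz) that the bilinear transform maps onto f_hz."""
    return (fs_hz / math.pi) * math.tan(math.pi * f_hz / fs_hz)


def analog_prototype_q(centre_hz, fs_hz):
    """Q the analog band-pass needs so the digital band hits the desired edges."""
    lo, hi = (prewarp(f, fs_hz) for f in band_edges(centre_hz))
    return math.sqrt(lo * hi) / (hi - lo)


FS_HZ = 44100.0
q_low = analog_prototype_q(100.0, FS_HZ)
q_high = analog_prototype_q(16000.0, FS_HZ)
print(f"prototype Q at 100 Hz: {q_low:.3f}")
print(f"prototype Q at 16 kHz: {q_high:.3f}")
```

The digital band edges keep a constant ratio everywhere, but the analog prototype that produces them needs a much lower Q near Nyquist, so designing from the two edges is not the same as holding Q fixed.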

Constant-Q transform for spectrum analyzers and spectograms instead of FFT?