Constant-Q transform for spectrum analyzers and spectrograms instead of FFT?
-
FilterEverything FilterEverything https://www.kvraudio.com/forum/memberlist.php?mode=viewprofile&u=526607
- KVRer
- 12 posts since 31 Aug, 2021
Everyone working with spectrum analyzers has probably noticed a particular problem with them: depending on what window size you pick, it either smears low frequencies or the time axis as a whole.
I have noticed that some analyzers avoid this to some degree because they use some magic called the phase reassignment method, but this works worse and worse the more closely the signal resembles noise.
So it looks like there are some talented people willing to put effort into this problem, but I don't think I've ever seen an analyzer that uses a transform that doesn't space the frequency bins linearly.
I don't think I have the capacity to come up with anything "novel"; surely some actual developers have already thought about this and decided that the DFT/FFT is still the best transform to use for visualisation.
Is there someone willing to explain why a constant-Q transform is not worth it over the FFT? Is it too wasteful to compute? Would it simply not solve the time-domain/frequency-domain tradeoff, or at least balance it considerably better? Would a spectrogram using the constant-Q transform just look bad?
- KVRist
- 189 posts since 31 Oct, 2017
I hadn't come across the constant-Q transform before. After a brief rendezvous with GPT, I got this compelling reason not to do it in the general case of spectrum analysis:
"To get a musically useful Q (≈ 12) at 20 Hz with 44.1 kHz sampling you need a ~0.56 s window. Stage monitors and “fast” meters won’t accept half-a-second lag. Many real-time analysers simply live without precise sub-100 Hz resolution."
-
FilterEverything FilterEverything https://www.kvraudio.com/forum/memberlist.php?mode=viewprofile&u=526607
- KVRer
- Topic Starter
- 12 posts since 31 Aug, 2021
Where do these numbers come from? Our hearing spans less than 10 octaves. If I wanted, let's say, a bin for every quarter tone, I would need ~256 (positive) frequency bins for that. Wouldn't that imply a window size of 512 samples is already quite reasonable?
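The quoted figure appears to follow from the usual constant-Q relation, where a bin at frequency f with quality factor Q needs a window of roughly N = Q * fs / f samples. A minimal Python sketch, assuming that relation is what was meant (the function name is made up for illustration):

```python
def constant_q_window(q, freq_hz, sample_rate=44100.0):
    """Return (samples, seconds) of the analysis window for one constant-Q bin.

    Assumes the common rule of thumb N = Q * fs / f.
    """
    samples = q * sample_rate / freq_hz
    return samples, samples / sample_rate


samples, seconds = constant_q_window(q=12.0, freq_hz=20.0)
print(round(samples), round(seconds, 2))  # 26460 0.6
```

That is in the same ballpark as the ~0.56 s in the quote. Note that the duration in seconds is simply Q / f, so it is set by the lowest frequency and the desired Q, not by how many bins there are in total.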
- KVRist
- 362 posts since 1 Apr, 2009 from Hannover, Germany
Last time I checked a proper constant Q transform would just be quite heavy to compute, as there's no optimization possible comparable to FFT. So usually you'd be better off just doing a large enough FFT and then doing some smoothing on the result.
- KVRist
- 469 posts since 6 Apr, 2008
FilterEverything wrote: Wed Jun 11, 2025 11:47 pm Where do these numbers come from? Our hearing spans less than 10 octaves. If I wanted lets say a bin for every quarter tone, I would need ~256 (positive) frequency bins for that. Wouldn't that imply a window size of 512 samples is already quite reasonable?
That seems wrong. A quarter tone at the bottom end of the human hearing range (20 Hz) is about 0.6 Hz. So your FFT size would be more like 64k @ 44.1kHz.
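The arithmetic behind this can be checked with a few lines of Python. This is only a sketch; the helper name and the power-of-two rounding are assumptions for illustration:

```python
import math


def fft_size_for_resolution(freq_hz, steps_per_octave, sample_rate=44100.0):
    """FFT size whose linear bin spacing equals one log step at freq_hz."""
    # Width in Hz of one step (e.g. a quarter tone for 24 steps per octave).
    bin_width = freq_hz * (2.0 ** (1.0 / steps_per_octave) - 1.0)
    exact = sample_rate / bin_width               # samples needed for that spacing
    pow2 = 2 ** math.ceil(math.log2(exact))       # next power of two, for a radix-2 FFT
    return bin_width, exact, pow2


bin_width, exact, pow2 = fft_size_for_resolution(20.0, 24)
print(round(bin_width, 3), round(exact), pow2)
```

A quarter tone above 20 Hz comes out a little under 0.6 Hz wide, which needs roughly 75k samples at 44.1 kHz. A 65536-point FFT has bins of about 0.67 Hz, so "more like 64k" is the right order of magnitude.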
- KVRist
- 469 posts since 6 Apr, 2008
FilterEverything wrote: Wed Jun 11, 2025 6:53 pm Everyone working with spectrum analyzers probably noticed a particular problem with them: depending on what window size you pick, it either smears low frequencies or the time axis as a whole. [...]
I got annoyed by the problem myself a while ago and did some state-of-the-art research, including a superficial look at the CQT. I ultimately settled on a simple bank of parallel IIR bandpass filters. For low frequency resolution (3rd octave in my case, which seems to be kind of standard for mixing applications), this can be implemented quite efficiently and competitively compared with FFT (using SIMD and maybe even multirate processing, where the lower octaves would be filtered using a downsampled input). It does not scale well to significantly higher resolutions, though. Besides the optimal time-frequency resolution, another advantage is that by using minimum-phase filters, we can minimize the latency of each band (i.e. lower latency at higher frequencies). In FFT, you get a constant latency of half the window size throughout the whole spectrum.
It should be noted that some of the more advanced FFT-based analysers offer spectrum smoothing (like 1/3, 1/6, 1/12th octave), which should give you similar results to the CQT, though possibly less efficiently in terms of CPU cycles per sample.
-
FilterEverything FilterEverything https://www.kvraudio.com/forum/memberlist.php?mode=viewprofile&u=526607
- KVRer
- Topic Starter
- 12 posts since 31 Aug, 2021
hugoderwolf wrote: Thu Jun 12, 2025 6:00 am Last time I checked a proper constant Q transform would just be quite heavy to compute, as there's no optimization possible comparable to FFT. So usually you'd be better off just doing a large enough FFT and then doing some smoothing on the result.
Oh, so it cannot really be implemented using a modified FFT algorithm that somehow redistributes the bins in a way that's more convenient for audio?
karrikuh wrote: Thu Jun 12, 2025 6:06 am That seems wrong. A quarter tone at the bottom end of the human hearing range (20 Hz) is about 0.6 Hz. So your FFT size would be more like 64k @ 44.1kHz.
I understand that if I wanted quarter-tone resolution at low frequencies using a conventional FFT, I would need a lot of bins, because the bin next to 20 Hz would sit at 20 Hz times the 24th root of 2 (about 20.6 Hz), and so on. That would indeed mean a large window size for an FFT.
I thought that with the constant-Q transform I would get that kind of tight spacing at lower frequencies and much sparser spacing at high frequencies. As an example, the bins around 1 kHz would be placed at 971 Hz, 1 kHz and 1029 Hz. So there would not be a need for so many frequency bins, because they are distributed more sensibly. I assume that fewer bins directly translate to a smaller window size. Maybe the CQT and FFT can't be conflated like that; I thought the CQT was also an FFT-based transform. If that's not the case, that would explain why it's not popular.
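To make the bin layout concrete, here is a small Python sketch of a quarter-tone constant-Q grid. It assumes the textbook definitions (geometric centre spacing, Q = 1 / (2^(1/B) - 1), window length Q * fs / f) and a 44.1 kHz sample rate; the function names are illustrative only:

```python
import math

SAMPLE_RATE = 44100.0  # assumed sample rate


def cqt_centers(f_min, f_max, bins_per_octave):
    """Geometrically spaced centre frequencies from f_min up to at most f_max."""
    count = int(math.floor(bins_per_octave * math.log2(f_max / f_min))) + 1
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(count)]


def window_samples(freq_hz, bins_per_octave, sample_rate=SAMPLE_RATE):
    """Window length that separates a bin from its neighbour: Q * fs / f."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    return q * sample_rate / freq_hz


centers = cqt_centers(20.0, 20000.0, 24)
step = 2.0 ** (1.0 / 24)
print(len(centers))                                          # number of quarter-tone bins
print(round(1000.0 / step, 1), round(1000.0 * step, 1))      # neighbours of 1 kHz
print(round(window_samples(centers[0], 24)),                 # window for the lowest bin
      round(window_samples(centers[-1], 24)))                # window for the highest bin
```

The grid does come out at around 240 bins, with the neighbours of 1 kHz where the post puts them. Under these assumptions, though, the window length is set per bin rather than by the bin count: the top bin gets by with well under 100 samples, while the 20 Hz bin still needs tens of thousands.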
Last edited by FilterEverything on Sun Jun 15, 2025 1:16 pm, edited 1 time in total.
- KVRAF
- 8476 posts since 12 Feb, 2006 from Helsinki, Finland
The good old N-per-octave analyzer (eg. 3rd octave, or 1/6 or 1/12, etc) is effectively a constant-Q transform (not invertible, but for analysis only we usually don't need that). While the basic version is just a biquad per band, you can reduce how much a tone bleeds into adjacent bands by increasing the order (although for visual purposes I suggest something like a Bessel- rather than Butterworth-based design, so that a sliding note doesn't just jump from one band to the next). It gets a bit expensive at higher orders though (eg. order 8 bandpass filters for 12 per octave and 10 octaves is already 480 biquads).
Note that a high time-resolution at higher frequencies sounds great, until you realize that for most visual purposes it's actually way too quick and you'll end up putting a fairly long (eg. 200ms) RMS smoothing window on every band just so you get something you can meaningfully draw.
Another possibility is not to abandon FFT, but instead approximate constant-Q by multi-resolution FFT. Basically you pick some window-size, but then you only keep the top half of the full-rate transform. You downsample by a factor of two and then use the same window size for the half-rate data (so effectively it's twice as long in wallclock time), again keeping only the top-half of the transform and again downsampling to half (now quarter) rate for the next octave down. It's not a true constant-Q, but close enough in some cases.
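A rough Python sketch of the multi-resolution idea described above, using only the standard library. The tiny recursive FFT, the 31-tap Hann-windowed sinc used as the anti-alias filter, and all names and sizes are assumptions chosen for brevity, not anyone's actual implementation:

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def halve_rate(x, taps=31):
    """Low-pass at a quarter of the current rate, then keep every 2nd sample."""
    mid = (taps - 1) // 2
    h = []
    for n in range(taps):
        m = n - mid
        sinc = 0.5 if m == 0 else math.sin(0.5 * math.pi * m) / (math.pi * m)
        h.append(sinc * (0.5 - 0.5 * math.cos(2 * math.pi * (n + 1) / (taps + 1))))
    return [sum(h[j] * x[i - j] for j in range(taps))
            for i in range(taps - 1, len(x), 2)]


def multires_spectrum(x, sample_rate, window=64, octaves=4):
    """Return sorted (frequency_hz, magnitude) pairs, one top-octave slice per rate."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / window) for n in range(window)]
    points = []
    for _ in range(octaves):
        frame = x[-window:]                           # most recent samples at this rate
        spec = fft([s * w for s, w in zip(frame, hann)])
        for k in range(window // 4, window // 2):     # top half of the positive bins
            points.append((k * sample_rate / window, abs(spec[k])))
        x, sample_rate = halve_rate(x), sample_rate / 2   # next octave down
    return sorted(points)


fs = 8000.0
signal = [math.sin(2 * math.pi * 300.0 * n / fs) for n in range(4096)]
points = multires_spectrum(signal, fs)
peak_hz = max(points, key=lambda p: p[1])[0]
print(len(points), peak_hz)
```

With a 64-sample window the top octave has 125 Hz bins, while the fourth octave down (250 to 500 Hz here) has bins of about 15.6 Hz, so the 300 Hz test tone is located much more precisely than a single 64-point FFT could manage. One caveat of this sketch: with such a short anti-alias filter, the top few bins of each downsampled octave sit in the filter's transition band, so a real implementation would want a better decimation filter.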
Finally, it's possible to do an N/octave analyzer with FFT, at least approximately. Basically, rather than designing an IIR filter for each band, you design a FIR and then use convolution to compute the bands. The key observations that make this fast are that (1) convolution in the time domain is just multiplication in the spectral domain and (2) if we only care about RMS amplitude, as is usually the case, then FFT preserves the 2-norm (ie. you can compute RMS values in either time or frequency domain and in theory the result is the same, ignoring the impact of things like windows). You can combine this with the multi-resolution FFT approach to effectively use larger windows at lower octaves.
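The second observation (Parseval's theorem) can be illustrated with a short Python sketch. It uses a brick-wall band, i.e. a weight of 1 inside the band and 0 outside; a designed FIR would instead weight each bin by its squared magnitude response. The naive DFT and the names are assumptions made to keep the example self-contained:

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT, kept simple for clarity; a real analyzer would use an FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def band_rms_from_spectrum(spectrum, sample_rate, lo_hz, hi_hz):
    """RMS of the part of the signal whose frequency lies in [lo_hz, hi_hz)."""
    n = len(spectrum)
    power = 0.0
    for k, value in enumerate(spectrum):
        freq = min(k, n - k) * sample_rate / n    # fold negative-frequency bins
        if lo_hz <= freq < hi_hz:
            power += abs(value) ** 2              # brick-wall band weight
    return math.sqrt(power) / n                   # Parseval: sum x^2 = (1/N) sum |X|^2


fs, n = 256.0, 256
x = [math.sin(2 * math.pi * 32 * t / fs) + 0.5 * math.sin(2 * math.pi * 100 * t / fs)
     for t in range(n)]
spectrum = dft(x)
time_rms = math.sqrt(sum(s * s for s in x) / n)
full_band = band_rms_from_spectrum(spectrum, fs, 0.0, fs)
third_octave_32 = band_rms_from_spectrum(spectrum, fs, 28.5, 35.9)
print(time_rms, full_band, third_octave_32)
```

The full-band value matches the time-domain RMS, and the third-octave band around 32 Hz picks up only the 32 Hz sine (RMS of about 0.707), all from a single transform shared by every band.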
For what it's worth, my general-purpose analyzer does a combination of 3rd octave (with order 6 or 8 filters per band, forgot what it was exactly) and then it has an FFT mode (with zoom and spectral slope-weighting and what not) separately. The 3rd octave is better for getting an idea of the spectral balance, while the FFT is king for surgical analysis.
- KVRist
- 189 posts since 31 Oct, 2017
mystran wrote: Thu Jun 12, 2025 6:12 pm For what it's worth, my general purposes analyzer does a combination of 3rd octave (with order 6 or 8 filters per band, forgot what it was exactly) and then it has FFT mode (with zoom and spectral slope-weighting and what not) separately. The 3rd octave is better for getting an idea of the spectral balance while the FFT is a king for surgical analysis.
At the risk of slightly derailing the topic, but it kinda being related because FFT and resolution...
I'm using an FFT with an impulse to render filter responses. The sort of thing that Serum does for its filter display. It's useful for a large number of different filter types where computing some kind of Z transform would be a royal PITA.
As you might expect though, it loses resolution in the lower bins and can look a bit lumpy. Is there a way of getting a smoother more even visual with this technique? I'm figuring you'd be the one to know
- KVRAF
- 8476 posts since 12 Feb, 2006 from Helsinki, Finland
JustinJ wrote: Fri Jun 13, 2025 8:47 pm As you might expect though, it loses resolution in the lower bins and can look a bit lumpy. Is there a way of getting a smoother more even visual with this technique? I'm figuring you'd be the one to know
No magic bullet here really. For good resolution at low frequencies, you need a longer FFT window.
Last edited by mystran on Fri Jun 13, 2025 9:26 pm, edited 1 time in total.
- KVRist
- 189 posts since 31 Oct, 2017
mystran wrote: Fri Jun 13, 2025 8:57 pm No magic bullet here really. For good reasolution at low frequencies, you need a longer FFT window.
Ah well, that's what I thought. Worth an ask.
-
- KVRian
- 585 posts since 1 Jan, 2013 from Denmark
My analyser defaults to "constant-Q" mode using complex resonator banks, and as mentioned, that is actually reasonably fast compared to FFTs especially in the case of analysers where you have a large signal overlap from frame to frame.
Three notes: You probably want to clamp the lower frequencies, otherwise the lowest frequency time resolution drops to zero and basically never moves. For high frequencies, where the bandwidth increases, the "update rate" becomes so fast you probably want to smooth it somehow (like mystran said). Lastly, unless some fixed bands per octave map straight to your display pixel grid you may want instead to adjust the Q based on that - to avoid scalloping losses or post processing.
So in a way, you're back to square one but some differences remain:
1. there are IMO useful properties in the middle of the spectrum
2. the sliding evaluation computational advantage can be beneficial
3. IIR decay and windowing shapes also behave differently from FIR; the first has constant velocity on a log log spectrum which you may or may not like.
4. You can map a bank pixel perfect to a screen
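For readers unfamiliar with the approach, a complex resonator bank can be sketched in a few lines of Python. This is one common one-pole form, updated sample by sample; the exact structure, the gain normalisation, the low-frequency bandwidth clamp value and all names are assumptions, and it may well differ from the analyser described in the post:

```python
import cmath
import math


class ComplexResonator:
    """One-pole complex resonator: y[n] = p * y[n-1] + (1 - |p|) * x[n]."""

    def __init__(self, center_hz, q, sample_rate, min_bandwidth_hz=2.0):
        self.center_hz = center_hz
        # Constant Q means bandwidth grows with frequency; clamp it at the low
        # end so the lowest bands do not become arbitrarily slow.
        bandwidth = max(center_hz / q, min_bandwidth_hz)
        self.radius = math.exp(-math.pi * bandwidth / sample_rate)
        self.pole = self.radius * cmath.exp(2j * math.pi * center_hz / sample_rate)
        self.state = 0j

    def process(self, sample):
        """Advance by one input sample and return the current band magnitude."""
        self.state = self.pole * self.state + (1.0 - self.radius) * sample
        return abs(self.state)


def make_bank(f_min, bands_per_octave, count, q, sample_rate):
    return [ComplexResonator(f_min * 2.0 ** (k / bands_per_octave), q, sample_rate)
            for k in range(count)]


fs = 48000.0
bank = make_bank(62.5, 12, 72, q=12.0, sample_rate=fs)
levels = [0.0] * len(bank)
for n in range(4800):                       # 100 ms of a 1 kHz sine
    sample = math.sin(2 * math.pi * 1000.0 * n / fs)
    levels = [res.process(sample) for res in bank]
peak = max(range(len(bank)), key=lambda k: levels[k])
print(peak, round(bank[peak].center_hz), round(levels[peak], 2))
```

Every band is updated on every sample, so a magnitude is available at any time without re-transforming overlapping frames. The band centred on 1 kHz settles at roughly 0.5, which is the amplitude of the positive-frequency half of a unit real sine, with a small ripple from the negative-frequency half.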
- KVRAF
- 8476 posts since 12 Feb, 2006 from Helsinki, Finland
Mayae wrote: Sun Jun 15, 2025 10:41 am My analyser defaults to "constant-Q" mode using complex resonator banks, and as mentioned, that is actually reasonably fast compared to FFTs especially in the case of analysers where you have a large signal overlap from frame to frame.
Actually what I said about my 3rd octave is wrong.
Looking at my source code, I currently run 31 bands (midband frequency 630Hz which puts the min/max frequencies at around 20Hz and 20kHz) with order 8 Chebychev type 2 prototype(!) low-pass (with -80dB stop-band ripple) which transforms into 8 biquads per band (so actual order is 16), so that's 248 (stereo) biquads total (apparently not even using my "SIMD SVF pipeline" .. that'd probably make it about 4 times faster, which should probably make it clear it's not that terrible of a CPU hog if I didn't even bother optimizing).
Earlier I was using Bessel filters of similar order, either looks quite fine in practice, the Cheb2 is just a bit better on the skirts and I think it might actually even be a bit rounder on the top, which is actually a good thing.
Like I mentioned above, with steeper filters if you design them with Butterworth, you'll end up with a bank that's a bit "too good" visually: when you sweep a sine-wave slowly, if the band-edges are too sharp, the frequency jumps from one band to the next sort of abruptly. By using a design with a more rounded cutoff (Bessel, Chebychev type 2) you'll get a wider transition where a sine-wave somewhere between two center frequencies gives a weighted measurement in both bands depending on the exact frequency of the sine (similar to what happens when you use simple order 2 band-pass filters), yet the higher order filters still bring down the skirts.
Each band then has a simple exponential moving average RMS (ie. square, do one-pole lowpass with 200ms time constant and convert to decibels with the power formula), because otherwise the higher bands are visually way too fast and we need to measure the level somehow anyway and this gives us nice "attack/release" dynamics for the meters with no additional work.
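The per-band level detector described here (square, one-pole lowpass, convert to decibels) can be sketched as follows in Python. The class name, the coefficient formula exp(-1 / (tau * fs)) and the small floor that avoids log(0) are assumptions:

```python
import math


class RmsMeter:
    """Exponential moving average of the squared signal, read out in dB."""

    def __init__(self, sample_rate, time_constant_s=0.2):
        # One-pole lowpass coefficient for the given time constant.
        self.coeff = math.exp(-1.0 / (time_constant_s * sample_rate))
        self.mean_square = 0.0

    def process(self, sample):
        """Feed one (band-filtered) sample and return the smoothed mean square."""
        self.mean_square = (self.coeff * self.mean_square
                            + (1.0 - self.coeff) * sample * sample)
        return self.mean_square

    def level_db(self):
        # Power formula: 10 * log10 of the mean square, floored to avoid log(0).
        return 10.0 * math.log10(max(self.mean_square, 1e-20))


fs = 8000.0
meter = RmsMeter(fs)
for n in range(int(2 * fs)):                # two seconds of a full-scale 1 kHz sine
    meter.process(math.sin(2 * math.pi * 1000.0 * n / fs))
print(round(meter.level_db(), 2))
```

A full-scale sine settles at about -3 dB, as expected for its RMS value, and the same one-pole smoothing gives the meter its "attack/release" behaviour.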
I run this thing (really all the analysis) in the GUI thread, so the load on audio threads is effectively zero: whenever my redraw timer says I should redraw, I go fetch the latest (raw input) data from the audio thread, then depending on what's open in the GUI either I'll do a new FFT frame or I'll feed the data to the filter bank, then redraw. In practice the cost is probably similar on average to the FFT analysis and both are negligible compared to how long the drawing code spends stroking an FFT plot.
Last edited by mystran on Thu Jun 19, 2025 11:30 pm, edited 1 time in total.
-
- KVRian
- 585 posts since 1 Jan, 2013 from Denmark
OK that's interesting, you're effectively implementing a window function on your "spectrum" using steep bandpass filters?
I stumbled upon the fact that if a window function has a simple DFT response, you can simply sum such N parallel complex resonators (with cf spaced +/- bandwidth or I guess "bins") and get effectively a very nice "windowed" IIR response with the same advantages. I'm not really sure if anyone out there has explained why it works for IIR as well, but logically it sort of makes sense.

I stumbled upon the fact that if a window function has a simple DFT response, you can simply sum such N parallel complex resonators (with cf spaced +/- bandwidth or I guess "bins") and get effectively a very nice "windowed" IIR response with the same advantages. I'm not really sure if anyone out there has explained why it works for IIR as well, but logically it sort of makes sense.
mystran wrote: Sun Jun 15, 2025 3:46 pm Like I mentioned above, with steeper filters if you design them with Butterworth, you'll end up with a bank that's a bit "too good" visually: when you sweep a sine-wave slowly, if the band-edges are too sharp, the frequency jumps from one band to the next sort of abruptly. By using a design with a more rounded cutoff (Bessel, Chebychev type 2) you'll get a wider transition where a sine-wave somewhere between two center frequencies gives a weighted measurement in both bands depending on the exact frequency of the sine (similar to what happens when you use simple order 2 band-pass filters), yet the higher order filters still bring down the skirts.
I found that setting the bandwidth to the distance to the previous "bin" yields a response without dips (or with a maximum dip of -3 dB, like for a rectangular window). One can then use the above trick with flat-top window coefficients or similar to never have dips.
mystran wrote: Sun Jun 15, 2025 3:46 pm [...] so that's 248 (stereo) biquads total (apparently not even using my "SIMD SVF pipeline" .. that'd probably make it about 4 times faster, which should probably make it clear it's not that terrible of a CPU hog if I didn't even bother optimizing).
Yeah, for a 4k display I run 3840 * 7 (for a Nuttall window) complex resonators, ie. 54k complex multiplies per sample for stereo. That runs at 28% async on one core with AVX2.
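As a rough illustration of the pixel-mapped layout and the multiply count, here is a hedged Python sketch. The log-frequency mapping, the 20 Hz to 20 kHz range, the 1 Hz bandwidth floor and the function names are all assumptions; only the 3840 columns, 7 terms per column and two channels come from the post:

```python
def pixel_bank(width_px, f_min, f_max, min_bandwidth_hz=1.0):
    """One (centre_hz, bandwidth_hz) pair per pixel column on a log-frequency axis."""
    ratio = (f_max / f_min) ** (1.0 / (width_px - 1))    # frequency step per pixel
    bank = []
    for px in range(width_px):
        center = f_min * ratio ** px
        # Bandwidth follows the spacing to the neighbouring column, with a
        # floor so the lowest columns still respond in finite time.
        bandwidth = max(center * (ratio - 1.0), min_bandwidth_hz)
        bank.append((center, bandwidth))
    return bank


def multiplies_per_sample(width_px, window_terms, channels):
    """Complex multiplies per input sample for the whole bank."""
    return width_px * window_terms * channels


bank = pixel_bank(3840, 20.0, 20000.0)
print(len(bank), multiplies_per_sample(3840, 7, 2))   # 3840 53760
```

3840 columns times 7 resonators times 2 channels gives 53760, which is the "54k" in the post.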
- KVRAF
- 8476 posts since 12 Feb, 2006 from Helsinki, Finland
Mayae wrote: Mon Jun 16, 2025 11:04 am OK that's interesting, you're effectively implementing a window function on your "spectrum" using steep bandpass filters?
Well, not really a window in any meaningful sense, except insofar as steeper filters prevent bleed into adjacent bins, sort of similar to how a better window would work.
ps. Strictly speaking I'm not even really doing "constant Q" but rather "constant log-bandwidth": in analog these two are the same, but with BLT the frequency warping near Nyquist makes them two different things. Fortunately it's relatively simple to just compute the two desired band-edges and build a bandpass that covers the range.
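The difference can be made visible with a small Python sketch that pre-warps both band edges for the bilinear transform and looks at the Q of the resulting analog prototype. The standard pre-warping formula is used; the third-octave layout, the 44.1 kHz sample rate and the function names are assumptions for illustration, not the code from the post:

```python
import math


def band_edges(center_hz, bands_per_octave):
    """Lower and upper edge of a band that is 1/bands_per_octave octaves wide."""
    half_step = 2.0 ** (1.0 / (2.0 * bands_per_octave))
    return center_hz / half_step, center_hz * half_step


def prewarp(freq_hz, sample_rate):
    """Analog frequency that the bilinear transform maps onto freq_hz."""
    return sample_rate / math.pi * math.tan(math.pi * freq_hz / sample_rate)


def analog_prototype_q(center_hz, bands_per_octave, sample_rate):
    """Q of the analog bandpass whose BLT image covers the desired digital band."""
    lo, hi = band_edges(center_hz, bands_per_octave)
    a_lo, a_hi = prewarp(lo, sample_rate), prewarp(hi, sample_rate)
    return math.sqrt(a_lo * a_hi) / (a_hi - a_lo)


for center in (100.0, 1000.0, 8000.0, 16000.0):
    print(center, round(analog_prototype_q(center, 3, 44100.0), 2))
```

At low frequencies the prototype Q stays near the nominal third-octave value of about 4.32, but close to Nyquist it has to drop considerably to keep the digital band a third of an octave wide. That is why a fixed-Q design and a fixed-log-bandwidth design diverge after the transform.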
