Constant-Q transform for spectrum analyzers and spectrograms instead of FFT?

DSP, Plugin and Host development discussion.

Post

FilterEverything wrote: Wed Jun 11, 2025 6:53 pm Everyone working with spectrum analyzers probably noticed a particular problem with them: depending on what window size you pick, it either smears low frequencies or the time axis as a whole.

Is there someone willing to explain why a constant Q transform is not worth it over the FFT? Is it too wasteful to compute? Would it simply not solve the time domain-frequency domain tradeoff or at least balance it considerably better? Would a spectrogram using the constant Q transform just look bad?
1. You don't need to pick a single window size, you can use different window sizes across the spectrum. I publish a free analyzer that does this, but not sure if the rules of this forum allow me to share my own stuff.

2. Intel's FFT algorithm is very fast. I'm not aware of a constant-Q implementation that performs at this level. If one exists, I'd love to learn about it.

Math aside, the treble part of your spectrum wants a ~30ms window otherwise it will either look sluggish, or it will bounce up and down in reaction to a 30Hz saw wave that sounds constant in loudness to the human ear. But good luck distinguishing 20Hz from 21Hz using only 30ms of history, you need closer to 10x that in practice.

Post

getbwr1 wrote: Wed Jun 18, 2025 2:02 am 1. You don't need to pick a single window size, you can use different window sizes across the spectrum. I publish a free analyzer that does this, but not sure if the rules of this forum allow me to share my own stuff.
You can certainly talk about your own software (free or commercial) around here especially when it's related to the discussion. In fact, it's even fine to post an announcement thread into Effects or Instruments (or this dev forum if it's something like a DSP library) if you want user feedback or whatever as long as your software is somehow audio related. Just maintain good taste (eg. don't create a dozen threads in multiple sub-forums for the same thing) and it's probably fine.

As far as multiple window sizes go, yeah, I kinda hinted at that above already, although you can certainly also do it without the downsampling that I was talking about earlier.
Math aside, the treble part of your spectrum wants a ~30ms window otherwise it will either look sluggish, or it will bounce up and down in reaction to a 30Hz saw wave that sounds constant in loudness to the human ear. But good luck distinguishing 20Hz from 21Hz using only 30ms of history, you need closer to 10x that in practice.
The bouncing should not be a significant issue as long as you use a decent window function. That said, I find that if you're just drawing the FFT result "as is" (without doing some dynamics to slow down the peak decay) then 30ms is really too fast visually... plus that's less than one frame at 30fps refresh rate and ideally you'd want at least some overlap on the windows in order not to miss any data.

Personally I find that if we try to find a window size that's not too busy visually, but is also not obviously smeared, the optimum is somewhere around 200ms (eg. 8k window at 44.1/48kHz). That's somewhat of a personal preference certainly, but anything faster just kinda looks very busy; it becomes hard to actually see what's going on and at that point you'd then have to do some additional decay dynamics to calm it down.
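
To put numbers on these window sizes, here is a quick sketch of the arithmetic. The function name is made up for illustration, and "resolution" is taken to be just the bin spacing fs/N, ignoring the extra widening any window function adds.

```python
def window_stats(n_samples, sample_rate):
    """Return (duration in ms, bin spacing in Hz) for an FFT window."""
    duration_ms = 1000.0 * n_samples / sample_rate
    bin_spacing_hz = sample_rate / n_samples
    return duration_ms, bin_spacing_hz

# 8k windows at the two common rates, plus a 30 ms window (1323 samples at 44.1kHz)
for n, fs in [(8192, 44100), (8192, 48000), (1323, 44100)]:
    ms, hz = window_stats(n, fs)
    print(f"N={n:5d} fs={fs}: {ms:6.1f} ms window, {hz:5.2f} Hz bin spacing")
```

So the "around 200ms" 8k window is really about 171 to 186ms with bins a bit over 5Hz apart, while a 30ms window has bins roughly 33Hz apart.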

As far as separating 20Hz and 21Hz though, your ears can't do that either [edit: also for emphasis, we do not need 1Hz resolution in order to tell the frequency of a single tone down to 1Hz, we can get that from phase or by looking at relative amplitudes of the adjacent bins; we just need such resolution if we want to tell if there's more than one tone with 1Hz separation]. If you take two frequencies that differ from each other by less than around the lower limit of hearing (ie. around 20Hz) then you won't hear two tones, you'll hear beating (ie. amplitude modulation). So actually if you increase the frequency resolution too much at low frequencies, you'll start moving information that we'll hear as temporal into the frequency spectrum instead, which might not be what you want. It's really curious actually how the lower limit of hearing and what is perceived as envelope vs. tone actually occur around the same frequency range, but that kinda makes sense if you consider the fact that our ears can't truly cheat the uncertainty principle.
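
The beating claim follows from the sum-to-product identity sin(a) + sin(b) = 2 sin((a+b)/2) cos((a-b)/2): two close tones are mathematically the same thing as one tone at the average frequency with a slow envelope. A small numerical check, with function names invented for this sketch:

```python
import math

def two_tones(t, f1, f2):
    """Sum of two unit-amplitude sines."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def carrier_times_envelope(t, f1, f2):
    """The same signal written as a tone at the mean frequency times a slow envelope."""
    carrier = math.sin(2 * math.pi * 0.5 * (f1 + f2) * t)
    envelope = 2 * math.cos(2 * math.pi * 0.5 * (f1 - f2) * t)
    return carrier * envelope

f1, f2 = 20.0, 21.0
times = [n / 1000.0 for n in range(2000)]  # 2 seconds at 1 ms steps
max_err = max(abs(two_tones(t, f1, f2) - carrier_times_envelope(t, f1, f2)) for t in times)
beat_rate_hz = abs(f1 - f2)  # the envelope magnitude repeats at the difference frequency
print(max_err, beat_rate_hz)
```

Both descriptions are the same signal; which one a display shows depends on whether its window is long enough to resolve the 1Hz difference.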

So... for what it's worth, I think the whole idea that a spectrum analyzer should provide a surgical resolution down to the lowest bass frequencies is sort of misguided if we're talking about analyzing music.

I actually like constant-Q (ie. N-per-band analyzers) for pretty much the opposite reason: with wider bands at the higher frequencies it gives you a more reasonable estimate of the total energy which is basically impossible to tell from a busy spectrum.

Anyway, the main point I want to make here is that even though everyone always starts from "losing frequency resolution at low frequencies is bad" and even though it does arguably look visually kinda bad with the mushy blob covering the whole lowest octave of hearing... it's not necessarily as much of a problem as we'd like to think if the goal is to visually estimate what we will actually hear.

Post

There's an efficient and well developed method to avoid bad compromises between good spectrograms at high and low frequencies: wavelet transforms. Choose a filterbank design and produce coefficients at different rates for different bands, decimating lower frequency components.
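
As a minimal sketch of that idea, here is an octave-band split using the Haar filter pair. Haar is chosen only because it is the shortest possible filterbank; a practical analyzer would presumably use longer filters with much better band separation. The function name is hypothetical.

```python
import math

def haar_octave_bands(signal, levels):
    """Split `signal` into octave bands with a Haar filter pair.

    Returns the detail bands (highest band first) followed by the final
    low-pass residue. Each level halves the sample rate of what remains.
    """
    s = 1.0 / math.sqrt(2.0)
    bands = []
    low = list(signal)
    for _ in range(levels):
        pairs = [(low[i], low[i + 1]) for i in range(0, len(low) - 1, 2)]
        bands.append([s * (a - b) for a, b in pairs])  # upper half-band, decimated by 2
        low = [s * (a + b) for a, b in pairs]          # lower half-band, decimated by 2
    bands.append(low)
    return bands

x = [math.sin(2 * math.pi * 3 * n / 64) for n in range(64)]
bands = haar_octave_bands(x, levels=3)
print([len(b) for b in bands])  # [32, 16, 8, 8]
```

Lower bands produce fewer coefficients per second, which is exactly the "different rates for different bands" trade: coarse time steps where fine frequency detail is wanted.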

Post

mystran wrote: Wed Jun 18, 2025 3:03 am You can certainly talk about your own software (free or commercial) around here especially when it's related to the discussion. In fact, it's even fine to post an announcement thread into Effects or Instruments (or this dev forum if it's something like a DSP library) if you want user feedback or whatever as long as your software is somehow audio related. Just maintain good taste (eg. don't create a dozen threads in multiple sub-forums for the same thing) and it's probably fine.
Cool, here's my implementation (it's free): https://reflex-acoustics.com/products/reflex-spectrum-analyzer

It covers the audible spectrum from 10Hz-20kHz with 4 frequency bands, each using a 512-element FFT running the Intel IPP algo if on PC. At a sample rate of 44.1kHz, the highest band has a 12ms window (i.e. 512/44100 seconds), which is visually smoothed even at the fastest "time smoothing" setting to avoid looking glitchy. The lowest band is decimated 126x and has some zero-padding, for an effective window length of 340ms.
The display runs at 45fps on a fast CPU, and will automatically throttle downwards on slower machines.
The FFT bins are mapped into log-spaced buckets of 1 semitone width for display, which are spaced 0.6Hz apart at 10Hz or 1.2Hz apart at 20Hz.
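
The semitone spacing figures can be reproduced with a couple of lines. This is only the arithmetic, not code from the plugin, and the function name is made up:

```python
import math

def semitone_spacing_hz(f_hz):
    """Distance in Hz from f_hz up to the next semitone."""
    return f_hz * (2.0 ** (1.0 / 12.0) - 1.0)

for f in (10.0, 20.0, 1000.0, 10000.0):
    print(f"{f:8.1f} Hz -> next semitone is {semitone_spacing_hz(f):7.2f} Hz higher")

# Number of semitone-wide buckets needed to span 10Hz to 20kHz
n_buckets = 12.0 * math.log2(20000.0 / 10.0)
print(f"{n_buckets:.1f} semitones from 10 Hz to 20 kHz")
```
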
If you run a sine sweep (e.g. using PluginDoctor), you'll see some minor aliasing between the four bands, which could be further reduced at the cost of additional CPU, but I don't think it's noticeable when working with normal musical signals, and nobody's complained about it so far.

I made a version of this that calculates "true" frequencies for each bin by looking at their phase rotation rates (which is a very cool technology) but I couldn't figure out how to display the information in a way that looked good. I'll use that approach if I make a spectrogram-type display in the future.
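
For anyone unfamiliar with the phase rotation trick, here is a rough sketch of the usual phase-vocoder-style estimate. It is not necessarily how the plugin does it: the function names, the Hann window and the quarter-window hop are all assumptions made for this example. The idea is to compare the same bin in two overlapping frames and turn the phase advance that exceeds what a bin-centred tone would produce into a frequency offset.

```python
import cmath
import math

def windowed_bin(x, start, n_fft, k):
    """One Hann-windowed DFT bin of the frame x[start : start + n_fft]."""
    acc = 0j
    for n in range(n_fft):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft)
        acc += w * x[start + n] * cmath.exp(-2j * math.pi * k * n / n_fft)
    return acc

def refined_frequency(x, fs, n_fft, hop, k):
    """Estimate the frequency seen in bin k from its phase advance over one hop."""
    a = windowed_bin(x, 0, n_fft, k)
    b = windowed_bin(x, hop, n_fft, k)
    expected = 2 * math.pi * k * hop / n_fft  # advance of an exactly bin-centred tone
    measured = cmath.phase(b) - cmath.phase(a)
    deviation = (measured - expected + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    return (k / n_fft + deviation / (2 * math.pi * hop)) * fs

fs, n_fft, hop = 48000, 1024, 256
true_f = 1000.0
x = [math.sin(2 * math.pi * true_f * n / fs) for n in range(n_fft + hop)]
k = round(true_f * n_fft / fs)  # nearest bin is 21, centred on 984.375 Hz
est = refined_frequency(x, fs, n_fft, hop, k)
print(k, k * fs / n_fft, est)
```

This only works for a single stable tone per bin; with two tones inside the same bin the phase advance no longer maps to one frequency.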

Note, if I were doing this over from scratch, I'd probably use 3 1024-element FFTs instead of 4x512. I have another plugin in the works with a smaller spectrum analyzer that isn't the main focus of the device, and that one uses 2x1024.

An important lesson from building this, which I sense others in this forum already understand, is that you can't get arbitrarily high frequency resolution merely by having lots of bins close together... you also need to feed in a decent length of signal relative to the wavelengths you're trying to distinguish.

Post

getbwr1 wrote: Wed Jun 18, 2025 4:46 pm An important lesson from building this, which I sense others in this forum already understand, is that you can't get arbitrarily high frequency resolution merely by having lots of bins close together... you also need to feed in a decent length of signal relative to the wavelengths you're trying to distinguish.
This thread made it clear to me that there are a lot of ways to implement analyzers. Constant-Q transforms are sometimes used, but the reason they're not as popular as I thought they "should" be is that they're not the silver bullet I thought they were.

I already knew larger window sizes are needed for lower frequencies. My problem with the simple FFT analyzers I've seen was that if I changed the window size, it indiscriminately changed the smearing for both low and high frequencies which I still don't think is a necessity. I didn't need a larger window size for high frequencies, I only needed larger window size for the lower frequencies.

About high frequencies still needing at least a certain length of window size for smoothing, surely this is not a problem for spectrograms as opposed to real-time analyzers, right? I find spectrograms very useful when I hear a cool sound in some music and I want to recreate something similar to it. It can give a decent starting point when you have no idea how to go about it just by listening to the sound.

Post

FilterEverything wrote: Thu Jun 19, 2025 8:47 am
getbwr1 wrote: Wed Jun 18, 2025 4:46 pm An important lesson from building this, which I sense others in this forum already understand, is that you can't get arbitrarily high frequency resolution merely by having lots of bins close together... you also need to feed in a decent length of signal relative to the wavelengths you're trying to distinguish.
This thread made it clear to me that there are a lot of ways to implement analyzers. Constant-Q transforms are sometimes used, but the reason they're not as popular as I thought they "should" be is that they're not the silver bullet I thought they were.
One more thing I want to point out is that you'll actually also end up measuring slightly different things in some cases depending on whether your bin-spacing is linear or logarithmic.

Basically if you measure isolated sinusoids, then a standard FFT and constant-Q will give you the same results: same amplitude, different frequency; the peak is the same, just shifted around.

However, if you measure pink noise, then FFT will show you 3dB/octave decay and the constant-Q analyser will ideally measure flat... and that's because in this case they are measuring different things: the energy of pink-noise over any given isolated frequency decays 3dB/octave (what the FFT analyser is showing you), but this means the average energy is the same for each octave (what the constant-Q analyser is showing you).

These are both "correct" results and the difference is because in the case of noise, we'll actually get some sort of average. FFT will average over fixed (linear) bandwidth bins, the constant-Q will average over bins with their (linear) bandwidth proportional to the center frequency... and as it turns out both results are useful depending on what you want to do with the result.
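
A small sketch of the pink noise case, assuming ideal pink noise with power density proportional to 1/f, so the energy between two frequencies is just the log of their ratio. The 10Hz bin width and the semitone-wide constant-Q bin are arbitrary choices for illustration:

```python
import math

def pink_band_energy(f_lo, f_hi):
    """Energy of ideal pink noise (power density 1/f) between f_lo and f_hi."""
    return math.log(f_hi / f_lo)

def db(x):
    """Power ratio in decibels."""
    return 10.0 * math.log10(x)

bin_width = 10.0             # fixed-width (FFT-style) bin, in Hz
ratio = 2.0 ** (1.0 / 24.0)  # constant-Q bin from fc/ratio to fc*ratio: one semitone wide

for fc in (250.0, 500.0, 1000.0, 2000.0):
    linear = pink_band_energy(fc - bin_width / 2, fc + bin_width / 2)
    const_q = pink_band_energy(fc / ratio, fc * ratio)
    print(f"{fc:6.0f} Hz  linear bin {db(linear):7.2f} dB   constant-Q bin {db(const_q):7.2f} dB")
```

The fixed-width column drops by about 3dB per octave while the constant-Q column stays flat, even though both are integrating the same spectrum.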

Post

Has anybody got any joy from using the Goertzel algorithm to calculate any bin of longer FFTs? It saves calculating a whole large FFT for lower frequencies. Or is it still too slow to do Goertzel these days?

Post

quikquak wrote: Mon Jun 30, 2025 12:24 pm Has anybody got any joy from using the Goertzel algorithm to calculate any bin of longer FFTs? It saves calculating a whole large FFT for lower frequencies. Or is it still too slow to do Goertzel these days?
You probably want a more accurate recurrence (the cosine term is problematic at low frequencies), but in principle this is more efficient approximately when you want less than N bins (give or take a factor of two or so; not sure if the Goertzel approach really saves much over just computing recurrence and constant multiply since that'd be easier to SIMD) from a 2^N FFT.
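
For reference, a sketch of the plain textbook Goertzel recurrence, i.e. the version with the low-frequency accuracy caveat mentioned above, not a more accurate variant. The break-even helper is only the order-of-magnitude rule of thumb (about one pass over the data per bin versus about log2(size) passes for an FFT), and both function names are invented here.

```python
import cmath
import math

def goertzel_bin(x, k):
    """DFT bin k of the sequence x via the standard Goertzel recurrence."""
    n = len(x)
    w = 2 * math.pi * k / n
    coeff = 2 * math.cos(w)  # loses precision as w approaches 0
    s1 = s2 = 0.0
    for sample in x:
        s0 = sample + coeff * s1 - s2
        s2, s1 = s1, s0
    # Equivalent to running one more step with zero input and taking s[N] - exp(-jw) * s[N-1]
    return cmath.exp(1j * w) * s1 - s2

def rough_breakeven_bins(fft_size):
    """Rule of thumb: Goertzel wins below roughly log2(fft_size) bins."""
    return math.log2(fft_size)

x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(64)]
print(goertzel_bin(x, 5), rough_breakeven_bins(65536))
```

For a 64k FFT that puts the break-even somewhere around 16 bins, give or take the factor of two mentioned above.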

Post

quikquak wrote: Mon Jun 30, 2025 12:24 pm Has anybody got any joy from using the Goertzel algorithm to calculate any bin of longer FFTs?
Yeah, Goertzel doesn't give you an advantage if you need too many of the bins. Each Goertzel bin is essentially a time-domain box-filter + complex modulation (to turn it into a bandpass). Having enough time-domain filters to build a spectrum display is quite heavy.

However, if you use a (short-windowed) STFT as a downsampled filter-bank first, you can run those filters inside a subsampled band, and they'll be more efficient because they're running at 200Hz or whatever instead of 48kHz.

Not 100% sure this demo works in all browsers, but try this out. Those filters are min-phase for simplicity, but you could use linear-phase ones (and compensate for different latency) to match FFT behaviour.
Last edited by signalsmith on Wed Jul 16, 2025 9:37 pm, edited 1 time in total.

Post

signalsmith wrote: Tue Jul 01, 2025 8:27 am
quikquak wrote: Mon Jun 30, 2025 12:24 pm Has anybody got any joy from using the Goertzel algorithm to calculate any bin of longer FFTs?
Yeah, Goertzel doesn't give you an advantage if you need too many of the bins. Each Goertzel bin is essentially a time-domain box-filter + complex modulation (to turn it into a bandpass). Having enough time-domain filters to build a spectrum display is quite heavy.

However, if you use a (short-windowed) STFT as a downsampled filter-bank first, you can run those filters inside a subsampled band, and they'll be more efficient because they're running at 200Hz or whatever instead of 48kHz.

Not 100% sure this demo works in all browsers, but try this out. Those filters are min-phase for simplicity, but you could use linear-phase ones (and compensate for different latency) to match FFT behaviour.
Thanks, interesting. I suppose Goertzel is really only used for detecting specific notes or frequencies.

I've tried using multistage polyphase downsampling before - I was annoyed by the phase problems on reconstruction, so I didn't go back to it. It was fast though.

I think I'll stick with FFTs for now.

Post

quikquak wrote: Tue Jul 01, 2025 2:28 pm I've tried using multistage polyphase downsampling before
Just in case there's a misunderstanding: I wasn't talking about anything multistage. I'm doing ordinary FFT analysis with overlapping blocks.

If you look at the Nth bin from each spectrum over time, then you can consider those values as a complex signal. Here's 60s of hand-waving about it:

I'm then running "time domain" filters for each spectrum-display frequency, taking the input from one of those complex sub-bands. The lower bins/sub-bands are feeding several narrow filters at different frequencies, the higher bins have at most one very wide filter.
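
A rough sketch of that two-stage structure, under several assumptions that are not from the demo: a Hann window, a hop of a quarter window, and a complex one-pole filter standing in for the second-stage filter (the demo's actual filter design isn't described here). All names are hypothetical.

```python
import cmath
import math

def bin_over_time(x, n_fft, hop, k):
    """Hann-windowed DFT bin k of each overlapping frame: a complex signal at rate fs/hop."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    twiddle = [cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft)]
    out = []
    for start in range(0, len(x) - n_fft + 1, hop):
        out.append(sum(window[n] * x[start + n] * twiddle[n] for n in range(n_fft)))
    return out

def second_stage(sub_band, f_target, fs, hop, smoothing):
    """Complex one-pole filter tuned to f_target, running at the block rate fs/hop."""
    rotation = cmath.exp(2j * math.pi * f_target * hop / fs)  # phase advance per frame
    y = 0j
    for u in sub_band:
        y = (1.0 - smoothing) * u + smoothing * rotation * y
    return abs(y)

fs, n_fft, hop = 48000, 256, 64  # block rate is 750 Hz
x = [math.sin(2 * math.pi * 1000.0 * n / fs) for n in range(n_fft + 200 * hop)]
sub = bin_over_time(x, n_fft, hop, k=5)  # bin 5 is centred on 937.5 Hz
on_target = second_stage(sub, 1000.0, fs, hop, smoothing=0.95)
off_target = second_stage(sub, 1030.0, fs, hop, smoothing=0.95)
print(len(sub), on_target, off_target)
```

Both second-stage filters read the same FFT bin, yet the one tuned to 1000Hz responds several times more strongly to the 1000Hz test tone than the one tuned 30Hz higher, which is far finer than the 187.5Hz bin spacing. Note that the tuning is only unambiguous modulo the block rate.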

Post

signalsmith wrote: Tue Jul 01, 2025 3:18 pm I'm then running "time domain" filters for each spectrum-display frequency, taking the input from one of those complex sub-bands. The lower bins/sub-bands are feeding several narrow filters at different frequencies, the higher bins have at most one very wide filter.
So basically you're treating the FFT frames of a single bin as a bandpass sampled / decimated signal and then filtering it further?

Post

mystran wrote: Wed Jul 02, 2025 3:39 pm So basically you're treating the FFT frames of a single bin as a bandpass sampled / decimated signal and then filtering it further?
Yeah, exactly.

Those second-stage filters can't be wider than the bandwidth of an STFT sub-band (a.k.a. the FFT window function), but they can be narrower. So you start with whatever FFT size gets you the right high-end responsiveness and then slice the lower bands/bins up more finely in the second-stage.

That demo is actually using the Bark-scale for filter centres/bandwidth, but the same approach works equally well for logarithmic, Mel, etc.

Post

signalsmith wrote: Thu Jul 03, 2025 1:49 pm Those second-stage filters can't be wider than the bandwidth of an STFT sub-band (a.k.a. the FFT window function), but they can be narrower. So you start with whatever FFT size gets you the right high-end responsiveness and then slice the lower bands/bins up more finely in the second-stage.
This is the part I feel is a bit hand-wavy. If I'm not mistaken, what we really have in the FFT bin is a single time-sample of a bandpass filter the response of which depends on the chosen window function, so in practice the bandwidth of each sub-band (ie. FFT bin) is actually a bit more than the nominal width of the bin. How much wider depends on the window chosen, but this can be used as a design parameter.

Since the actual bandwidth is wider than the nominal bandwidth, we'll need to overlap blocks to avoid aliasing, but if we overlap sufficiently, then the second stage filter can actually be a bit wider than a bin, provided that we respect Nyquist (with respect to the block-rate) and take the first-stage bin-response into account. Right?

So as far as I can see, one should be able to intentionally design flat-top windows or something that spreads the bins a bit more than necessary in order to get a bit more flexibility in terms of how wide the second-stage filters can be, with the trade-off that we'll need more block overlap to avoid aliasing.
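
A back-of-envelope version of that trade-off, assuming we only require the block rate fs/hop to cover the window's main lobe (null to null) and ignore sidelobes, which still alias to some degree. The main-lobe widths use the rule that a K-term cosine-sum window has a main lobe of 2K bins; the 5-term flat-top entry is an assumed example, not a specific recommended design.

```python
def max_hop(n_window, mainlobe_bins):
    """Largest hop for which the block rate fs/hop still spans the window's main lobe."""
    return n_window // mainlobe_bins

# Null-to-null main-lobe widths in DFT bins (2K bins for a K-term cosine-sum window)
mainlobe_bins = {"rectangular": 2, "hann": 4, "blackman": 6, "5-term flat-top": 10}

n_window = 1024
for name, width in mainlobe_bins.items():
    hop = max_hop(n_window, width)
    overlap = 100.0 * (1.0 - hop / n_window)
    print(f"{name:16s} main lobe {width:2d} bins -> hop <= {hop:4d} ({overlap:.0f}% overlap)")
```

Under these assumptions a Hann window wants at least 75% overlap, and a window with a wider, flatter main lobe needs roughly 90%, in exchange for more usable bandwidth per sub-band.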

Post

@Mystran: Correct - when I said "the bandwidth of an STFT sub-band" I was referring to how those sub-bands have already been filtered by the STFT, which is generally much wider than the spacing between them (when using appropriate overlap/window-shape to get decent aliasing levels).

For maximum consistency, I always recommend specifying the window length (and STFT interval) in milliseconds, and then zero-padding to a reasonably-close fast FFT size. This means that the spacing between bands/bins will vary slightly depending on sample-rate, but the shape/width of the actual sub-band filters (and downsampled rate, and therefore aliasing/etc.) is consistent. The spacing between bands/bins is therefore decoupled from the spectral resolution / sub-band bandwidth - so the bin-/band-spacing is mostly only relevant for averaging energy across bands or something similar.
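
A minimal sketch of that sizing rule. The 20ms window and 5ms interval are arbitrary example values, and "fast FFT size" is assumed here to mean the next power of two, although other sizes can be fast too depending on the FFT library.

```python
def stft_sizes(window_ms, interval_ms, sample_rate):
    """Return (window samples, hop samples, zero-padded FFT size, bin spacing in Hz)."""
    window_samples = round(window_ms * sample_rate / 1000.0)
    hop_samples = round(interval_ms * sample_rate / 1000.0)
    fft_size = 1
    while fft_size < window_samples:  # zero-pad up to the next power of two
        fft_size *= 2
    return window_samples, hop_samples, fft_size, sample_rate / fft_size

for fs in (44100, 48000, 96000):
    print(fs, stft_sizes(20.0, 5.0, fs))
```

The window and hop stay at the same duration in every case, so the sub-band filter shape and block rate are consistent; only the bin spacing moves slightly with sample rate.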

As you can interpolate between FFT bins to get fractional positions, you can also get a downsampled sub-band centred exactly on the frequency you want, as the input for each second-stage filter. Doing this interpolation properly guarantees that the peak of the overall filter is at 0dB and has a symmetrical shape and so on. A flat-top window as you mentioned will make even very simple interpolation provide that 0dB peak.
