Efficient real time spectrum graphing techniques?
-
- KVRian
- Topic Starter
- 626 posts since 30 Aug, 2012
A lot of plugins these days - especially EQ plugins - include real time spectrum graphs in their display. Many of these plugins claim to be "light on resources" yet, AFAIK, FFTs are the only way to generate spectrum data - and FFTs are NOT light on resources!
So, are there less computationally-intensive means being used to generate these spectrum graphs?
-
- Banned
- 12368 posts since 30 Apr, 2002 from i might peeramid
?? all of my audio dsp has been done on single cores. i use the fft from dspguide.com, with some obvious optimisation. its not as fast as the common libraries for sure. but eg. for a 2x overlap, without much else going on i'd expect it to stay under 6% cpu. and of course for a graphic display the rate can be dropped some. i threw a 256 window display on the last standalone generator i coded, iirc it just captured a window every 16th note and so was basically invisible on cpu.
you come and go, you come and go. amitabha neither a follower nor a leader be tagore "where roads are made i lose my way" where there is certainty, consideration is absent.
-
- KVRian
- Topic Starter
- 626 posts since 30 Aug, 2012
xoxos wrote: ↑Mon Aug 12, 2019 12:47 am
>?? all of my audio dsp has been done on single cores. i use the fft from dspguide.com, with some obvious optimisation. its not as fast as the common libraries for sure. but eg. for a 2x overlap, without much else going on i'd expect it to stay under 6% cpu. and of course for a graphic display the rate can be dropped some. i threw a 256 window display on the last standalone generator i coded, iirc it just captured a window every 16th note and so was basically invisible on cpu.
I guess I am thinking of processors like noise reduction where FFT heavily taxes CPUs. Perhaps it's all the computation before and after the FFT/iFFT conversions that causes that load.
So, keep the sample rate and resolution low and FFTs are not a problem for graphing. Will check out dspguide.com. Thanx!
- KVRist
- 323 posts since 19 Jul, 2008
>FFTs are NOT light on resources
FFTs are very light on resources. For `float[256]` blocks, the computation of an FFT is <1ns per sample, which is faster than accessing the elements in memory, so it's essentially a free computation.
VCV Rack, the Eurorack simulator
- KVRAF
- 7890 posts since 12 Feb, 2006 from Helsinki, Finland
Fender19 wrote: ↑Mon Aug 12, 2019 12:36 am
>A lot of plugins these days - especially EQ plugins - include real time spectrum graphs in their display. Many of these plugins claim to be "light on resources" yet, AFAIK, FFTs are the only way to generate spectrum data - and FFTs are NOT light on resources!
>So, are there less computationally-intensive means being used to generate these spectrum graphs?
Another way to generate a spectrum graph is to use a bank of band-pass filters, but using an FFT is generally faster.
- KVRAF
- 7890 posts since 12 Feb, 2006 from Helsinki, Finland
Well.. for good resolution at low frequencies you might want a lot longer FFTs (eg. 4k or more) and then you end up with the problem that you have to overlap them to get a decent visual framerate..
But even then, drawing the resulting data is usually a whole lot more expensive than the FFTs.
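To put rough numbers on the resolution versus framerate trade-off, here is a small sketch. The 44.1 kHz sample rate and the 30 fps target are assumptions chosen for illustration, not figures from this post, and `overlap_plan` is a hypothetical helper name.

```python
def overlap_plan(sample_rate, fft_size, target_fps):
    """Relate FFT length, bin spacing, and the overlap needed for a display rate."""
    # Spacing between adjacent FFT bins, in Hz.
    bin_width_hz = sample_rate / fft_size
    # Frames per second if every FFT waits for a full window of new samples.
    fps_without_overlap = sample_rate / fft_size
    # New samples that arrive between two displayed frames at the target rate.
    hop = sample_rate // target_fps
    # Fraction of each window that is reused from the previous frame.
    overlap = max(0.0, 1.0 - hop / fft_size)
    return {
        "bin_width_hz": bin_width_hz,
        "fps_without_overlap": fps_without_overlap,
        "hop": hop,
        "overlap": overlap,
    }

# Assumed example values: 44.1 kHz audio, 4096-point FFT, 30 fps display.
plan = overlap_plan(44100, 4096, 30)
print(plan)
```

With these assumed numbers the bins are about 10.8 Hz apart, a non-overlapped 4096-point FFT would only refresh about 10.8 times per second, and reaching 30 fps needs a hop of 1470 samples, which is roughly 64% overlap.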
-
- KVRian
- 607 posts since 6 Mar, 2005 from USA
If you don't care about phase, a way to make FFT decomposition even faster is to pre-allocate memory (that's a 2^N size) and fill it up with incoming samples in a ring fashion. Then "unfold" the FFT computation for this specific length so there's no loops (easier to write code that writes code to remove all the loops). This machine-written code that's specific to your loop length will be ugly as anything, but I've found it runs much, much faster than looped code written for the general-length case. If you really need that phase, you could recover it since the loop will add a linear phase proportional to the offset in the loop of your current sample.
On laptops that were out in 2006 I could do a 32k FFT on streaming data with a 30 fps screen update and still only use 4% of the processor core.
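A minimal sketch of the ring-buffer point, under stated assumptions: it uses a naive O(N^2) DFT as a stand-in for a real FFT (so it says nothing about speed or loop unrolling), an arbitrary 16-sample test signal, and a hypothetical write position `w`. It only checks that transforming the ring as stored gives the same magnitudes as transforming the samples in time order, and that the phase differs by a linear term that can be undone.

```python
import cmath

def naive_dft(x):
    """Naive O(N^2) DFT, used here only as a stand-in for a real FFT."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

N = 16
# Arbitrary test signal in time order (oldest sample first).
ordered = [float((7 * t * t + 3 * t) % 11 - 5) for t in range(N)]

# Hypothetical ring state: the oldest sample currently sits at index w.
w = 5
ring = [ordered[(i - w) % N] for i in range(N)]

spec_ordered = naive_dft(ordered)
spec_ring = naive_dft(ring)  # transform the ring as stored, without reordering

# Magnitudes agree regardless of where the ring currently starts.
magnitudes_match = all(
    abs(abs(a) - abs(b)) < 1e-9 for a, b in zip(spec_ring, spec_ordered)
)

# The rotation multiplies bin k by exp(-2*pi*i*k*w/N); multiplying back undoes it.
corrected = [spec_ring[k] * cmath.exp(2j * cmath.pi * k * w / N) for k in range(N)]
phase_recovered = all(abs(a - b) < 1e-9 for a, b in zip(corrected, spec_ordered))

print(magnitudes_match, phase_recovered)
```

Note that this holds for the unwindowed transform. A window function applied at fixed buffer positions would not rotate with the data, so a windowed display would need the window rotated to match or the samples read out in time order.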
-
- KVRian
- Topic Starter
- 626 posts since 30 Aug, 2012
4-6% CPU usage that some are stating here is not “light” IMO. Try to use a dozen plugins like that in a typical mixing scenario and see what happens!
What do you mean by “processes a sample in <1ns”? On what speed of machine, with what overlap, etc.?
Can you clarify?
-
- KVRian
- 1273 posts since 9 Jan, 2006
Perhaps, but I wouldn't keep a dozen plugins open all displaying spectral graphs at the same time and, presumably, your plugin will only perform these calculations while the interface is open.
- KVRist
- 323 posts since 19 Jul, 2008
4-6% CPU usage is definitely wrong. I don't know what FFT implementation they're using, but you should be getting <1ns/sample with FFT block sizes ranging from 2^5 to 2^18 with anything in the last 10 years that supports AVX. That'd be around 0.005% CPU at 44.1kHz.
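The arithmetic behind that percentage can be sketched as follows. The helper name `cpu_percent` and its `overlap_factor` parameter are hypothetical, and the model assumes one channel on one core with nothing but the FFT cost counted.

```python
def cpu_percent(ns_per_sample, sample_rate_hz, overlap_factor=1):
    """Share of one core spent if each input sample costs ns_per_sample.

    overlap_factor is how many FFTs each sample takes part in
    (1 = no overlap, 2 = 2x overlap, and so on).
    """
    busy_seconds_per_second = ns_per_sample * 1e-9 * sample_rate_hz * overlap_factor
    return 100.0 * busy_seconds_per_second

no_overlap = cpu_percent(1, 44100)       # 1 ns/sample at 44.1 kHz
two_x_overlap = cpu_percent(1, 44100, 2)  # same cost with 2x overlap
print(no_overlap, two_x_overlap)
```

At 1 ns per sample and 44.1 kHz this works out to about 0.0044% of a core, which rounds to the 0.005% figure above; 2x overlap doubles it to about 0.0088%.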
VCV Rack, the Eurorack simulator
- KVRAF
- 7890 posts since 12 Feb, 2006 from Helsinki, Finland
AnalogGuy1 wrote: ↑Tue Aug 13, 2019 11:42 pm
>Then "unfold" the FFT computation for this specific length so there's no loops (easier to write code that writes code to remove all the loops). This machine-written code that's specific to your loop length will be ugly as anything, but I've found it runs much, much faster than looped code written for the general-length case.
Something that should probably be mentioned is that a typical text-book radix-2 FFT can easily be a factor of 100 (or more) slower than a properly optimised FFT library. Optimising twiddle computations (and using constant lookups for small blocks) is the first step. Completely unfolding (use split-radix) tends to only be profitable for small (sub-)blocks (eg. 64 or less). As the blocks get larger, you tend to become more and more memory bound and you should basically run the largest radix (probably either 4 or 8 in practice) you can fit into registers, then stride blocks as large as possible without thrashing L1 cache.
But seriously, unless you want to spend a ton of time learning about all the possible optimisations and then tuning it, you should really just use some existing FFT library.
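For illustration only, and not as a substitute for a library, here is a sketch of the first step mentioned above: a plain iterative radix-2 FFT that reads its twiddle factors from a table computed once per FFT size instead of calling `exp` inside the butterflies. The function names `make_twiddles` and `fft_radix2` are hypothetical. It does not show larger radices, unrolling, or cache blocking, and Python timings say nothing about native performance.

```python
import cmath
import math

def make_twiddles(n):
    """Precompute exp(-2*pi*i*k/n) for k in [0, n/2), once per FFT size."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]

def fft_radix2(x, twiddles):
    """Iterative radix-2 decimation-in-time FFT using a twiddle lookup table."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    a = [complex(v) for v in x]

    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # Butterfly stages: sub-transform size doubles each pass.
    size = 2
    while size <= n:
        half = size // 2
        step = n // size  # stride into the table for this stage
        for start in range(0, n, size):
            for k in range(half):
                w = twiddles[k * step]  # table lookup, no exp() call here
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
        size *= 2
    return a

# Usage: one cosine cycle over 8 samples puts its energy in bins 1 and 7.
table = make_twiddles(8)
spectrum = fft_radix2([math.cos(2 * math.pi * t / 8) for t in range(8)], table)
print([round(abs(v), 6) for v in spectrum])
```

The same table can be reused for every frame of a given size, which is the point of the lookup.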
- KVRian
- 872 posts since 6 Aug, 2005 from England
At first I thought 'Fender19' was thinking they need to be done every sample!
Dave Hoskins. http://www.quikquak.com
- KVRAF
- 7890 posts since 12 Feb, 2006 from Helsinki, Finland
Another tip: if you don't need the spectrum data in DSP code (eg. conventional EQ where it's only used for visual display), then you can simply send the samples as-is into a wait-free queue and do all the FFT computations in the GUI thread. This way the audio processing cost is practically zero and even if you start running low on CPU, you'll only take a performance hit with GUI framerates rather than glitching the audio.
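A rough sketch of the queue idea, assuming a single-producer/single-consumer ring buffer where the audio thread only pushes and the GUI thread only pops. The class name `SpscRing` and its methods are hypothetical. This shows the index logic only: Python cannot demonstrate the real-time guarantees, and in a C++ plugin the two indices would need to be atomics with appropriate memory ordering.

```python
class SpscRing:
    """Single-producer/single-consumer ring buffer (index logic only)."""

    def __init__(self, capacity):
        # One slot is kept empty so "full" and "empty" can be told apart.
        self._size = capacity + 1
        self._buf = [0.0] * self._size
        self._write = 0  # advanced only by the producer (audio thread)
        self._read = 0   # advanced only by the consumer (GUI thread)

    def push(self, sample):
        """Audio thread: never blocks, drops the sample if the ring is full."""
        nxt = (self._write + 1) % self._size
        if nxt == self._read:
            return False  # full: losing display data is better than stalling audio
        self._buf[self._write] = sample
        self._write = nxt
        return True

    def pop_all(self):
        """GUI thread: drain everything queued so far, then window and FFT it."""
        out = []
        write = self._write  # snapshot of the producer's position
        while self._read != write:
            out.append(self._buf[self._read])
            self._read = (self._read + 1) % self._size
        return out

# Usage: the "audio thread" offers 6 samples to a ring that holds 4.
ring = SpscRing(4)
accepted = [ring.push(float(i)) for i in range(6)]
drained = ring.pop_all()  # what the "GUI thread" would feed to its FFT
print(accepted, drained)
```

Dropping samples when the ring is full is a design choice made for this sketch: it keeps the audio side cheap and means an overloaded GUI only costs display data.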
- KVRAF
- 4590 posts since 7 Jun, 2012 from Warsaw
Yep. It is also important to run the spectrum analysis in a separate thread so it doesn't block the audio stream and slow down audio buffer processing.
Blog ------------- YouTube channel
Tricky-Loops wrote: (...)someone like Armin van Buuren who claims to make a track in half an hour and all his songs sound somewhat boring(...)
-
- KVRian
- Topic Starter
- 626 posts since 30 Aug, 2012