KVR Audio

avasopht · Post by **avasopht** » Mon Sep 15, 2014 2:14 pm

UPDATED
Optimising (for interleaved / padded input) FFT in progress. Basically you can select reading interleaved samples, or the first set of samples to the first/last set of frequencies as well as seeing gains in rendering IFFT from the first / last set of frequencies to interleaved or the first set of samples. Unnecessary computational paths will be eliminated.

This is all useful for up/down sampling as well as overlap-add.

ORIGINAL QUESTION (edited for clarity) ---

Example code: kfft.c

I wanted to know whether there is a similar FFT algorithm to mine.

I basically perform FFT by factorising calculations by recursively subtracting the second half of a signal from the first, meaning x[n] only needs to be multiplied by the complex sinusoid for only half of the duration, resulting in N/2 log N real multiplies (shown below). The summed signal is for multiplying with complex sinusoids that are multiples of the sampling period. For example groups {2, 6} are multiples of 2, and therefore have a periodicity of 4 samples.

Code: Select all

// For bins { 1, 3, 5 and 7 }

x1[0] = x0[0] - x0[4]
x1[1] = x0[1] - x0[5]
x1[2] = x0[2] - x0[6]
x1[3] = x0[3] - x0[7]
// Calculate sum (required for the next level of bins)
x0[0] = x0[0] + x0[4]
x0[1] = x0[1] + x0[5]
x0[2] = x0[2] + x0[6]
x0[3] = x0[3] + x0[7]

// For bins { 2 and 6 }

x2[0] = x0[0] - x0[2]
x2[1] = x0[1] - x0[3]
// Calculate sum (required for the next level of bins)
x0[2] = x0[0] + x0[2]
x0[3] = x0[1] + x0[3]

// For bin { 4 }
x3[0] = x0[0] - x0[1]
x3[1] = x0[0] + x0[1] // <-- This is the 0hz bin (sum)

avasopht · Post by **avasopht** » Mon Sep 15, 2014 2:29 pm

Okay so my first response says it's nothing new. So what is this algorithm called?

Miles1981 · Post by **Miles1981** » Mon Sep 15, 2014 3:04 pm

The DFT is the general Discrete Fourier Transform. By default, it has an O(n²) complexity. The FFT is the Fast Fourier Transform, witha complexity of O(n ln(n)). It is achieved by factoring elements together (2 together is the usual radix-2 approach, more or less what you proposed).
Can you check what are your differences with the butterfly operator in the usual FFT? There is another way of computing a real to complex DFT in place (less memory used) which is the Bergland algorithm. Also in O(n ln(n)).

avasopht · Post by **avasopht** » Mon Sep 15, 2014 3:55 pm

I'll compare with the butterfly operator tomorrow but it seems to essentially be doing the same thing.

This can compute in place also (just working out the maths).

Miles1981 · Post by **Miles1981** » Mon Sep 15, 2014 4:12 pm

Yes, it's the basic element of the FFT algorithm.

mystran · Post by **mystran** » Tue Sep 16, 2014 10:49 am

avasopht wrote:I'll compare with the butterfly operator tomorrow but it seems to essentially be doing the same thing.

This can compute in place also (just working out the maths).

As far as I can understand your FFT algorithm from the description, the math looks like a basic Cooley-Tukey radix-2 (the decimation in frequency variant, I guess). Not sure about the recursion and recombination steps, would need the full code (at least in pseudocode) to comment on that. In general though, there's a whole lot of known variations on this basic idea, so I doubt it's really anything new.

edit: upon closer reading .. have you tested this? In addition to the differences, you almost certainly need to recurse the sums as well (which leads to the "butterfly" that calculates a sum and difference and "twiddles" the difference, factoring out the common angle, so you get two identical sub-transforms). Maybe I just don't understand your description correctly, but I don't see how it could work with just the differences.

avasopht · Post by **avasopht** » Tue Sep 16, 2014 1:40 pm

Yes it would be decimation in frequency, though all calculations are with the complex vector against a real and as such can be executed with just n log n real *multiplies (minus n/8 log n with a minor optimisation). The ifft to this allows you to easily output selected frequencies to various outputs on the fly at the cost of N / 2^LSB(k) **copies for each frequency.

I'll work on some code ASAP to give you a better idea.

How it works is that there are a set of frequencies at PI halfway through the signal samples, meaning that the second half of those samples starts at an odd phase. So by subtracting the second half of the samples from the first you can save yourself from ever having to multiply by the PI-2PI portion of f.

The image below shows the basic case of a frequency that cycles over the sample period once. Basically by subtracing x[n+4] from x[n], you only need to multiply for half of the cycle.

Now we already know that we can make an immediate saving without this property anyway, by wrapping for each time that a cycle completes.

The diagram below shows how this works for any frequency wk that is at a half cycle halfway through a sampling period.

*: all other arithmetic can be done with shifts and bitmasks.
**: LSB(0) = 32

avasopht · Post by **avasopht** » Tue Sep 16, 2014 2:17 pm

Just added a diagram (should have made it into a new post).

Minor update: I've got part of the code working in Python, ... will have this FFT algo completed by tomorrow.

And slight miscalculation. This cannot be performed in place. The IFFT still can though.

avasopht · Post by **avasopht** » Wed Sep 17, 2014 11:08 am

Edit: looks like I can get this down to N/2 log N real multiplies! See if you can figure it out before I get cracking on it

Code available on pastebin (you will likely want to modify the algorithm to have you wicked way with the output).

Just made a quick demo of this code, generating and detecting a 11hz tone with a 32 point fft. The method ksin/kcos is there as a placeholder to inject a lookup table etc.

I'll document the code later, but basically what happens is that the signal is first transformed to the factorised diff of radix-2 cycles (taking the diff of n and n + N/2).

The transform method (because of the factorisation) only needs to work on a subset of the data. I'll do my best to explain this with diagrams at some point.

Code: Select all

/*  Code is also available at: http://pastebin.com/GV9WiwBu
Copyright (C) 2014 Keldon Alleyne

Keldon.alleyne@gmail.com

This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1
 */

/* This file demonstrates a simple implementation of the kfft algorithm.
   A 11hz tone is generates, which is then detected by the algorithm.
*/
#include <stdio.h>
#include <math.h>

#define AVA_PI 3.14159265358979323846

typedef struct Complex {
    double r;
    double i;
} Complex;

void sumdiff(double *signal, size_t size);
double ksin(double rad);
double kcos(double rad);
int lsb(int nn);
Complex transform (double *sig, int k, int sz);
void kfft(double *sig, Complex *X, size_t sz);


void kfft(double *sig, Complex *X, size_t sz) {
    int k;
    sumdiff(sig, sz);
    for(k = 0; k < sz / 2; ++k) {
        X[k] = transform(sig, k, sz);
    }
}

void sumdiff(double *signal, size_t size) {
    const size_t half_size = size / 2;
    double *pleft;
    double *pright;
    size_t i;

    if(size == 1) { return; }
    pleft = signal;
    pright = signal + half_size;

    for(i = 0; i < half_size; ++i) {
        double ldiff = *pleft - *pright;
        double rsum = *pleft + *pright;
        *pleft = ldiff;
        *pright = rsum;
        ++pleft;
        ++pright;
    }

    sumdiff(signal + half_size, half_size);
}

double ksin(double rad) {
    return sin(rad);
}

double kcos(double rad) {
    return cos(rad);
}

int lsb(int nn) {
    unsigned int n = (unsigned int)nn;
    if(n == 0) return 32;
    int result = 0;
    while((n&1) == 0) {
        n >>= 1;
        ++result;
    }
    return result;
}

int klsb(int val, int max_lsb) {
    int lsb_result = lsb(val);
    return lsb_result < max_lsb ? lsb_result : max_lsb;
}

int get_start(int k, int sz) {
    /* frequency k reads the sum of all at the last index. */
    if(k == 0) { return sz - 1; }
    return sz - ( 1 << (lsb(sz) - lsb(k) ) );
}

int get_read_size(int k, int sz) {
    /* frequency 0 has a read size of 1 */
    if(k == 0) { return 1; }
    return 1 << (lsb(sz) - lsb(k) - 1);
}

Complex transform (double *sig, int k, int sz) {
    double *ptr;
    double *end;
    double cwk;
    double wk;
    Complex X;

    wk = (double)k * 2. * AVA_PI / (double)sz;

    ptr = sig + get_start(k,sz);
    end = ptr + get_read_size(k,sz);
    cwk = wk;
    X.r = 0;
    X.i = 0;

    while(ptr != end) {
        X.r += cos(cwk) * (*ptr);
        X.i += sin(cwk) * (*ptr);

        ++ptr;
        cwk += wk;
    }


    return X;
}

int main() {
    const size_t kSize = 32;
    double sig[kSize];
    Complex X[kSize / 2];
    int i;
    double hz = 11.;
    for(i = 0; i < kSize; ++i) {
        sig[i] = ksin(2. * AVA_PI * hz * i / (double)kSize);
    }
    kfft(sig,X,kSize);
    for(i = 0; i < kSize / 2; ++i) {
        printf("%d: %f\n", i, sqrt(X[i].r * X[i].r + X[i].i * X[i].i) / kSize * 2);
    }
    return 0;
}

avasopht · Post by **avasopht** » Mon Sep 22, 2014 11:00 am

Note: Code will be made available by 11 Oct

Features:
1. Selective frequency mode
Only transform selected frequencies.

Uses:
- Process only the portion of your digital filter that you actually need!

Uses:
- Only render needed frequencies (at a significantly reduced cost)

2. Selective sample mode
Only transform selected sample indexes.

Uses:
- Upsampling: simply supply the original waveform, and at a heavily reduced cost calculate the DFT of the upsampled version in the higher sample rate without actually having to insert zeroes
- Simulating zero padding: without actually multiplying the zeroes or needing to hand code the FFT, your "zero padded" DFT can be calculated incredibly quick

Uses:
- Downsampling: Instead of outputting N samples, only calculate the actual samples required!!
- Optimised overlap-add: only output required samples, then add end phase to next IFFT instead of adding the actual samples!

3. Plain old C/C++ code
Compile for different platforms. Some would call this a lack of features (say SSE optimisation), so perhaps I'll provide an autovectorisation friendly version

, but for my purposes I'm on an environment that does not allow processor specific optimisations.

Code will be Apache licensed as there is already FFTW.

This works by bypassing redundant calculations.

Edit: altered description

DaveHoskins · Post by **DaveHoskins** » Mon Sep 22, 2014 12:06 pm

I like the idea of only calculating the frequencies that are needed, it would save a lot of CPU time for some of my algorithms.
That's some great work you're doing, and I look forward to seeing the results.
Dave

duncanparsons · Post by **duncanparsons** » Tue Sep 23, 2014 10:56 am

hmm..

I'm with Dave!

Miles1981 · Post by **Miles1981** » Tue Sep 23, 2014 12:29 pm

Don't forget that you will have to process the n original samples, so what you are changing is actually the ln(n) factor. As it is a log, the cost of computing the whole spectrum is actually small compared to computing part of it. And in some cases, it can be the same because you have to compute everything anyway.

avasopht · Post by **avasopht** » Tue Sep 23, 2014 1:07 pm

Hello,

Yes, regarding the selective frequencies and samples, these are the typical configurations:

INPUT
- Zero padding of the signals (overlap-add FFT), or of frequencies (you usually want one or the other)
- Zero insertions of signal (for upsampling)

OUTPUT
- To selected frequency groups
- To first x samples (partial FFT)
- To every x samples (downsampling)

All that's happening is that parts of the signal flow are being bypassed. It may require code generation to see the actual gains (not sure if FFTW's version does JIT for their version of this functionality). The specification is put through a digraph trying the most efficient route through either DIT or DIF.

So it's not specific signals and frequencies, but groups / intervals (as randomly spaced ins and outs would just result in the entire tree being covered).

---

DaveHoskins wrote:I like the idea of only calculating the frequencies that are needed, it would save a lot of CPU time for some of my algorithms.
That's some great work you're doing, and I look forward to seeing the results.
Dave

Well, this is all a learning curve for me and there was some oversight on how I had divided the task and misread my results, but in general you get some nice savings (like, halving of calculations as you would expect if you are only concerned with half of the input or output). I'm not sure if any free&commercial fft algorithms provide this functionality?? which is why I set out to write mine.

avasopht · Post by **avasopht** » Tue Sep 23, 2014 1:24 pm

Miles1981 wrote:Don't forget that you will have to process the n original samples, so what you are changing is actually the ln(n) factor. As it is a log, the cost of computing the whole spectrum is actually small compared to computing part of it. And in some cases, it can be the same because you have to compute everything anyway.

Yes, and not just that but unless hard-coded the conditionals / virtual calls may incur an execution cost that outweighs the reduction in calculations, which I will need to benchmark.

Shouldn't take too long to write, though right now I'm working on a small web project so it will be a few weeks before I have the time to sit down (though knowing me I'll probably

RFFT (rather Flexible Fourier Transform), ...