KVR Audio

mystran · Post by **mystran** » Mon Jul 07, 2014 6:00 pm

Not sure if anyone cares, but .. right now this appears to work .. so I'll upload it before the next optimization breaks everything again. In any case, I'll take back whatever I said about SSE2 in the "efficient code" thread.

https://github.com/signaldust/DustFFT

[edit 2012: updated link, also comes with wrappers for real-to-half-complex and back]

This is a "single C file" library, though there is a header embedded into the file that you can copy-past into a separate .h file if you feel like it. Currently does complex double precision (and power of 2) transforms only. It will currently only compile on x86 (or x64) because for whatever reason I didn't #ifdef the SSE2 intrinsics (yet).

See the source, but basically there's simple _fwdD / _revD functions (the 'D' is for doubles, in case I decide to add single precision) to do forward/reverse transforms, which take a buffer (double*) and the size of the transform and the buffer should contain twice that many numbers, with real/imag inputs interleaved. The following should be safe with any half-decent std::complex implementation:

Code: Select all

 std::complex<double> buf[1024]; /* fill this with whatever */
 DustFFT_fwdD(reinterpret_cast<double*>(buf), 1024);

For performance, align the buffer at 16 bytes. The _fwdD and _revD functions will otherwise pick a slower scalar path whenever you happen to pass misaligned stuff. You can skip the alignment check with another set of entry points (_fwdDa and _revDa) in case you'd rather have it crash instead of taking the slower scalar path.

Neither forward or reverse transforms do any scaling, so as usual, you probably want to divide everything by N at some point (typically after reverse .. but it varies, hence not done automatically).

Some day I might add other stuff like single precision, AVX or maybe real-only transforms (let me know if you feel like sponsoring something; personally I just need complex doubles most of the time), or .. I don't know.. but for now it does complex double precision. The source code looks like horrid garbage, but that's mostly because (1) getting decent performance out of an FFT means duplicating the same code a dozen times with just some slight variations, (2) I got tired of trying to convince C++ compilers to do the right thing with templates and (3) I didn't feel like wasting time writing a pretty-printer into the generation script (which is written in a variant of Lisp and not provided). So yeah, it's not the actual source .. just something you can compile.

Anyway, it appears to work. It also appears to work reasonably fast. So how fast is it? Well, it probably won't compete with FFTW or IMKL, but the scalar code-path is typically slightly slower than Ooura split-radix (both compiled on MSVC08/fp:fast + SSE2 instructions; for x87 it'll probably be much slower), while the SSE2 code-path should be faster (except for some tiny transforms.. but if your bottleneck is 8 point FFTs then .. well, I could probably special case that). How much faster? Well, that depends on the system; on my sandy-i7 desktop, for most larger transforms it takes about 0.75 times what Ooura takes (as long as we fit into L3; at 2^10 points that's no longer the case, and I should probably add another strategy there) and on my netbook where the difference is smaller, I actually beat both with a C++ template mess that I wrote last week, so there's a pretty good chance I could make it even faster.. but even there it's a bit faster.

Obviously speed is not the only reason I wrote this. I also needed the ability to easily add extra features, but I'm providing the "bare-bones" version for the general public in case you might find it useful.

In terms of accuracy it won't break any records, but it's good enough that for audio purposes you really shouldn't have to worry about that. If you observe the scalar version losing precision (when compiled with -ffast-mat or something), please let me know and I'll see if I can try to convince your compiler to get it right.

Mayae · Post by **Mayae** » Wed Jul 09, 2014 11:13 am

Nice i'll check it out. What's the license (can't dl link here)?

mystran · Post by **mystran** » Wed Jul 09, 2014 11:39 am

Mayae wrote:Nice i'll check it out. What's the license (can't dl link here)?

Code: Select all

/****************************************************************************\
* DustFFT - (c) Copyright Teemu Voipio 2014                                  *
*----------------------------------------------------------------------------*
* You can use and/or redistribute this for whatever purpose, free of charge, *
* provided that the above copyright notice and this permission notice appear *
* in all copies of the software or it's associated documentation.            *
*                                                                            *
* THIS SOFTWARE IS PROVIDED "AS-IS" WITHOUT ANY WARRANTY. USE AT YOUR OWN    *
* RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE HELD LIABLE FOR ANYTHING.  *
*----------------------------------------------------------------------------*
*                                                                            *
* This file is generated by a script, direct editing is a terrible idea.     *
\****************************************************************************/

I could probably relax that to only require a copyright notice for binary distribution, but in any case it's not terribly restrictive.

xoxos · Post by **xoxos** » Thu Jul 10, 2014 9:17 pm

Mayae · Post by **Mayae** » Mon Jul 14, 2014 6:53 pm

I've used it some and it's working - needs some more testing though.

However, this is required to compile it on LLVM on osx:

Code: Select all

		static const unsigned ALIGN16 _mask_pd_sign_lo[4] = { 0, 1 << 31, 0, 0 };

to this:

Code: Select all

		static const unsigned ALIGN16 _mask_pd_sign_lo[4] = { 0, 1U << 31, 0, 0 };

mystran · Post by **mystran** » Mon Jul 14, 2014 8:43 pm

Mayae wrote:I've used it some and it's working - needs some more testing though.

However, this is required to compile it on LLVM on osx:
Code: Select all
		static const unsigned ALIGN16 _mask_pd_sign_lo[4] = { 0, 1 << 31, 0, 0 }; 
to this:
Code: Select all
		static const unsigned ALIGN16 _mask_pd_sign_lo[4] = { 0, 1U << 31, 0, 0 }; 

Really? I checked it with both GCC and Clang on Linux and didn't get any warnings with -Wall, but sure I can fix that.

edit: to be honest though, I wonder if that could be made cleaner in general.. right now it relies on "unsigned" being 32-bit type, should probably make it uint32_t and then typedef for older MSVC explicitly..

Mayae · Post by **Mayae** » Tue Jul 15, 2014 12:11 pm

It's not even a warning, it's an error stopping compilation. Was kinda surprised, too. The error is something along the lines of "error: numerical constant expands to -2147483648 which cannot be narrowed to target type". I'll check it later..

It appears the expression is illegal? Found this at google: http://reviews.llvm.org/D2309

But in general, you might as well shift unsigned target types.. Why not, anyway.

Miles1981 · Post by **Miles1981** » Tue Jul 15, 2014 1:08 pm

It must be that "unsigned" refers to something else.
If you need the int32_t type, you should/can use the Boost implementation.

Mayae · Post by **Mayae** » Tue Jul 15, 2014 3:52 pm

Here was the error.

Code: Select all

error: constant expression evaluates to -2147483648 which cannot be narrowed to type 'unsigned int' [-Wc++11-narrowing]
                static const unsigned ALIGN16 _mask_pd_sign_lo[4] = { 0, 1 << 31, 0, 0 };
                                                                         ^~~~~~~
note: override this message by inserting an explicit cast
                static const unsigned ALIGN16 _mask_pd_sign_lo[4] = { 0, 1 << 31, 0, 0 };
                                                                         ^~~~~~~
                                                                         static_cast<unsigned int>( )

Miles1981 · Post by **Miles1981** » Tue Jul 15, 2014 4:11 pm

Indeed, it cannot be described as a signed int32_t, hence the error. With 1U, you are still inside the unisgned int range.

mystran · Post by **mystran** » Wed Jul 16, 2014 4:29 am

Mayae wrote:Here was the error.

Code: Select all

error: constant expression evaluates to -2147483648 which cannot be narrowed to type 'unsigned int' [-Wc++11-narrowing][/quote]

Oh right, that explains the error: are you maybe trying to compile it as C++? I admit that I only tried gcc and clang in plain C mode, because .. well the code is C. :)

There's no good reason why that shouldn't work, but it's really just C code, with just the header part written to be C++ compatible by #ifdef guarding the extern "C" definitions (so you can #include from C++ files and expect things to work).

Miles1981 · Post by **Miles1981** » Wed Jul 16, 2014 7:53 am

Still, this C++ error should also be an error in C, as the types are the same.

mystran · Post by **mystran** » Wed Jul 16, 2014 8:17 am

Miles1981 wrote:Still, this C++ error should also be an error in C, as the types are the same.

Actually no. In C it's just undefined behavior.

edit: also the whole reason for a silly integer bitmasking thing is that if I wrote it as doubles with {-0.0,0.0} then the silly compilers will normalize the negative zero into a positive one with /fp:fast or -ffast-math set..

Miles1981 · Post by **Miles1981** » Wed Jul 16, 2014 8:44 am

Yes indeed. But you shouldn't rely on undefined behavior anyway :p

mystran · Post by **mystran** » Wed Jul 16, 2014 11:25 am

Miles1981 wrote:Yes indeed. But you shouldn't rely on undefined behavior anyway :p

Well, it's fixed in my generator script already.

But really I would like to argue that there is quite a bit of useful stuff that is technically "undefined behavior" but in practice perfectly safe to rely on (and in some cases even defined as legit extensions by plenty of compilers), as long as you understand the potential implications on portability. Obviously you should avoid such things when doing so is reasonable, but usually the moment you start doing anything low-level (like storing bit-patterns and then loading them as doubles or __m128) you are pretty firmly in the realm of undefined, yet this is pretty much the only way you can actually get things done.. and if I'm not mistaken, even the alignment check is technically relying on undefined behavior as C99 allows you to cast a pointer into an uintptr_t and back, but you aren't really supposed to make assumptions about the actual values.

So .. yeah .. given the choice of writing it all in assembler, or stepping a bit in the realm of undefined with C every once in a while.. I'll generally choose the latter.

DustFFT - in case you need an FFT :P