KVR Audio

Mayae · Post by **Mayae** » Tue Feb 09, 2021 6:13 pm

Hey guys.

TL;DR
APE is, simply put, a free, open-source audio plugin with a built-in C++ JIT compiler and a text editor.

Downloads, documentation, manual, repo, more stuff

The old version has been completely rewritten to use a bleeding edge C++ compiler, with a completely refreshed API, UI and many more capabilities.

See flashy videos a bit below if you want to know what this is all about.

Prologue
So I made this little thing a good while back, a "simple" audio plugin with a text editor and a C compiler (TCC) allowing you to write simple DSP code and JIT it into the audio stream...
I was just minding my own business when the very first comment was about the choice of compiler and optimizations.

Flash forward 4 more years, and the idea never really left me. The plugin was cool, but C is really limiting. Creativity is blocked by the smallest things, and if you're not able to efficiently reuse code and eliminate boilerplate, you're not having a good time.

So... C++. LLVM and clang seem to be doing well, even having invested in a JIT framework - but that's a lot of code to pull in, right? Thinking I would save on time and binary size, I even pulled Bjarne Stroustrup's very old CFront transpiler to make a C++ frontend feeding into TCC.
Eventually... It actually worked. But you were stuck with a buggy, 30-year-old C++ standard. And of course, still suffering from TCC's (lack of) performance.

I finally broke down and integrated clang and LLVM directly in the binary plugin, giving me a modern C++ library, a configurable linker, parser, optimizer and JIT'er. It only took 3 years of not-getting-much-sleep (after a complete rewrite of the original project anyway, and many new features)!

So, what can this thing do?

Completely self-contained - ships with SDKs, compiler, linker, editor, console and runtime
Bleeding edge Clang/LLVM C++ compiler for class leading diagnostics and code optimization
Full C++17 support (partial C++20 support)
Most of the C++ standard library included (containers, math, numeric, algorithmic etc.)
I/O Audio file streaming with a variety of codecs
System-level exception handling to avoid crashing your host on small errors like integer division by zero
Console with logging
Themeable editor with syntax highlighting
Oscilloscope with expression evaluation
Precise, smoothed automatable parameters
Built-in optimized FFTs
26 included effect scripts that are documented (54 legacy scripts for inspiration, as well)
Cross platform support (AU/VST Windows, OS X, Linux for anyone who wants to compile it)

Videos
I made a short, quick rundown of the features of this plugin:

I also made a longer annotated video of me developing a morph filter from scratch in this plugin:

Changelog - new in v. 0.5.0

• Version controlled

• New repository: https://bitbucket.org/Mayae/ape
	o New source structure with permanent platform projects
	o Most dependencies now included as submodules
	o Modular and testable components
	o Unit tests

• Build system
• Headers shared between plugin and user scripts, removes stale errors
• Complete rewrite to modern C++
• All manual memory management and leaks removed
• All UI, utility etc. now uses cpl
• Extended platform support, including technical Linux support. See requirements.

• Parameters
	o Much more precise user controls, with ability to type in precise 64 bit values
	o More flexible and extensible format / range options for parameter values
	o Enumeration / lists of strings now supported as combo boxes
	o Now automatable by host
	o 64-bit precision internally

• Widgets
	o Meters are now per-sample evaluated and properly decaying. Also contains peak markers.

• Iteration
	o Compatible parameter values preserved
	o Hotkeys for all major operations
	o True multithreaded compilation across plugins
	o Old/new sound blended on swapping instances

• Engine
	o Optimized and built-in FFT
	o Support for streaming audio files to and from disk, optionally resampled
	o Audio thread interactions now completely lock free
	o Precise transport access and playback state events to the plugin 

• Quality of life
	o SDKs and libraries now ship included, removed reliance on user development setup
	o Plugin callbacks for initialization and reconfigurations now run asynchronously to avoid stutter on audio thread and hiccups on main thread
	o Many more checks of resource management, assertions etc. to make it much safer
	o Working code is serialized into the project as well, instead of referencing a script on disk.
	o User is notified of mismatched / out-of-date scripts
	o Removed nonsensical errors on abandoned save dialogs
	o Long operations timed and printed to the console

• Plugin GUI
	o Resizable
	o Redesigned, big buttons removed in favor of simple icons and hotkeys
	o Now completely uses vector graphics instead of bitmaps
	o Switched to a tabbed system to increase real estate
	o Tabs can be orphaned into separate desktop windows, and redocked back
	o Graphics optimized and employs precise redrawing, much faster on OS X using core graphics renderer
	o Subpixel text rasterization for normal DPI displays
	o Removed "fpu exceptions" and "protected buffers" switches. These are now determined by compilation mode.

• Source code editor
	o Externally editing files is now supported, reloading and recompiling whenever the file is saved externally
	o Full project and intellisense when working in the source repository for user scripts
	o Evaluate source code expressions as "breakpoints"
	o Text scaling
	o Auto indentation
	o Saving a file without extension and determined language appends the default language extension
	o Menu option to open "home" (also configurable) scripting directory
	o Menu option to create a new file, cloned from the template file
	o Menu options for build events (compile, activate, clean, edit externally etc.)
	o Default now with a dark theme

• New compiler / language: CppApe
	o C++ 17 bleeding edge compiler, based on Clang
	o Runtime vehicle is libCppJit: Multithreaded, lazy JIT based on LLVM
	o User scripts can now include and use other scripts
	o Completely revised front end together with safe and idiomatic user API, boilerplate removed
	o Access to most of C++ standard library, based on libcxx
	o Subset of C standard library available, based on ccore
	o Built-in SIMD vectorized math
	o DSP primitives, interpolation algorithms
	o Type safe and much faster print() family functions
	o 32-bit / 64-bit / 80-bit templated math precision, switchable by user in UI
	o Typical scientific math constants available as templated constant expressions
	o Complete user API documentation here: TODO
	o Assertions supported
	o RAII and unwind support
	o Some exception support
	o Globals, static constructors and destructors supported
	o Memory mapped and precompiled system headers for compilation speed

• Oscilloscope
	o Based on Signalizer
	o Per-sample source code expression evaluation and graphing
	o Color coded inputs / outputs
	o User-defined triggering

• Bugs
	o Console is now thread safe
	o Compilation is now thread safe
	o Fixed crashes on immediate deserialization
	o Many user file bugs fixed

• Tcc4APE
	o More or less deprecated, still ships in source form but complete support is missing
	o Same for syswrap.

Check it out, hopefully it can be useful or provide some fun!
- Janus

lalo · Post by **lalo** » Tue Feb 09, 2021 7:30 pm

That's wonderful news! Thanks

syntonica · Post by **syntonica** » Tue Feb 09, 2021 7:45 pm

Only at .5 release? I'll wait for the 1.0...

Looks very interesting--I'll have to take this new version for a spin. Thanks for all your hard work!

mystran · Post by **mystran** » Wed Feb 10, 2021 1:22 am

Project looks really cool, but when browsing some of the library I noticed a somewhat misleading comment for the partitioned convolution: the number of multiplications is not really the same, it's a bit more complicated.

Suppose that we have an input block-size of N and we want to convolve this with a signal of M*N samples. We need to zero-pad the FFT to 2*N and then we perform 2*M*N madds for the partitioned convolution. However, if we used an FFT blocksize of N+M*N (which is still long enough) then we would only need to perform N+M*N madds, at the cost of having to process a longer FFT. Partitioned still wins, because in practice FFT grows O(NlogN) so the extra cost is more than M*N-N, but it's not quite accurate to say that the number of madds is the same.

On the other hand, if you're willing to accumulate a longer input block, then you get much larger savings, so typically it only really makes sense to do something like 16-32 (give or take) partitions before it's more efficient to run a second partitioning scheme with a larger blocksize in parallel (with the shorter one used to hide the latency of the second one) and so on for larger and larger partitions if you want really long IRs. Obviously this does get a lot more complicated though.
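
To put rough numbers on the madd comparison above, here is a small sketch. It only counts the spectral multiply-adds per input block and ignores the cost of the FFTs themselves. The function names and the example values N = 64, M = 16 are illustrative assumptions, not anything taken from APE.

```cpp
#include <cstddef>

// Spectral multiply-adds per input block for uniformly partitioned
// convolution: M partitions, each multiplied against a zero-padded
// spectrum of 2*N bins.
constexpr std::size_t partitionedMadds(std::size_t N, std::size_t M) {
    return 2 * M * N;
}

// Spectral multiply-adds per input block when a single long FFT of
// size N + M*N is used instead.
constexpr std::size_t singleFftMadds(std::size_t N, std::size_t M) {
    return N + M * N;
}

// Worked example (assumed values): N = 64 samples per block, M = 16 partitions.
static_assert(partitionedMadds(64, 16) == 2048, "2*M*N");
static_assert(singleFftMadds(64, 16) == 1088, "N + M*N");

// The gap between the two is M*N - N, which is what the longer FFT has to
// "pay for" before partitioning stops being the cheaper option.
static_assert(partitionedMadds(64, 16) - singleFftMadds(64, 16) == 16 * 64 - 64,
              "difference is M*N - N");
```

So for these values the partitioned scheme does 960 more madds per block than the single long FFT, which matches the M*N-N figure in the text.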

Xenakios · Post by **Xenakios** » Wed Feb 10, 2021 8:54 pm

Amazing work!

One thing I couldn't figure out with a quick look around : does this support more I/O channels than stereo? If yes, how can that be set up?

mystran · Post by **mystran** » Wed Feb 10, 2021 9:16 pm

I had to look at what FFT you're using and I'm kinda flattered by what I found. Cheers.

edit: Back when I wrote DustFFT some compilers were giving better performance (not by much, but a couple of percent or so) if compiled as C rather than C++ so I wonder, did you sanity check that this isn't the case with recent clang?

Mayae · Post by **Mayae** » Wed Feb 10, 2021 10:51 pm

lalo wrote: Tue Feb 09, 2021 7:30 pm That's wonderful news! Thanks

Hope it'll be useful!

syntonica wrote: Tue Feb 09, 2021 7:45 pm Only at .5 release? I'll wait for the 1.0...

Looks very interesting--I'll have to take this new version for a spin. Thanks for all your hard work!

Hah, I seem to have hit a special case of relativity. Let me know if it works out.

mystran wrote: Wed Feb 10, 2021 1:22 am... I noticed a somewhat misleading comment for the partitioned convolution: the number of multiplications is not really the same, it's a bit more complicated.

... yes, sorry for the late response - I wanted to take a look first. You are right, that was simplified at best. I've made a note to address this!

mystran wrote: Wed Feb 10, 2021 1:22 am On the other hand, if you're willing to accumulate a longer input block, then you get much larger savings, so typically it only really makes sense to do something like 16-32 (give or take) partitions before it's more efficient to run a second partitioning scheme with a larger blocksize in parallel (with the shorter one used to hide the latency of the second one) and so on for larger and larger partitions if you want really long IRs. Obviously this does get a lot more complicated though.

Yes, I wanted to do a sample with variable-length partitions as well! IIRC a colleague warned me of a possible patent issue though, so I never got around to it again.

mystran wrote: Wed Feb 10, 2021 9:16 pm I had to look at what FFT you're using and I'm kinda flattered by what I found. Cheers.

edit: Back when I wrote DustFFT some compilers were giving better performance (not by much, but a couple of percent or so) if compiled as C rather than C++ so I wonder, did you sanity check that this isn't the case with recent clang?

It's getting plenty of use here and there, you're also driving all of Signalizer's magic! I did pit it against other reasonable options, and even with conversion from float -> double it was leading with a comfortable margin. I'm rarely in a position where the FFT itself is the bottleneck though.

I did try compiling as C back in the day but didn't find enough of a difference to care. Last time I checked codegen was identical, after applying a few C++ fixes I think I pinged you about.

Xenakios wrote: Wed Feb 10, 2021 8:54 pm Amazing work!

One thing I couldn't figure out with a quick look around : does this support more I/O channels than stereo? If yes, how can that be set up?

The engine totally does but looking at the configuration file I can see I don't even try to report anything but mono/stereo caps.
The last time someone needed this I ended up compiling a particular surround version. IIRC, at least in my version of JUCE, you could only really get one configuration working per binary.

Updating JUCE is on the list though so it might come for free. I'll take a look regardless.

Xenakios · Post by **Xenakios** » Thu Feb 11, 2021 3:49 pm

Mayae wrote: Wed Feb 10, 2021 10:51 pm The engine totally does but looking at the configuration file I can see I don't even try to report anything but mono/stereo caps.
The last time someone needed this I ended up compiling a particular surround version, IIRC at least in my version of JUCE you could only really get one configuration working per binary Updating JUCE is on the list though so it might come for free. I'll take a look regardless.

If you mean the Juce legacy Plugin Channel Configurations string, using something like this has worked for my projects :

{2,2},{2,4}, {2,8}, {8,8}

Of course that's not completely generic and someone will eventually want something crazy like {64,64} etc. Also I am not sure if this really works properly for stereo-only hosts and so on...It's messy.

(Still, I suppose you would want to support at least 4 inputs, to allow for sidechaining?)
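
For anyone unfamiliar with this kind of list, here is a minimal sketch of how a set of {inputs, outputs} pairs can be checked against a layout the host asks for. This is not JUCE code: `ChannelConfig`, `kSupportedConfigs` and `isSupported` are hypothetical names made up for illustration, and only the four pairs are taken from the post above.

```cpp
// Hypothetical stand-in for a legacy-style channel configuration list:
// each entry is {inputs, outputs}.
struct ChannelConfig {
    int inputs;
    int outputs;
};

// The pairs from the post above.
constexpr ChannelConfig kSupportedConfigs[] = {
    {2, 2}, {2, 4}, {2, 8}, {8, 8}
};

// Returns true if the requested layout matches one of the listed pairs.
constexpr bool isSupported(int inputs, int outputs) {
    for (const auto& c : kSupportedConfigs)
        if (c.inputs == inputs && c.outputs == outputs)
            return true;
    return false;
}

static_assert(isSupported(2, 2), "stereo in, stereo out is listed");
static_assert(isSupported(8, 8), "8 in, 8 out is listed");
// A 4-input layout (e.g. stereo plus stereo sidechain) would need its own entry.
static_assert(!isSupported(4, 2), "4 in, 2 out is not in this list");
```

The last assertion is the non-generic part: anything not spelled out as a pair is simply rejected.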

mystran · Post by **mystran** » Thu Feb 11, 2021 5:57 pm

			builder.args()
				//.arg("fno-short-wchar")
				.arg("-fms-extensions")
				.arg("-O2")
				//.arg("--stdlib=libc++")
				.arg("-D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS")
				.arg("-fexceptions")
				.arg("-fcxx-exceptions")
#ifdef CPL_MAC
				.arg("-fno-use-cxa-atexit")
#endif
				.argPair("-D_LIBCPP_DEBUG=", "0", cpl::Args::NoSpace)
				.argPair("-D__CPPAPE_PRECISION__=", std::to_string(getProject()->floatPrecision), cpl::Args::NoSpace)
				.argPair("-D__CPPAPE_NATIVE_VECTOR_BIT_WIDTH__=", std::to_string(getProject()->nativeVectorBitWidth), cpl::Args::NoSpace)
				.argPair("-D__STDC_VERSION__=", "199901L", cpl::Args::NoSpace)
				.argPair("-std=", "c++17", cpl::Args::NoSpace)
				.argPair("-include-pch", (dirRoot / "runtime" / "common.h.pch").string());

I would suggest making this at least somewhat configurable. In particular, optional -Ofast (which includes fast-math, so not always safe, but you want it like 99.9% of the time anyway) and more warnings/errors (eg. "-Wall -Werror -Wfloat-conversion") can save a lot of debugging time.
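
A rough sketch of what "somewhat configurable" could look like, assuming the optimization level and warning flags come from per-project settings. `CompileOptions` and `buildFlags` are hypothetical names that do not exist in APE, and the sketch uses a plain `std::vector<std::string>` rather than the `builder.args()` interface quoted above, since only the calls shown there are known.

```cpp
#include <string>
#include <vector>

// Hypothetical per-project compiler settings (not part of APE).
struct CompileOptions {
    bool fastMath = false;        // use -Ofast instead of -O2
    bool strictWarnings = false;  // add -Wall -Werror -Wfloat-conversion
};

// Builds only the optimization and warning part of a clang argument list.
std::vector<std::string> buildFlags(const CompileOptions& opts) {
    std::vector<std::string> flags;
    flags.push_back(opts.fastMath ? "-Ofast" : "-O2");
    if (opts.strictWarnings) {
        flags.push_back("-Wall");
        flags.push_back("-Werror");
        flags.push_back("-Wfloat-conversion");
    }
    return flags;
}
```

With default-constructed options this yields just `-O2`, matching the hard-coded behaviour in the snippet above, so existing scripts would compile as before unless the user opts in.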

DaveClark · Post by **DaveClark** » Thu Feb 11, 2021 6:16 pm

Mayae wrote: Wed Feb 10, 2021 10:51 pm Yes, I wanted to do a sample with variable-length partitions as well! IIRC a colleague warned me of a possible patent issue though, so I never got around to it again.

Hi Mayae,

This is the infamous Lake-Gardner patent which has run out. You, a bunch of other people, probably mystran, and I re-invented this idea, proving that it should never have been patented in the first place (being obvious to those practiced in the art). It was eventually invalidated in Europe, or so I heard. Anyway, you're free to use that approach now, speaking as a "non-attorney spokesperson."

Very nice work you've done there. It seems a bit too complicated for novice DSP students, and not quite what I would use myself as a developer, but I could see those music producers who are also experienced programmers using it quite a bit.

Regards,
Dave Clark

Mayae · Post by **Mayae** » Thu Feb 11, 2021 11:41 pm

Xenakios wrote: Thu Feb 11, 2021 3:49 pm If you mean the Juce legacy Plugin Channel Configurations string, using something like this has worked for my projects :
{2,2},{2,4}, {2,8}, {8,8} 
Of course that's not completely generic and someone will eventually want something crazy like {64,64} etc. Also I am not sure if this really works properly for stereo-only hosts and so on...It's messy. (Still, I suppose you would want to support at least 4 inputs, to allow for sidechaining?)

It seems that with a bit of work I can make it a setting in the configuration file (here's hoping it doesn't have to be baked into a weird AU resource file).

mystran wrote: Thu Feb 11, 2021 5:57 pm I would suggest making this at least somewhat configurable. In particular, optional -Ofast (which includes fast-math, so not always safe, but you want it like 99.9% of the time anyway) and more warnings/errors (eg. "-Wall -Werror -Wfloat-conversion") can save a lot of debugging time.

I actually built in provisions for this:
https://bitbucket.org/Mayae/ape/src/f00 ... g#lines-91

Seems like I currently just ignore it. This will be fixed in the next version.

DaveClark wrote: Thu Feb 11, 2021 6:16 pm Hi Mayae,

This is the infamous Lake-Gardner patent which has run out. You, a bunch of other people, probably mystran, and I re-invented this idea, proving that it should never have been patented in the first place (being obvious to those practiced in the art). It was eventually invalidated in Europe, or so I heard. Anyway, you're free to use that approach now, speaking as a "non-attorney spokesperson."

Great news, thanks for shedding some light on that!

DaveClark wrote: Thu Feb 11, 2021 6:16 pm Very nice work you've done there. It seems a bit too complicated for novice DSP students, and not quite what I would use myself as a developer, but I could see those music producers who are also experienced programmers using it quite a bit.

Regards,
Dave Clark

Thanks. Tough comment as in many ways the first two categories were originally what I wanted to cater to, but I appreciate the honesty!

Jeff McClintock · Post by **Jeff McClintock** » Fri Feb 12, 2021 4:27 am

congrats! super cool idea.

camsr · Post by **camsr** » Fri Feb 12, 2021 8:17 am

mystran wrote: Thu Feb 11, 2021 5:57 pm
			builder.args()
				//.arg("fno-short-wchar")
				.arg("-fms-extensions")
				.arg("-O2")
				//.arg("--stdlib=libc++")
				.arg("-D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS")
				.arg("-fexceptions")
				.arg("-fcxx-exceptions")
#ifdef CPL_MAC
				.arg("-fno-use-cxa-atexit")
#endif
				.argPair("-D_LIBCPP_DEBUG=", "0", cpl::Args::NoSpace)
				.argPair("-D__CPPAPE_PRECISION__=", std::to_string(getProject()->floatPrecision), cpl::Args::NoSpace)
				.argPair("-D__CPPAPE_NATIVE_VECTOR_BIT_WIDTH__=", std::to_string(getProject()->nativeVectorBitWidth), cpl::Args::NoSpace)
				.argPair("-D__STDC_VERSION__=", "199901L", cpl::Args::NoSpace)
				.argPair("-std=", "c++17", cpl::Args::NoSpace)
				.argPair("-include-pch", (dirRoot / "runtime" / "common.h.pch").string());
I would suggest making this at least somewhat configurable. In particular, optional -Ofast (which includes fast-math, so not always safe, but you want it like 99.9% of the time anyway) and more warnings/errors (eg. "-Wall -Werror -Wfloat-conversion") can save a lot of debugging time.

The compiler supports function-scope optimization attributes, so that may be useful. I would not compile an entire program using -Ofast just to be safe...

Great to see an update to this timesaver!

mystran · Post by **mystran** » Fri Feb 12, 2021 11:58 am

camsr wrote: Fri Feb 12, 2021 8:17 am The compiler supports function-scope optimization attributes, so that may be useful. I would not compile an entire program using -Ofast just to be safe...

I've been compiling basically everything with -ffast-math or equivalent since the 90s, it's even the default in ICC with optimized builds I think. During that time I've seen maybe 2-3 issues related to this and all of them were easy to fix (eg. use isnan() instead of x!=x to check for NaNs, stuff like that).
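
As an illustration of the NaN case: under finite-math assumptions a compiler is allowed to treat `x != x` as always false, and depending on compiler and version the same can happen to `isnan()`. One option that does not rely on floating-point comparisons at all is to inspect the bit pattern. The sketch below assumes a 32-bit IEEE 754 `float`; `isNanBits` and `quietNanFromBits` are made-up helper names.

```cpp
#include <cstdint>
#include <cstring>

// NaN test on the IEEE 754 binary32 bit pattern: exponent bits all ones
// and a non-zero mantissa. It uses integer operations only, so it does not
// depend on how the compiler treats floating-point comparisons.
inline bool isNanBits(float x) {
    std::uint32_t bits;
    static_assert(sizeof(bits) == sizeof(x), "float is assumed to be 32 bits");
    std::memcpy(&bits, &x, sizeof(bits));
    const std::uint32_t exponent = bits & 0x7F800000u;
    const std::uint32_t mantissa = bits & 0x007FFFFFu;
    return exponent == 0x7F800000u && mantissa != 0u;
}

// Builds a quiet NaN directly from its bit pattern, for testing the check
// without doing any floating-point arithmetic.
inline float quietNanFromBits() {
    const std::uint32_t bits = 0x7FC00000u;
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}
```

Usage is just `if (isNanBits(sample)) { /* reset filter state, etc. */ }`, which is the kind of small fix mentioned above.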

mystran · Post by **mystran** » Fri Feb 12, 2021 1:14 pm

DaveClark wrote: Thu Feb 11, 2021 6:16 pm Very nice work you've done there. It seems a bit too complicated for novice DSP students, and not quite what I would use myself as a developer, but I could see those music producers who are also experienced programmers using it quite a bit.

Right, this is a tricky thing.

My current framework can already do most of the things (except hot-reload, but I've been thinking about that) that APE does with roughly a similar amount of effort. Plotting signals is a bit more work, but we're talking about 2-3 lines per signal (+5 to add the scope to the editor). Text-editor I'd rather keep outside the DAW and I don't mind having to press ctrl-B to build. My collection of DSP utilities is pretty decent, it's all just an #include away and most importantly I already know how to use them (no manuals needed; this is really the big reason I'm wondering whether I should bother trying to use APE for simple prototyping and/or quick utilities). Not all of it might be as robust (it's really just what I need), but still.

On the other hand, getting to this point took years and I could certainly see APE being a much bigger productivity boost for someone who hasn't spent as much time on improving their workflow and while I haven't spent that much time looking at it, I'm not entirely sure if I agree with the "too complicated" part; the very first steps of DSP plugin development can be a little rough no matter what and I don't think APE looks particularly bad in that sense (rather the opposite).

The big elephant in the room, though, is stability: especially people who've recently learned programming are going to make a bunch of memory errors, they are going to trash the DAW's memory, they are going to trash the stack, they are going to return invalid data, you get the idea. So I feel like trying to handle exceptions in the DAW might not be quite enough, and I wonder whether a bridging approach (i.e. run the editor and all in the DAW process, but offload the actual plugin into a sandbox) might ultimately be a better choice.

Audio Programming Environment 0.5.0: C++ DSP directly in your DAW!