KVR Audio

stefano-orastron · Post by **stefano-orastron** » Mon May 30, 2022 9:01 am

Z1202 wrote: Mon May 30, 2022 8:59 am
stefano-orastron wrote: Mon May 30, 2022 8:52 am I tend to disagree. Accessing "past" and "future" samples can be done with regular delay lines, possibly implemented by circular buffers. On the optimality of the thing... well, that's another chapter, but in most cases I bet a simple approach wouldn't be horrible.
I'm personally not sure which of the approaches is better, and one could definitely want to do both. Also generating the entire BLEP transient at once is definitely simpler to think of (YMMV), even though it can be rewritten in purely causal terms. Thus, it may affect the usability of the language.

That is 100% true and needs to be taken into account.

Z1202 · Post by **Z1202** » Mon May 30, 2022 9:20 am

stefano-orastron wrote: Mon May 30, 2022 9:00 am Last time I checked you couldn't export C/C++ code or make your own VSTs, let alone having a decent CLI or dealing with all the legal issues that could be attached to such a thing. Of course it's different if you're NI. But I admit I am ignorant and may have overlooked something.

No, that's correct, and those are the product decisions I was referring to. Technically tho, it's pretty straightforward.

stefano-orastron wrote: Mon May 30, 2022 9:00 am Unfortunately many details will only be available once the first paper is out (in a few days now), but essentially what is "solved" is that you can express whatever algorithm you want and group parts of it into "blocks" in arbitrary ways, and that has no impact on how operations are scheduled. AFAIK, no current language can do that.

Hmmm, let's wait for your paper to see the details, but from this surface description, IIUC, so far it sounds pretty much like what ReaktorCore is doing as well.

mystran · Post by **mystran** » Mon May 30, 2022 12:26 pm

kippertoffee wrote: Mon May 30, 2022 8:03 am
mystran wrote: Sun May 29, 2022 7:36 am I've thought about this problem some and I feel like these kinds of things would probably be best expressed by moving to a higher level, treating signals-over-time as primitive types so that something like a gather-FIR application could be directly expressed as a primitive inner-product between maybe an abstract slice of the signal and similarly a scatter-FIR could be treated as a sum of two signals and such operations would then somehow be translated by the compiler into efficient SIMD operations on buffers rather than naive sample-by-sample dataflow graphs...
Can you elaborate on this scatter/gather terminology? It's not something I've come across before.

By "gather" I mean the traditional approach of taking an inner product between a shifted FIR kernel and the input signal to produce one output sample and "scatter" is then the transposed version of the same thing where you take the FIR kernel and scale it by one input sample and mix it suitably shifted into an output buffer. I'm not sure if these terms are in wide use, but I'm not aware of any better terminology either.

Assuming the input and output sampling rates are the same, these two are entirely equivalent and the "gather" approach is typically more efficient. If the sampling rates are different, then in the "gather" case the kernel frequency response is relative to the input rate and in the "scatter" case it's relative to the output rate. BLEP synthesis can then be seen as a special case of "scatter" filtering where we basically assume that the kernel is the identity operator except at discontinuities.

If we think of it in terms of image processing, then "gather" would be your ordinary blur where you run a single shader per pixel, gathering samples around the target pixel, and "scatter" would be a blur where you draw a separate scaled sprite for every source pixel with additive blending.
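To make the two formulations concrete, here is a minimal sketch in plain Python, assuming equal input and output rates and a full-length (untruncated) output. The function names `fir_gather` and `fir_scatter` are made up for this illustration and do not come from any library.

```python
def fir_gather(x, h):
    """Gather form: one inner product per output sample."""
    n_out = len(x) + len(h) - 1
    y = [0.0] * n_out
    for n in range(n_out):
        acc = 0.0
        for k, hk in enumerate(h):
            i = n - k                    # input sample seen by kernel tap k
            if 0 <= i < len(x):
                acc += hk * x[i]
        y[n] = acc
    return y


def fir_scatter(x, h):
    """Scatter form: each input sample mixes a scaled, shifted copy of the kernel."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk          # additive mix into the output buffer
    return y
```

Both functions compute the same convolution; only the loop nesting (and therefore the memory access pattern) differs. A BLEP oscillator in this picture would call the scatter inner loop only at discontinuities instead of for every input sample.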

mystran · Post by **mystran** » Mon May 30, 2022 12:45 pm

Z1202 wrote: Mon May 30, 2022 8:59 am
stefano-orastron wrote: Mon May 30, 2022 8:52 am I tend to disagree. Accessing "past" and "future" samples can be done with regular delay lines, possibly implemented by circular buffers. On the optimality of the thing... well, that's another chapter, but in most cases I bet a simple approach wouldn't be horrible.
I'm personally not sure which of the approaches is better, and one could definitely want to do both. Also generating the entire BLEP transient at once is definitely simpler to think of (YMMV), even though it can be rewritten in purely causal terms. Thus, it may affect the usability of the language.

I find that, generally speaking, I get the best average performance (with moderately long BLEPs on the order of 32 samples, perhaps a bit more at times) by using a buffer that's long enough to hold the current processing block plus one extra kernel length. This means that the inner loop that does the mixout of the BLEPs doesn't need any fancy logic: it just reads, interpolates and scales the relevant branches and mixes them at an offset (using SIMD). This approach has the downside that you need to shift and clear the buffer after every block, so it's not great for very short blocks (though in case of buffer splits you can amortise this by checking whether there is still enough space without shifting), but overall it's worked well for me. A ring buffer is another possibility, but then you'll need some extra logic to mix the BLEP in two parts when it crosses the wrap-around point.
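A rough sketch of that buffer layout, in plain Python rather than SIMD code, might look like the following. The class name `ScatterMixBuffer` and its methods are hypothetical, and the branch interpolation step is left out: the caller is assumed to pass in an already interpolated kernel.

```python
class ScatterMixBuffer:
    """Linear buffer holding one processing block plus one kernel length of tail."""

    def __init__(self, block_size, kernel_len):
        self.block_size = block_size
        self.kernel_len = kernel_len
        self.buf = [0.0] * (block_size + kernel_len)

    def mix(self, offset, kernel, scale):
        """Mix a whole scaled kernel starting at a sample offset in the current block."""
        assert 0 <= offset < self.block_size
        assert len(kernel) <= self.kernel_len
        # No wrap-around logic needed: the tail region absorbs any spill-over.
        for k, v in enumerate(kernel):
            self.buf[offset + k] += scale * v

    def finish_block(self):
        """Return the finished block, then shift the tail to the front and clear the rest."""
        out = self.buf[:self.block_size]
        tail = self.buf[self.block_size:]
        self.buf = tail + [0.0] * self.block_size
        return out
```

The shift-and-clear in `finish_block` is the per-block cost mentioned above; a ring buffer would avoid it, at the price of splitting `mix` into two parts at the wrap-around point.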

I see two problems with the approach that tries to avoid mixing the whole transients at once. The first one is that you'll potentially have to track a lot of BLEPs at the same time if the kernel is long, the frequency is high and the waveform requires several short segments, each of which might require anti-aliasing in several derivatives. If you mix the entire transient when you encounter it, the worst that can happen is that CPU usage increases. The other issue is that it's relatively easy to optimize the whole BLEP mix-out using SIMD, whereas it's much harder to do that if you mix them out gradually... so I've always felt that mixing them out completely as encountered is a win-win situation, with more maintainable code that performs better. YMMV.

sletz · Post by **sletz** » Mon May 30, 2022 6:15 pm

The SMC 2022 paper "Ciaramella: A Synchronous Data Flow Programming Language For Audio DSP" is here: https://zenodo.org/record/6573430#.YpUJlC8itpQ

mystran · Post by **mystran** » Tue May 31, 2022 9:57 am

stefano-orastron wrote: Mon May 30, 2022 9:00 am This means that you can express very clearly and concisely things like wave digital filters (even if I do agree with mystran that in practice they're not as good as they look on paper most of the time), or more in general you can abstract away parts of algorithms as freely as possible.

I want to clarify what I said earlier as it was a bit opinionated as I was a bit moody. From the academic point of view, I actually find WDFs quite interesting and elegant. I actually spent a fair amount of time with them, reading most of the original batch of papers (from the 1970s) sometime in the late 00s which I guess was slightly before or around the time there was a new surge of papers about using them for audio purposes.

Yet... from a purely engineering point of view, I find that it's generally just easier to use standard linear algebra with the circuit laws instead. While an MNA-style stamping process does make this a little more structured, in a sense it's still a whole lot less elegant than a composition construct like WDFs. Yet because there are few structural constraints, you can freely mix and match whatever you can express as linear algebra and it'll mostly just work.

Now, the last point is also important. From an academic point of view it's typically valuable to be able to say that "when you use these building blocks, it'll always just work", whereas from an engineering point of view it's often enough to be able to say "we're convinced this thing works in the special case we're trying to construct here." So even if you might not be able to prove that your LU pivoting is perfect (whether in terms of numerical stability or in terms of fill-in that affects performance) or that your Newton convergence is absolutely robust, as long as you can reach high confidence that your specific special case works fine, that's typically good enough.

In a sense it's kinda like the difference between a beautiful referentially transparent functional language vs. an entirely unsafe low-level imperative language. The former allows you to better reason about your code, but if you need to implement an efficient hash table (which in itself is somewhat of a "usually works well" data structure) then a low-level imperative language is probably going to work better for you.

Going back to the design of a declarative programming language for signal processing, this leads to a dilemma. On one hand, you'd prefer to have a language design that presents well-founded constructs that always lead to working code. On the other hand, you would also prefer to have a language that is complete enough not to unnecessarily restrict what problems can be solved. In fact I feel like this is a general conflict in PL design and I'm not entirely convinced there even exists a global optimum. If we think of general purpose languages, C++, Haskell, Lisp, Lua .. even Python, they all approach different local optima, but which one is closest to the global optimum usually depends on what you're trying to do exactly.

I didn't read that paper yet, I'll read it once I'm done with this post, but figured I'd just clarify that I do indeed fully appreciate the difficulty of PL design problems (even ignoring whether or not one can be practically implemented).

kippertoffee · Post by **kippertoffee** » Tue May 31, 2022 11:27 am

mystran wrote: Mon May 30, 2022 12:26 pm By "gather" I mean the traditional approach of taking an inner product between a shifted FIR kernel and the input signal to produce one output sample and "scatter" is then the transposed version of the same thing where you take the FIR kernel and scale it by one input sample and mix it suitably shifted into an output buffer. I'm not sure if these terms are in wide use, but I'm not aware of any better terminology either.

Thanks!

Archit3ch · Post by **Archit3ch** » Tue May 31, 2022 3:03 pm

I contacted you about this privately, but I'm reposting here for visibility. Are you aware of Swanky Amp: https://github.com/resonantdsp/swankyamp ? It seems to be using FAUST for its DSP and exporting to C++ for realtime. Does that application hit the limitations described in your blog post?

I agree with the others that community/tooling/documentation matter for programming language adoption.

mystran · Post by **mystran** » Tue May 31, 2022 3:39 pm

From reading the paper, it appears that the main innovation here is the idea that at a high level one need not worry about evaluation order and implementability, as long as there are sufficient delays at some lower level so that, once the whole thing is inlined and unrolled, we can find an implementable evaluation order?

I'm not familiar with FAUST as such, but according to the paper in FAUST you would need to have an implementable evaluation order at the higher level of blocks, whereas here we don't treat the building blocks as atomic units that are evaluated in order, but rather schedule the whole thing globally.
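As a toy illustration of that difference (not the algorithm from the paper, and not how any of the mentioned compilers work internally), consider two hypothetical blocks `A` and `B` that feed each other, where `B` contains a unit delay. The node names and the read/write split of the delay are assumptions made for this sketch.

```python
def schedule(graph):
    """Kahn's algorithm: return an evaluation order, or None if there is a cycle."""
    pending = {n: set(d) for n, d in graph.items()}
    order = []
    ready = sorted(n for n, d in pending.items() if not d)
    while ready:
        n = ready.pop(0)
        order.append(n)
        del pending[n]
        for m in sorted(pending):
            if n in pending[m]:
                pending[m].discard(n)
                if not pending[m]:
                    ready.append(m)
    return order if not pending else None


def block_of(node):
    """'A.add' belongs to block 'A'; a bare name such as 'x' is its own block."""
    return node.split(".")[0]


def block_level_graph(prims):
    """Collapse primitive dependencies into dependencies between whole blocks."""
    g = {}
    for node, deps in prims.items():
        b = g.setdefault(block_of(node), set())
        b.update(block_of(d) for d in deps if block_of(d) != block_of(node))
    return g


# node -> nodes it depends on within the same sample.
# Reading the delay state has no instantaneous dependency on writing it.
primitives = {
    "x": set(),                          # external input
    "B.delay.read": set(),               # fb = previous s
    "A.add": {"x", "B.delay.read"},      # s = x + fb
    "B.gain": {"A.add"},                 # y = g * s
    "B.delay.write": {"A.add"},          # store s for the next sample
}

block_order = schedule(block_level_graph(primitives))   # A and B form a cycle
global_order = schedule(primitives)                      # schedulable once flattened
```

With blocks treated as atomic units, `A` needs `B` and `B` needs `A`, so no order exists; scheduling the flattened primitives succeeds because the delay inside `B` breaks the instantaneous dependency.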

Based on some screenshots of ZDF filters in Reaktor Core (never actually tried the thing myself) I thought this was something that RC would do as well? Perhaps Vadim can comment on this?

Anyway I like the idea and I don't think there's an explosion problem with branches in this sort of scheme. The key is to realize that an expression breaks a delay-free loop if it can produce its output before it consumes its input, and a branch construct breaks a delay-free loop if and only if both of the branches break a delay-free loop. Hence you can recursively color every expression in the graph without having to enumerate all the possibilities.

ps. Actually upon further thought, if you work in SSA-like form I think you can color every expression in an arbitrary CFG simply by following the rule that we can schedule an expression early if it only depends on expressions that can be scheduled early, and a phi can be scheduled early if none of its source values needs to be scheduled late (ie. they can all be scheduled early or their status depends on the status of the phi). Then it's pretty much just a standard iterative SSA dataflow analysis. I didn't prove that this is so, but it .. seems plausible. Then rather than scheduling individual expressions, you'd split every basic block into early and late versions depending on how each expression needs to be scheduled.. or ... something like that anyway.
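One possible reading of that coloring rule, written as an optimistic fixpoint iteration, is sketched below. This is only an interpretation of the unproven idea above: the function name `color_early`, the choice of "late seeds" (values that are only available once the input arrives) and the example graph are all assumptions.

```python
def color_early(deps, late_seeds):
    """deps maps each SSA value to the values it reads (phis list all their sources).

    Every value starts out as 'early' (optimistic). A value becomes 'late' as soon
    as any of its sources is late. Iterating to a fixpoint means a phi whose
    sources depend only on early values, or on the phi itself, stays early.
    """
    late = set(late_seeds)
    changed = True
    while changed:
        changed = False
        for node, srcs in deps.items():
            if node not in late and any(s in late for s in srcs):
                late.add(node)
                changed = True
    return set(deps) - late


deps = {
    "in": [],                        # block input: only known late
    "state": [],                     # stored state: known at the start of the sample
    "a": ["state"],
    "a2": ["state"],
    "b": ["in"],
    "phi1": ["a", "a2"],             # both branches avoid the input
    "phi2": ["a", "b"],              # one branch needs the input
    "loop_phi": ["state", "inc"],    # loop-carried value
    "inc": ["loop_phi"],
}

early = color_early(deps, late_seeds={"in"})
```

Here `phi1` and the `loop_phi`/`inc` cycle stay early, while `phi2` is forced late because one of its branches reads the input. That matches the rule that a branch construct breaks a delay-free loop only if every branch does.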

Z1202 · Post by **Z1202** » Tue May 31, 2022 6:08 pm

mystran wrote: Tue May 31, 2022 3:39 pm Based on some screenshots of ZDF filters in Reaktor Core (never actually tried the thing myself) I thought this was something that RC would do as well? Perhaps Vadim can comment on this?

Yes, there is no fixed imperative-like ordering on any level in ReaktorCore (or in its precursor SynC Modular, for that matter). Unless you explicitly impose the order in some cases. The graph more or less just specifies the value dependencies. The actual evaluation order is defined globally by the compiler down to the lowest fundamental elements (more or less ignoring the block boundaries), and this is a fundamental feature of the language, potentially allowing very high reusability of building blocks in different contexts (of which, at the present time, in ReaktorCore we have only the update conditions, such as various signal rates).

I'm not sure if ZDF toolkit is a good example, tho. As I said, it's not an originally intended use case of the language and puts the language quite a bit under a stretch.

As for unit delays, originally they were intended to be a fundamental feature of the language, like in SynC Modular. However, playing around with some basic envelope prototypes I ran into some functionality and/or efficiency concerns (don't remember anymore, maybe I even missed smth then). FWIW, this resulted in introduction of lower-level fundamental features (and drifting somewhat more into the imperative language direction), of which unit delays are then built as compound blocks.

PS. Still have to read the Ciaramella paper too

stefano-orastron · Post by **stefano-orastron** » Tue May 31, 2022 6:50 pm

mystran wrote: Tue May 31, 2022 9:57 am
stefano-orastron wrote: Mon May 30, 2022 9:00 am This means that you can express very clearly and concisely things like wave digital filters (even if I do agree with mystran that in practice they're not as good as they look on paper most of the time), or more in general you can abstract away parts of algorithms as freely as possible.
Going back to the design of a declarative programming language for signal processing, this leads to a dilemma. On one hand, you'd prefer to have a language design that presents well-founded constructs that always lead to working code. On the other hand, you would also prefer to have a language that is complete enough not to unnecessarily restrict what problems can be solved. In fact I feel like this is a general conflict in PL design and I'm not entirely convinced there even exists a global optimum. If we think of general purpose languages, C++, Haskell, Lisp, Lua .. even Python, they all approach different local optima, but which one is closest to the global optimum usually depends on what you're trying to do exactly.

100% agreed. I couldn't have expressed this better. We know what our specific use case for such a language is: virtual analog stuff and time-domain algorithms. But not WDFs (I started doing VA using them and even expanded the theory a bit... actually I even suspect I'm to blame for "rebooting" scientific research in the last decade in the audio field... but I've not touched them since 2016 or so). It just happens that WDFs make a useful case for the language. Of course if other uses can be accommodated without bloating the language/compiler and our minds/schedules, why not?

stefano-orastron · Post by **stefano-orastron** » Tue May 31, 2022 6:54 pm

Archit3ch wrote: Tue May 31, 2022 3:03 pm I contacted you about this privately, but I'm reposting here for visibility. Are you aware of Swanky Amp: https://github.com/resonantdsp/swankyamp ? It seems to be using FAUST for its DSP and exporting to C++ for realtime. Does that application hit the limitations described in your blog post?

I agree with the others that community/tooling/documentation matter for programming language adoption.

I'll reply here then. I don't know if FAUST is good enough or not for a specific use. It depends on what the developer is willing to do and how much effort he's willing to spend. What I know is that right now Ciaramella can express certain algorithms in a more natural fashion and has the goal of getting some concrete real-world adoption, but it's new, experimental, etc. and lacks many features right now.

stefano-orastron · Post by **stefano-orastron** » Tue May 31, 2022 7:01 pm

mystran wrote: Tue May 31, 2022 3:39 pm From reading the paper, it appears that the main innovation here is the idea that at a high level one need not worry about evaluation order and implementability, as long as there are sufficient delays at some lower level so that, once the whole thing is inlined and unrolled, we can find an implementable evaluation order?

This. Exactly.

mystran wrote: Tue May 31, 2022 3:39 pm I'm not familiar with FAUST as such, but according to the paper in FAUST you would need to have an implementable evaluation order at the higher level of blocks, whereas here we don't treat the building blocks as atomic units that are evaluated in order, but rather schedule the whole thing globally.

AFAIK, the feedback operator in FAUST implies a unit delay, which means that if you use it at a higher level then it adds up to whatever is found at the lower levels.

mystran wrote: Tue May 31, 2022 3:39 pm Based on some screenshots of ZDF filters in Reaktor Core (never actually tried the thing myself) I thought this was something that RC would do as well? Perhaps Vadim can comment on this?

If it does work like that then I don't quite understand why the manual says that it adds unit delays in feedbacks too. However I also remember hearing the opposite. Oh well.

mystran wrote: Tue May 31, 2022 3:39 pm Anyway I like the idea and I don't think there's an explosion problem with branches in this sort of scheme. The key is to realize that an expression breaks a delay-free loop if it can produce its output before it consumes its input, and a branch construct breaks a delay-free loop if and only if both of the branches break a delay-free loop. Hence you can recursively color every expression in the graph without having to enumerate all the possibilities.

The problem is not with delay-free loops really but with different instantaneous dependencies based on conditional expressions. We found some cases where you can couple code in such a way that the scheduling must indeed depend on which branches are taken, with all cases being computable.

stefano-orastron · Post by **stefano-orastron** » Tue May 31, 2022 7:16 pm

Z1202 wrote: Tue May 31, 2022 6:08 pm
mystran wrote: Tue May 31, 2022 3:39 pm Based on some screenshots of ZDF filters in Reaktor Core (never actually tried the thing myself) I thought this was something that RC would do as well? Perhaps Vadim can comment on this?
Yes, there is no fixed imperative-like ordering on any level in ReaktorCore (or in its precursor SynC Modular, for that matter). Unless you explicitly impose the order in some cases. The graph more or less just specifies the value dependencies. The actual evaluation order is defined globally by the compiler down to the lowest fundamental elements (more or less ignoring the block boundaries), and this is a fundamental feature of the language, potentially allowing very high reusability of building blocks in different contexts (of which, at the present time, in ReaktorCore we have only the update conditions, such as various signal rates).

I'm not sure if ZDF toolkit is a good example, tho. As I said, it's not an originally intended use case of the language and puts the language quite a bit under a stretch.

As for unit delays, originally they were intended to be a fundamental feature of the language, like in SynC Modular. However, playing around with some basic envelope prototypes I ran into some functionality and/or efficiency concerns (don't remember anymore, maybe I even missed smth then). FWIW, this resulted in introduction of lower-level fundamental features (and drifting somewhat more into the imperative language direction), of which unit delays are then built as compound blocks.

PS. Still have to read the Ciaramella paper too

It seems like we might have overlooked Reaktor Core indeed - but the manual was a bit misleading. Now I'm seeing all the solid/non-solid stuff for feedbacks. If I understand correctly, if you set the relevant parts as non-solid, then it can actually deal with arbitrary block boundaries. And, yeah, given the complexity I suspect it may have quite some performance issues.

Z1202 · Post by **Z1202** » Tue May 31, 2022 7:17 pm

stefano-orastron wrote: Tue May 31, 2022 7:01 pm If it does work like that then I don't quite understand why the manual says that it adds unit delays in feedbacks too. However I also remember hearing the opposite. Oh well.

It can do both. If a feedback loop doesn't contain an explicit unit delay, an implicit one will be automatically inserted at a kind of random position and the feedback loop is then highlighted in orange as a warning. This auto feature is intended for high-level patching, where the builder doesn't really care about such advanced stuff.

Ciaramella DSP language goes public and open source