KVR Audio

Z1202 · Post by **Z1202** » Tue May 31, 2022 7:22 pm

stefano-orastron wrote: Tue May 31, 2022 7:16 pm It seems like we might have overlooked Reaktor Core indeed - but the manual was a bit misleading. Now I'm seeing all the solid/non-solid stuff for feedbacks. If I understand correctly, if you set the relevant parts as non-solid, then it can actually deal with arbitrary block boundaries. And, yeah, given the complexity I suspect it may have quite some performance issues.

Solid is not really 100% solid, but just in some aspects, including the feedback resolution. A solid block is however still not compiled as a closed entity and all its internal parts participate on their own in the global scheduling by the compiler. This was a learning from SynC Modular, where automatic feedback resolution could have been inserted at a random position, sometimes deeply inside the structure and one doesn't see it. In the first version of ReaktorCore we didn't highlight the entire loop either (just the point of the implicit unit delay), and I guess that was the primary motivation for the solid feature. However it also proved useful in other aspects. In the end, most of the blocks can't be used for feedback resolution, as there are dependencies from inputs to outputs. Marking such blocks as solid makes this explicit and makes the builder workflow more predictable. Only a few modules (like unit delays and such) actually benefit from feedback resolution, and respectively need to be marked as non-solid.
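A toy sketch of the feedback-resolution idea described above, in Python. This is not ReaktorCore's actual algorithm: the function `schedule`, the `delay_like` set and the module names are all hypothetical, and a whole block is modelled as a single graph node. The only assumption is that a loop may be cut at a module whose output does not depend instantaneously on its input (a unit delay), while every other module keeps its input-to-output dependency.

```python
from graphlib import TopologicalSorter, CycleError


def schedule(edges, delay_like):
    """Return one evaluation order for a single sample.

    edges: iterable of (src, dst) connections between modules.
    delay_like: modules whose output only depends on the previous
    sample of their input (hypothetical stand-in for non-solid
    unit-delay blocks). Feedback may only be resolved there.
    """
    deps = {}
    for src, dst in edges:
        deps.setdefault(src, set())
        deps.setdefault(dst, set())
        # An edge into a delay-like module imposes no ordering within
        # the current sample, so it is left out of the dependency set.
        if dst not in delay_like:
            deps[dst].add(src)
    # Raises CycleError if a loop has no delay-like module to cut at.
    return list(TopologicalSorter(deps).static_order())


edges = [("in", "add"), ("add", "filter"), ("filter", "z1"), ("z1", "add")]
order = schedule(edges, delay_like={"z1"})

try:
    schedule(edges, delay_like=set())
    unresolved = False
except CycleError:
    unresolved = True
```

With `z1` treated as delay-like the loop is schedulable; with no such module the same graph is rejected, which is roughly the situation that marking ordinary blocks as solid makes explicit. Writing the delay's state at the end of the sample is not modelled here.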

Edit: you could say that from the functional perspective solid behaves like 100% solid, so it's simpler to handle for the user. However under the hood it's still not really solid, to maintain the optimization possibilities.

Edit2: actually Edit1 is probably wrong, the solidity is not really 100%. Just in some aspects, as I originally wrote.

As for the performance issues, I challenge you to find some. Not that there are none, but the compiler is very advanced in "tearing the blocks apart" and recombining them back to the most performant linear code (or so is the goal at least). Most of the performance tradeoffs come from hand-written register optimization which obviously cannot compete with optimizers written by dedicated compiler teams (like C/C++) over large numbers of years. IIRC in my benchmarks it was typically between 1 and 2 times slower than a genuine C++ compiler, probably averaging somewhere around 1.5.

Z1202 · Post by **Z1202** » Tue May 31, 2022 7:32 pm

stefano-orastron wrote: Tue May 31, 2022 7:01 pm The problem is not with delay-free loops really but with different instantaneous dependencies based on conditional expressions. We found some cases where you can couple code in such a way that the scheduling must indeed depend on which branches are taken, with all cases being computable.

Oh, so you're doing dynamic scheduling based on which branches are taken? Hmmm, sounds like a can of worms for the compiler and a can of unpredictability for the language user. Please tell me I'm wrong and you solved it?

Edit: I briefly checked the paper. So your primary use case and motivation is WDFs? Difficult for me to judge, since I never had enough motivation to understand the WDF idea. I wonder if WDFs would be doable in ReaktorCore as well, similarly to the ZDF toolkit (where the latter has quite a few restrictions, since the language doesn't fully fit for that, but maybe those wouldn't apply to WDFs), but then again I'd have no idea

It's nice that your compiler is only 1000 lines, IIUC. Sounds not bad for what there is already.

mystran · Post by **mystran** » Tue May 31, 2022 9:13 pm

stefano-orastron wrote: Tue May 31, 2022 7:01 pm The problem is not with delay-free loops really but with different instantaneous dependencies based on conditional expressions. We found some cases where you can couple code in such a way that the scheduling must indeed depend on which branches are taken, with all cases being computable.

You could also spec the language such that it is a programming error for different branches to require different scheduling. In a sense, I'd treat it as part of the "type system", so to speak. This should then allow you to preserve branches as local branches in the generated code.
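A minimal sketch of how such a rule could be checked, in Python. Everything here is hypothetical (the names `static_order` and `BranchSchedulingError`, and the way dependencies are written down); it is not taken from Ciaramella or ReaktorCore. The assumption is that each branch outcome comes with its own set of instantaneous dependencies. A single static order that works for every branch has to respect all of those edges at once, so it exists exactly when their union is acyclic.

```python
from graphlib import TopologicalSorter, CycleError


class BranchSchedulingError(Exception):
    """Raised when branches cannot share one static evaluation order."""


def static_order(deps_per_branch):
    """deps_per_branch: {branch_label: {node: set_of_predecessors}}."""
    union = {}
    for deps in deps_per_branch.values():
        for node, preds in deps.items():
            union.setdefault(node, set()).update(preds)
            for pred in preds:
                union.setdefault(pred, set())
    try:
        return list(TopologicalSorter(union).static_order())
    except CycleError as exc:
        # exc.args[1] holds the detected cycle as a list of nodes.
        raise BranchSchedulingError(
            f"branches need incompatible orders: {exc.args[1]}"
        ) from exc


# Hypothetical program:  a = (b + 1) if c else 0
#                        b = 1       if c else (a + 1)
# Each branch is computable alone, but the required orders conflict.
conflicting = {
    "c_true": {"a": {"b"}, "b": set()},
    "c_false": {"a": set(), "b": {"a"}},
}
try:
    static_order(conflicting)
    rejected = False
except BranchSchedulingError:
    rejected = True

# Here both branches agree that b comes before a, so one order works.
compatible = {
    "c_true": {"a": {"b"}, "b": set()},
    "c_false": {"a": set(), "b": set()},
}
order = static_order(compatible)
```

The first program is the kind of case where scheduling would otherwise have to depend on which branch is taken; under this rule it is simply rejected at compile time.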

Z1202 · Post by **Z1202** » Tue May 31, 2022 9:23 pm

mystran wrote: Tue May 31, 2022 9:13 pm You could also spec the language such that it is a programming error for different branches to require different scheduling. In a sense, I'd treat it as part of the "type system", so to speak. This should then allow you to preserve branches as local branches in the generated code.

While this eliminates the problem, the question is whether some important use cases can be lost by that

No idea, as, as mentioned, I don't have a good picture of the use case spectrum (WDFs).

mystran · Post by **mystran** » Tue May 31, 2022 9:41 pm

Z1202 wrote: Tue May 31, 2022 7:22 pm Most of the performance tradeoffs come from hand-written register optimization which obviously cannot compete with optimizers written by dedicated compiler teams (like C/C++) over large numbers of years.

By "hand-written register optimization" do you mean some hand-written register schedule or just that you wrote your own register allocator? I don't think one necessarily needs a super-fancy register allocator to get most of the value out of the optimization, even LLVM used a relatively simple linear scan allocator until somewhat recently (though the new one is clever; anyone into that kind of stuff I highly suggest you check out the presentations).

Z1202 · Post by **Z1202** » Wed Jun 01, 2022 6:12 am

mystran wrote: Tue May 31, 2022 9:41 pm By "hand-written register optimization" do you mean some hand-written register schedule or just that you wrote your own register allocator? I don't think one necessarily needs a super-fancy register allocator to get most of the value out of the optimization, even LLVM used a relatively simple linear scan allocator until somewhat recently (though the new one is clever; anyone into that kind of stuff I highly suggest you check out the presentations).

I mean register allocator (not sure what is "register schedule"). Not sure what exactly LLVM does or did, but even that might be a little bit more advanced, dunno. Actually depending on the structure there might be another source of performance drop (not too large, but sometimes noticeable), I think that one could be addressed by a special extra compiler pass. It'd be nice to be really on par with C/C++ compilers, but my point was, ReaktorCore is more or less in the same ballpark already.

mystran · Post by **mystran** » Wed Jun 01, 2022 7:43 am

Z1202 wrote: Wed Jun 01, 2022 6:12 am
mystran wrote: Tue May 31, 2022 9:41 pm By "hand-written register optimization" do you mean some hand-written register schedule or just that you wrote your own register allocator? I don't think one necessarily needs a super-fancy register allocator to get most of the value out of the optimization, even LLVM used a relatively simple linear scan allocator until somewhat recently (though the new one is clever; anyone into that kind of stuff I highly suggest you check out the presentations).
I mean register allocator (not sure what is "register schedule").

Register schedule (which register holds which value at which point in "time") is what the register allocator outputs, no?
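To illustrate the terminology, here is a small textbook-style linear scan allocator in Python. It is only a sketch: it is not what LLVM or ReaktorCore does, the names `linear_scan` and `holders_at` are made up for this example, and live intervals are assumed to be given as inclusive `(start, end)` pairs. The allocator returns the assignment; the "schedule" in the sense above is then just the question of which value sits in which register at a given point.

```python
def linear_scan(intervals, num_regs):
    """intervals: {name: (start, end)}, inclusive live ranges.

    Returns {name: register_index}, with None for spilled values.
    """
    assignment = {}
    free = list(range(num_regs))
    active = []  # (end, name) for values currently held in registers
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Release registers of values that are dead before this start.
        for old_end, old in list(active):
            if old_end < start:
                active.remove((old_end, old))
                free.append(assignment[old])
        if free:
            assignment[name] = free.pop(0)
            active.append((end, name))
            continue
        # No register left: spill whichever live value ends last.
        spill_end, spill = max(active)
        if spill_end > end:
            assignment[name] = assignment[spill]
            assignment[spill] = None
            active.remove((spill_end, spill))
            active.append((end, name))
        else:
            assignment[name] = None
    return assignment


def holders_at(intervals, assignment, t):
    """The register schedule at time t: {register_index: value_name}."""
    return {
        assignment[name]: name
        for name, (start, end) in intervals.items()
        if start <= t <= end and assignment[name] is not None
    }


intervals = {"a": (0, 3), "b": (1, 2), "c": (2, 5), "d": (4, 6)}
regs = linear_scan(intervals, num_regs=2)
at_1 = holders_at(intervals, regs, 1)
at_5 = holders_at(intervals, regs, 5)
```

In this example `c` is spilled because both registers are busy when it starts and it outlives both occupants, while `d` reuses the register freed by `a`.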

Z1202 · Post by **Z1202** » Wed Jun 01, 2022 7:53 am

mystran wrote: Wed Jun 01, 2022 7:43 am
Z1202 wrote: Wed Jun 01, 2022 6:12 am
mystran wrote: Tue May 31, 2022 9:41 pm By "hand-written register optimization" do you mean some hand-written register schedule or just that you wrote your own register allocator? I don't think one necessarily needs a super-fancy register allocator to get most of the value out of the optimization, even LLVM used a relatively simple linear scan allocator until somewhat recently (though the new one is clever; anyone into that kind of stuff I highly suggest you check out the presentations).
I mean register allocator (not sure what is "register schedule").
Register schedule (which register holds which value at which point in "time") is what the register allocator outputs, no?

If so, I don't understand your original question. Hand-written allocator would result in a hand-written schedule, right? Or you mean explicitly scheduling registers in some fixed way? But the latter is kinda happening if one wants to stick to calling conventions already, so I'm again not exactly sure where the boundary for a schedule considered as fixed lies and what the question is about.

Z1202 · Post by **Z1202** » Wed Jun 01, 2022 8:02 am

Actually I feel we have effectively hijacked Stefano's thread for the discussion of ReaktorCore. Was not my intention, so Stefano, please accept my apologies. @mystran, I guess maybe we could continue the discussion in PM, or if others are interested in following, start another thread?

ghettosynth · Post by **ghettosynth** » Wed Jun 01, 2022 9:53 am

stefano-orastron · Post by **stefano-orastron** » Wed Jun 01, 2022 4:31 pm

Z1202 wrote: Tue May 31, 2022 7:22 pm As for the performance issues, I challenge you to find some. Not that there are none, but the compiler is very advanced in "tearing the blocks apart" and recombining them back to the most performant linear code (or so is the goal at least). Most of the performance tradeoffs come from hand-written register optimization which obviously cannot compete with optimizers written by dedicated compiler teams (like C/C++) over large numbers of years. IIRC in my benchmarks it was typically between 1 and 2 times slower than a genuine C++ compiler, probably averaging somewhere around 1.5.

1.5 is excellent.

stefano-orastron · Post by **stefano-orastron** » Wed Jun 01, 2022 4:36 pm

Z1202 wrote: Tue May 31, 2022 7:32 pm
stefano-orastron wrote: Tue May 31, 2022 7:01 pm The problem is not with delay-free loops really but with different instantaneous dependencies based on conditional expressions. We found some cases where you can couple code in such a way that the scheduling must indeed depend on which branches are taken, with all cases being computable.
Oh, so you're doing dynamic scheduling based on which branches are taken? Hmmm, sounds like a can of worms for the compiler and a can of unpredictability for the language user. Please tell me I'm wrong and you solved it?

Dynamic in the sense that branches actually get translated to "if"s in C. As said, I think we solved it by doing some graph analysis, so that we consider branches together only when they are inter-dependent, in which case you need to consider all potential code flow paths, vs. when they're independent, in which case the forks are "local".
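A rough Python sketch of that grouping idea. It is only an illustration of the description above, not the actual compiler code: the names `group_branches` and `paths_to_consider` are invented, conditions are assumed to be binary, and the `coupled` pairs (which conditionals are inter-dependent) are assumed to come from an earlier analysis that is not shown.

```python
from itertools import product


def group_branches(branches, coupled):
    """Union-find grouping of conditionals that are inter-dependent."""
    parent = {b: b for b in branches}

    def find(b):
        while parent[b] != b:
            parent[b] = parent[parent[b]]  # path halving
            b = parent[b]
        return b

    for a, b in coupled:
        parent[find(a)] = find(b)

    groups = {}
    for b in branches:
        groups.setdefault(find(b), []).append(b)
    return list(groups.values())


def paths_to_consider(branches, coupled):
    """For each group, every combination of outcomes within that group."""
    return [
        [dict(zip(group, outcome))
         for outcome in product((True, False), repeat=len(group))]
        for group in group_branches(branches, coupled)
    ]


branches = ["c1", "c2", "c3"]
coupled = [("c1", "c2")]  # c3 is independent, so its fork stays local
groups = group_branches(branches, coupled)
paths = paths_to_consider(branches, coupled)
```

Here `c1` and `c2` are analysed jointly (four combined paths), while `c3` only contributes its own two, instead of all eight combinations being considered together.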

Z1202 wrote: Tue May 31, 2022 7:32 pm Edit: I briefly checked the paper. So your primary use case and motivation is WDFs? Difficult for me to judge, since I never had enough motivation to understand the WDF idea. I wonder if WDFs would be doable in ReaktorCore as well, similarly to the ZDF toolkit (where the latter has quite a few restrictions, since the language doesn't fully fit for that, but maybe those wouldn't apply to WDFs), but then again I'd have no idea

No, but they make a compelling case for the declarative approach nevertheless. The motivation for that is keeping things as modular as possible.

Z1202 wrote: Tue May 31, 2022 7:32 pm It's nice that your compiler is only 1000 lines, IIUC. Sounds not bad for what there is already.

And imagine that Paolo (the main developer) is more verbose than me with coding.

stefano-orastron · Post by **stefano-orastron** » Wed Jun 01, 2022 4:37 pm

Z1202 wrote: Wed Jun 01, 2022 8:02 am Actually I feel we have effectively hijacked Stefano's thread for the discussion of ReaktorCore. Was not my intention, so Stefano, please accept my apologies. @mystran, I guess maybe we could continue the discussion in PM, or if others are interested in following, start another thread?

No need to apologize, I'm learning new stuff.

Ciaramella DSP language goes public and open source