How Do Developers Approach Modelling Hardware?

DSP, Plugin and Host development discussion.

Post

Kraku wrote: Mon Sep 21, 2020 8:53 am How large neural networks are you typically using to model parts of analog synths / signal paths?
They are very small, since the structures are recursive - about 2-4 cells per layer (for now). So the final production code is highly efficient, a bunch of multiply-adds followed by nonlinearities, all of which can be further optimized with SSE or AVX intrinsics.
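For a rough idea of what such a model boils down to at runtime, here is a minimal C++ sketch of a single small recurrent layer (an illustration only, not Synapse's actual code; the weights are placeholders that would come from training). Per sample it really is just a handful of multiply-adds followed by a tanh per cell, which is the part that maps well onto SSE/AVX:

```cpp
#include <array>
#include <cmath>

// Hypothetical 4-cell recurrent layer; weights come from training in practice.
struct TinyRecurrentLayer {
    std::array<float, 4> h{};                    // 4 cells of hidden state
    std::array<std::array<float, 4>, 4> Wrec{};  // cell-to-cell (recurrent) weights
    std::array<float, 4> Win{};                  // input weights
    std::array<float, 4> b{};                    // biases
    std::array<float, 4> Wout{};                 // output weights
    float bout = 0.0f;

    float process(float x)                       // one sample in, one sample out
    {
        std::array<float, 4> hNew;
        for (int i = 0; i < 4; ++i) {
            float acc = Win[i] * x + b[i];       // multiply-adds...
            for (int j = 0; j < 4; ++j)
                acc += Wrec[i][j] * h[j];
            hNew[i] = std::tanh(acc);            // ...followed by a nonlinearity
        }
        h = hNew;
        float y = bout;
        for (int i = 0; i < 4; ++i)
            y += Wout[i] * h[i];                 // linear output
        return y;
    }
};
```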

Richard
Synapse Audio Software - www.synapse-audio.com

Post

Urs wrote: Mon Sep 21, 2020 8:36 am
S950 wrote: Mon Sep 21, 2020 5:25 amWould love to spend a week in the studio just watching you guys work.
To someone listening in it sounds just like tinnitus.
My tinnitus doesn’t sound modeling-worthy to me, tbh... :(

Very interesting thread!

Post

you gotta have a subject, some existing synth/effect
otherwise you'll be chasing that generic mythical "analog(ue)" or "hardware" sound that nobody can scientifically describe.. ever

i was lucky to have picked on one of the (if not THE) "simplest" subtractive synths, lucky because i barely knew anything
so you start naively like anyone else, with "similar" synths and you try to approximate the subject.. which fails in many places in this case
next, you look into why it fails, you quickly see that this needs a dedicated "model"
you put something together in code or one of those modular environments

it's very important to have good input material, for analysis and for actual comparison
it's very important to know how the subject works
i fully agree that it is very important to decide which properties of the subject are "desired" and which aren't (this is difficult)
when gathering information - watch out, the internet is full of misinformation!

it's then pointless to (circuit-simulation-grade) accurately model a chunk of circuit that happens to act as a voltage regulator or reference that is just.. stable and does nothing

forget the idea that you're gonna take the schematic of the subject and "convert it" into code.. there's going to be a lot of components and sub-circuits doing absolutely boring things
it's great to have a "very accurate" model (perhaps in a circuit simulator) for analysis
you are interested mainly in the output sound signal and its ingredients; don't go as far as modeling the local power plant and the radiation from the cosmos, unless you're into that kind of thing

the essential things are tests, comparisons
tests which put your model against the subject (be it recorded audio material or a real unit)
tests which don't avoid exposing the weak spots in your model, but quite the contrary
this is the difficult part here, seeing your model (your child, in a sense) being "beaten"
it's also going to be more difficult for more complex synths
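for example, even a dead-simple automated check like the sketch below beats eyeballing waveforms by hand (this assumes the subject recording and the model render are already time-aligned and the same length; the names are just illustrative):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

struct CompareResult {
    double rmsError;    // overall "how far off" in linear amplitude
    double peakError;   // worst single-sample deviation
};

// subject: recorded audio from the real unit, model: your render of the same performance
CompareResult compareBuffers(const float* subject, const float* model, std::size_t n)
{
    double sumSq = 0.0, peak = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double d = static_cast<double>(subject[i]) - static_cast<double>(model[i]);
        sumSq += d * d;
        peak = std::max(peak, std::fabs(d));
    }
    return { std::sqrt(sumSq / static_cast<double>(n)), peak };
}
```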

i've had many long days of working on small things in the middle of the signal path, that's where you have some annoying constant sound, or some weird thumps and pops, and you're staring at the waveforms and trying out different things, over and over, for days
the biggest "progress boost" i had was when i obtained a real hardware unit of the subject (or close enough) .. that became THE subject then and testing and comparisons became much "easier"

tools and setup
if possible, make the "inputs" and conditions to the subject and the model "the same"
if you're playing a melody/performance - turn it into MIDI (or whatever makes sense) and feed that to both
comparisons should be easy, they should show the differences between the model and the subject, not hide them
one of the problems i had was that i wanted to look at certain things on a spectrograph during the analysis/tests
i had a good setup otherwise - the subject (real hardware unit) and the model were synchronized and all that, so i could watch the waveforms of both (panned hard left/right)
then i would record that and run an (offline) spectrograph from the sound editor, which shows each channel as a separate image
this is great but at a certain point, the things may look "very similar" while they aren't, so i'd wanna make a screenshot and overlap the two images in an image editor to see them "one on one" (which can expose some of the differences much better)
this was very slow, and there was no existing realtime spectrograph tool that could show left/right "one on one" .. so i built one
that was priceless - a tool that lets you see certain flaws of your model versus the subject, in realtime, with zero manual effort
find ways to make the process easier, make custom tools if that's gonna help, the "modeling" process is iterative so minimize the boring manual work in it, automate it
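as a rough idea of the kind of per-frame check such a tool can automate - this sketch assumes you already have FFT magnitudes of the synchronized subject/model channels for the same analysis frame, and the function names are made up:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// per-bin level difference in dB between subject and model (linear magnitudes in)
std::vector<float> spectralDiffDb(const std::vector<float>& subjectMag,
                                  const std::vector<float>& modelMag)
{
    const float eps = 1e-12f;                    // avoid log(0)
    std::vector<float> diff(subjectMag.size());
    for (std::size_t k = 0; k < subjectMag.size(); ++k) {
        const float a = 20.0f * std::log10(subjectMag[k] + eps);
        const float b = 20.0f * std::log10(modelMag[k] + eps);
        diff[k] = b - a;                         // positive = model has more energy here
    }
    return diff;
}

// print the bins where the model deviates more than tolDb from the subject
void reportOutliers(const std::vector<float>& diffDb, float binHz, float tolDb)
{
    for (std::size_t k = 0; k < diffDb.size(); ++k)
        if (std::fabs(diffDb[k]) > tolDb)
            std::printf("bin %zu (~%.0f Hz): %+.1f dB off\n", k, k * binHz, diffDb[k]);
}
```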

watch out when concentrating on certain properties of the model not to break previously "done" things elsewhere - re-run the old tests

which properties are important? if you ask 10 people you'll probably get 15 answers
that's where it's not a bad idea to include on/off switches (or other controls) to let the user adjust the amount of realism
i personally don't like noisy potentiometers (this is a horror) or long-term detuning of VCOs
IMO you should build your model so that it captures the subject in a normal or typical working condition, by default
noise and hum can be okay or not, so i'd put them behind an on/off switch or a slider
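for example, something as simple as the hypothetical sketch below - the hum/noise levels here are made-up placeholders, in reality you'd measure them from the unit:

```cpp
#include <cmath>
#include <random>

// optional "realism" controls: hum and noise blended in, both off by default
struct HumAndNoise {
    float humAmount = 0.0f;      // 0 = off (default) .. 1 = full modelled level
    float noiseAmount = 0.0f;
    float sampleRate = 48000.0f;
    float humHz = 50.0f;         // or 60, depending on where the unit "lives"
    float phase = 0.0f;
    std::minstd_rand rng{1};

    float process(float x)
    {
        const float twoPi = 6.28318530718f;
        phase += twoPi * humHz / sampleRate;
        if (phase > twoPi) phase -= twoPi;
        std::uniform_real_distribution<float> d(-1.0f, 1.0f);
        const float hum   = 0.001f  * std::sin(phase);   // placeholder level
        const float noise = 0.0005f * d(rng);            // placeholder level
        return x + humAmount * hum + noiseAmount * noise;
    }
};
```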

at the end, this comes down to artistic decisions

and ask questions, i wouldn't have gone anywhere without the huge help from a pile of very clever people from kvr, musicdsp, freenode, etc..
It doesn't matter how it sounds..
..as long as it has BASS and it's LOUD!

irc.libera.chat >>> #kvr

Post

Richard_Synapse wrote: Sat Sep 26, 2020 1:17 pm
Kraku wrote: Mon Sep 21, 2020 8:53 am How large neural networks are you typically using to model parts of analog synths / signal paths?
They are very small, since the structures are recursive - about 2-4 cells per layer (for now). So the final production code is highly efficient, a bunch of multiply-adds followed by nonlinearities, all of which can be further optimized with SSE or AVX intrinsics.

Richard
How many layers do you usually have?

Post

Shouldn't the better question be:

Why should people model hardware, rather than create novel models?

That is, if software allows "much of any kind of function", then why would one want "only hardware-implementable functions"?

Post

Maybe start a topic about the "better" question then :idea:

Post

Kraku wrote: Mon Sep 28, 2020 10:39 am How many layers do you usually have?
Depends on what is modeled. If the circuit to model has two filtering blocks, for instance, then two hidden layers is the most natural choice.

In general though it is easiest to simply start with one layer and keep adding as many as needed.
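For illustration only (not Synapse's actual code), "adding a layer" then literally just means appending one more weight matrix to a list; a tiny forward pass might look like the sketch below, with the final layer kept linear:

```cpp
#include <cmath>
#include <utility>
#include <vector>

// minimal MLP sketch: the number of hidden layers is just the number of entries
// in the vector; weights would come from training
struct Layer {
    std::vector<std::vector<float>> W;   // W[i][j]: weight from input j to unit i
    std::vector<float> b;                // biases, one per unit
};

std::vector<float> forward(const std::vector<Layer>& layers, std::vector<float> x)
{
    for (std::size_t l = 0; l < layers.size(); ++l) {
        const Layer& L = layers[l];
        std::vector<float> y = L.b;
        for (std::size_t i = 0; i < y.size(); ++i)
            for (std::size_t j = 0; j < x.size(); ++j)
                y[i] += L.W[i][j] * x[j];        // multiply-adds
        if (l + 1 < layers.size())               // keep the last layer linear
            for (float& v : y) v = std::tanh(v); // hidden nonlinearity
        x = std::move(y);
    }
    return x;
}
```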

Richard
Synapse Audio Software - www.synapse-audio.com

Post

I assume you first build a slow but accurate model of the circuit and then train the network using that circuit simulation?

Post

Richard_Synapse wrote: Thu Oct 01, 2020 8:35 am
Kraku wrote: Mon Sep 28, 2020 10:39 am How many layers do you usually have?
Depends on what is modeled. If the circuit to model has two filtering blocks, for instance, then two hidden layers is the most natural choice.

In general though it is easiest to simply start with one layer and keep adding as many as needed.
In theory, you don't necessarily ever need more than one non-linear layer (+ a linear output layer).

For example, if we take a state-space architecture such as the one in the DAFX19 paper from NI, then as long as the state-vector is wide enough (which typically means one state per capacitor/inductor in a circuit) the problem reduces to finding a memory-less non-linear function from inputs to outputs. You could take other architectures, but this one is particularly easy to reason about.
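For illustration, a rough C++ sketch of the general shape of such a model (this is not the paper's actual architecture; the sizes are arbitrary and the weights are placeholders that would come from training):

```cpp
#include <array>
#include <cmath>

// x[n+1], y[n] = f(x[n], u[n]), with f a memoryless learned map
// here: 2 states (think: two capacitors), 1 input, 1 output, 4 hidden units
struct LearnedStateSpace {
    std::array<float, 2> x{};            // state vector, one state per reactive element
    float Wh[4][3] = {};                 // hidden layer weights: (u, x0, x1) -> 4 units
    float bh[4]    = {};
    float Wo[3][4] = {};                 // output layer: 4 units -> (y, x0', x1')
    float bo[3]    = {};

    float process(float u)
    {
        const float in[3] = { u, x[0], x[1] };
        float h[4];
        for (int i = 0; i < 4; ++i) {
            float acc = bh[i];
            for (int j = 0; j < 3; ++j) acc += Wh[i][j] * in[j];
            h[i] = std::tanh(acc);       // the memoryless nonlinearity
        }
        float out[3];
        for (int i = 0; i < 3; ++i) {
            float acc = bo[i];
            for (int j = 0; j < 4; ++j) acc += Wo[i][j] * h[j];
            out[i] = acc;                // linear output layer
        }
        x[0] = out[1];                   // next state
        x[1] = out[2];
        return out[0];                   // output sample
    }
};
```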

As per the universal approximation theorem, if we take an affine map A from the inputs to n dimensions (ie. the input weights), any component-wise continuous non-polynomial "activation" function s(.) and another affine map B from n dimensions to the output (ie. the output weights of a linear output layer), then for any continuous F(x) and any e > 0 there exists some n such that |F(x)-B(s(A(x)))| < e for some choice of A and B and for all x in a given compact subset of the input space.

What this means is that strictly speaking you never need more than one non-linear layer and a more interesting question is whether you can reduce the total amount of computation by cascading multiple approximations with lower internal dimensions. In other words, the question is whether the function to be approximated can be decomposed into a set of functions such that approximating each one of them separately is easier than approximating the composite.

The dual of the "arbitrary width" case gives some lower bounds on the internal dimensions if we want to approximate to arbitrary precision using "arbitrary depth" instead, but given the lower bounds on how efficient a single layer can be (and still support universal approximation), we probably don't want to go there either. Rather this suggests that we can expect to reduce the necessary width up to a point by increasing the number of layers, which might or might not lead to some computational savings for any given target function.

Going back to the single-layer architecture though, there are some interesting thoughts to be had in terms of the activation functions. If we take ReLU as the activation, then it is easy to see that this allows us to construct precisely the set of piece-wise linear approximations, and the above theorem then basically states that we can approximate any continuous function to an arbitrary precision as a piece-wise linear function, which shouldn't be too surprising to anyone. If we take something like tanh() as the activation function, then we can construct approximations as a sum of scaled/shifted tanh() shapes (ie. "smoothed steps"), which also should make some intuitive sense.
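As a trivial concrete example of the ReLU case, a hard clipper is exactly a two-unit ReLU "layer" followed by a linear output (output weights 1 and -1, bias -1):

```cpp
// relu(x + 1) - relu(x - 1) - 1 == clamp(x, -1, 1), a piece-wise linear function
// built from two shifted ReLUs plus a linear output layer
inline float relu(float x)     { return x > 0.0f ? x : 0.0f; }
inline float hardClip(float x) { return relu(x + 1.0f) - relu(x - 1.0f) - 1.0f; }
```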

This in turn leads to another interesting theoretical question: since we can theoretically pick any arbitrary continuous non-polynomial function, what should we pick in order to get the fastest convergence in terms of the network width? While existing ANN research suggests some sort of smooth ReLU tends to perform well, much of it is concerned with classification rather than function approximation, much of it tends to focus on how fast you can train huge networks (which is probably mostly irrelevant for what we're discussing here) and much of it is arguably of very poor scientific quality (eg. I particularly love the practice of terminating training precisely where the data happens to support whatever conclusions are desired, even when the nearby datapoints make it completely obvious that the "conclusions" are really just statistically insignificant noise).

Post

Hmmm, I have no knowledge about neuronal networks, but for me it sounds like this approach takes the fun out of it. My curiosity about how things work is a driving factor in the development process, and a black box cannot satisfy that. My naive intuition furthermore tells me that a sufficiently accurate model based on a neuronal network must require code to "run the neurons", hence inherently draws more CPU than a hand-coded model based on "just maths". But as I said, I know nothing about this, I could be wrong.

Post

Actually a model generated with neural networks can be more CPU efficient than the "just math" approach, I've been told.

Post

EvilDragon wrote: Thu Oct 01, 2020 12:42 pm Actually a model generated with neural networks can be more CPU efficient than the "just math" approach, I've been told.
I'm not quite sure if we're talking about a model represented by a trained network, which runs the network in realtime, or a piece of maths that was tuned by a neuronal network, which does not require running the network in realtime.

I guess I'm too old a fart to accept the idea that my brain fun could be outperformed by a data munching machine.

Post

Hey, brain fun is still very much important! So don't be discouraged. :)

Post

Urs wrote: Thu Oct 01, 2020 12:38 pm Hmmm, I have no knowledge about neuronal networks, but for me it sounds like this approach takes the fun out of it. My curiosity about how things work is a driving factor in the development process, and a black box cannot satisfy that. My naive intuition furthermore tells me that a sufficiently accurate model based on a neuronal network must require code to "run the neurons", hence inherently draws more CPU than a hand-coded model based on "just maths". But as I said, I know nothing about this, I could be wrong.
Well, I'm sort of the same in terms of not really being a fan of blackbox models, but in some sense this isn't too different from something like the design of min-max filters: you search for an approximation that gives you the lowest error, rather than trying to construct such an approximation from first principles. For a frozen model (ie. no more training), the code for each layer is essentially a matrix multiplication, followed by component-wise application of some chosen non-linearity. Depending on the size and number of layers required, this is not necessarily more expensive than a white-box model of the same thing.

Think about it this way: if you have a linear "ZDF" system Ax=b where x is the new states, then we know we can causally update the state-space by inverting A (or if dim(b)>dim(x) we could do a least-squares fit using pseudo-inverses, etc). Let x=Bb be a single-layer linear ANN from inputs b to outputs x. Then "training" is a matter of searching for B such that it well approximates the inverse of A.
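A toy illustration of that linear case (purely to make the analogy concrete - with a known 2x2 A you would obviously just invert it directly): sample states x, form b = A x, and let a simple LMS-style update search for B, which converges towards the inverse of A:

```cpp
#include <cstdio>
#include <random>

int main()
{
    // fixed, well-conditioned 2x2 system matrix A (values are arbitrary)
    const float A[2][2] = { { 2.0f, 1.0f }, { 0.5f, 3.0f } };
    float B[2][2] = { { 0.0f, 0.0f }, { 0.0f, 0.0f } };  // the "linear layer" we train

    std::mt19937 rng(42);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    const float lr = 0.02f;

    for (int iter = 0; iter < 20000; ++iter) {
        // training sample: pick a state x, compute the known right-hand side b = A x
        const float x[2] = { dist(rng), dist(rng) };
        const float b[2] = { A[0][0] * x[0] + A[0][1] * x[1],
                             A[1][0] * x[0] + A[1][1] * x[1] };
        // LMS-style update: B += lr * (x - B b) b^T
        for (int i = 0; i < 2; ++i) {
            const float pred = B[i][0] * b[0] + B[i][1] * b[1];
            const float err  = x[i] - pred;
            B[i][0] += lr * err * b[0];
            B[i][1] += lr * err * b[1];
        }
    }
    // the two printed matrices should end up close to each other (det(A) = 5.5)
    std::printf("learned B : [% .4f % .4f ; % .4f % .4f]\n", B[0][0], B[0][1], B[1][0], B[1][1]);
    std::printf("exact A^-1: [% .4f % .4f ; % .4f % .4f]\n", 3.0 / 5.5, -1.0 / 5.5, -0.5 / 5.5, 2.0 / 5.5);
    return 0;
}
```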

For a linear system this is pointless (since we already know an efficient way to compute the answer), but for a non-linear system you would typically need to perform some sort of iteration to find an accurate solution, because we often don't really know what the inverse system looks like. However, what we can do is perform a search (ie. train a network) to try and find an approximation of the inverse that is hopefully more efficient than solving the system directly using iteration (or whatever other costly methods might apply).

So the way I see it, it's not necessarily as clear-cut as white vs. black box modelling. You could do a white box model, then train a network to approximate the solution more efficiently than an actual solver. You could still use such a solver to produce the training data (which is great, because now you have an unlimited supply), but if you can find a sufficiently accurate approximation that is faster, then you are no longer constrained by the real-time performance of the solver itself. It is certainly possible that no such approximation exists in some particular case or that we cannot feasibly find it, but it is also possible that in some cases such an approximation does exist and we can find it through optimization (ie. training), even though we might not readily see in advance what it actually looks like.

So the way I see it, this stuff isn't fundamentally that different from using a min-max polynomial or rational functions to approximate transcendentals. We could use analytic series for these, but they converge poorly, so we run an optimization procedure instead and obtain something that's faster to compute, but still good enough in terms of accuracy.
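For instance, a cheap drop-in for tanh() might look like the sketch below; the coefficients here are just the simple Padé-derived ones rather than an actual minimax fit (which would tweak them to spread the error evenly over the intended range):

```cpp
#include <cmath>

// tanh(x) ~= x * (15 + x^2) / (15 + 6 * x^2); reasonably close up to |x| ~ 2.3,
// beyond that the rational form overshoots, so clamp the result to [-1, 1]
inline float tanhApprox(float x)
{
    const float x2 = x * x;
    const float r  = x * (15.0f + x2) / (15.0f + 6.0f * x2);
    return std::fmax(-1.0f, std::fmin(1.0f, r));
}
```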

Post

Urs wrote: Thu Oct 01, 2020 12:38 pm Hmmm, I have no knowledge about neuronal networks, but for me it sounds like this approach takes the fun out of it. My curiosity about how things work is a driving factor in the development process, and a black box cannot satisfy that.
I like to know that too. At the same time, I like magic, and a working network gives you just that :D

Also those who take a more generic approach to circuit modeling and "just" build a netlist and feed that into a solver are essentially doing something similar anyway, letting the computer figure it all out.

Richard
Synapse Audio Software - www.synapse-audio.com
