Machine learning for creating complex presets
- KVRist
- Topic Starter
- 66 posts since 18 Jan, 2017
I'm not a programmer, I'm just musing for the sake of my own curiosity.
So, say you have an incredibly complex synthesis engine, like a 64 op FM synth or all the fundamental building blocks for physical modelling. The programming is done indirectly, by feeding a machine learning algorithm audio samples of an instrument, which it then tries to emulate using synthesis. Same in principle as training it to recognise a human face.
The goal is to achieve highly expressive MPE presets where the user might also have access to some parameters for tweaking after the fact. Personally I'm not interested in copying acoustic instruments, but rather finding new sounds that have the qualities and expressiveness of acoustic and electro-acoustic instruments.
Is this viable in the (hopefully not too distant) future?
- KVRAF
- 8826 posts since 6 Jan, 2017 from Outer Space
There is a contradiction in your vision. If the machine learning listens to existing sounds, it would either copy acoustic sounds, or it is not needed if the sound is already electronic, in which case you already know how it's made...
Richard_Synapse
- KVRian
- 1136 posts since 20 Dec, 2010
It should work for simple synths already with current machine learning algorithms; for more complex synths, actual intelligence is probably needed to program such presets.
Also you'll need a way for the computer to decide whether a given preset sounds like e.g. a trumpet or not. This is probably the easier task to solve, but not exactly trivial, either.
Richard
Synapse Audio Software - www.synapse-audio.com
- KVRist
- Topic Starter
- 66 posts since 18 Jan, 2017
Tj Shredder wrote: ↑Wed Jul 10, 2019 4:18 pm There is a contradiction in your vision. If the machine learning listens to existing sounds, it would either copy acoustic sounds, or it is not needed if the sound is already electronic, in which case you already know how it's made...
I'm thinking about synths with hundreds (if not thousands) of parameters, where you would not be able to program them manually in any reasonable amount of time.
- KVRist
- Topic Starter
- 66 posts since 18 Jan, 2017
Richard_Synapse wrote: ↑Wed Jul 10, 2019 4:44 pm It should work for simple synths already with current machine learning algorithms; for more complex synths, actual intelligence is probably needed to program such presets.
Also you'll need a way for the computer to decide whether a given preset sounds like e.g. a trumpet or not. This is probably the easier task to solve, but not exactly trivial, either.
Richard
I am simply drawing parallels to the dream images we have all seen, where an algorithm is trained to recognise a dog and then fed back into itself to create all those dream-like images. These were created from noise.
Granted, the leap is quite a big one and I agree that it may not be trivial, but you could feed the output of the synthesis engine back into an algorithm trained to recognise a trumpet sound in a similar way... perhaps? It would try to recreate the same sound by trial and error. It might end up in a completely different place but the results could still be interesting.
I realise how this sounds - like another non-programmer with a bright idea, but I thought I'd just air it at least. As I said, I'm just curious.
- KVRAF
- 8826 posts since 6 Jan, 2017 from Outer Space
Muied Lumens wrote: ↑Wed Jul 10, 2019 7:06 pm I'm thinking about synths with hundreds (if not thousands) of parameters, where you would not be able to program them manually in any reasonable amount of time.
That is called additive synthesis...
The problem is not the number of parameters, but how to expressively control them with a handful of macros. For that you need a boiled-down description, which is possible for known acoustic instruments.
Human creative decisions are always needed and wanted for those new instruments. A.I. won't invent them for you, and inventing is the fun I want to have; I would never give it away for convenience...
- KVRAF
- 40228 posts since 11 Aug, 2008 from clown world
Someone posted this recently in another thread.
You don't need to look for "incredibly complex" solutions.
http://www.youtube.com/watch?v=2hw4yc4xUAs
Anyone who can make you believe absurdities can make you commit atrocities.
- Banned
- 697 posts since 29 Oct, 2016
Yes, this is possible.
1. Create an effect plugin which can process incoming audio and send MIDI notes and CC data.
2. The effect can load a sample, which is then converted into two high-resolution spectral images through the Fourier transform (frequency is y, time is x): one image for phase, one for amplitude.
3. The effect will send one MIDI note and record the incoming audio. This audio is also converted into phase and amplitude spectral images via the Fourier transform.
4. Before the note is sent, the effect sends many MIDI CC parameters (along with pitch bend) to adjust all available parameters of the connected synths and effects.
5. Use a standard nested hill-climbing search algorithm, or a genetic algorithm, to send one note at a time along with CC values, and record the result.
6. During the search, after each note is sent with its CC values and the new audio is converted into an image, directly compare the loaded reference sample's spectral images against the new audio's spectral images: subtract each 'pixel'/bin value, take the absolute value, and add it all up (summing the difference). The closer the result is to zero, the closer the match; zero is a perfect score, indicating no difference between the current input and the reference sample.
7. To keep the search gradual and guide it, apply a Gaussian blur to all spectral images before comparison, with the blur radius decreasing as the score improves. This will speed up the search dramatically.
8. Repeatedly search and score, keeping a top-10 list both by score and by diversity (the parametric distance of the top-scoring results from each other).
Also, implement an internal delay parameter to adjust when the incoming audio is processed, so that lag time (earliness/lateness) becomes a searchable parameter.
Record the same length of time as the reference sample so the comparison images have the same dimensions.
This can also be used to make new synths, by searching code variations instead of CC parameter variations. You would need to provide many samples, though, which is achievable via regression, similar to Gepsoft GeneXproTools.
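If it helps to make those steps concrete, here is a toy sketch of steps 5-7 in Python. Everything in it is an illustrative assumption: toy_synth stands in for "set the CCs, send a note, record the audio", and a plain hill climber with a shrinking blur stands in for the nested search over a real plugin's parameters.

```python
import numpy as np

SR = 8000

def toy_synth(params, dur=0.5, sr=SR):
    """Stand-in for 'set CCs, send a note, record audio': two partials
    whose levels are the parameters being searched."""
    t = np.arange(int(sr * dur)) / sr
    return (params[0] * np.sin(2 * np.pi * 220 * t)
            + params[1] * np.sin(2 * np.pi * 440 * t))

def spectral_image(audio, n=256):
    """Amplitude spectrogram (frequency on y, time on x), as in step 2.
    A phase image would use np.angle instead of np.abs."""
    frames = audio[: len(audio) // n * n].reshape(-1, n) * np.hanning(n)
    return np.abs(np.fft.rfft(frames, axis=1)).T

def gaussian_blur(img, sigma):
    """Separable Gaussian blur for step 7 (numpy-only)."""
    if sigma <= 0:
        return img
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    img = np.apply_along_axis(np.convolve, 0, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 1, img, k, mode="same")

def score(candidate, reference, blur=0.0):
    """Step 6: summed absolute per-bin difference; zero is a perfect match."""
    a = gaussian_blur(spectral_image(candidate), blur)
    b = gaussian_blur(spectral_image(reference), blur)
    return np.abs(a - b).sum()

def hill_climb(reference, start, steps=200, seed=0):
    """Step 5: accept any mutation that improves the blurred score;
    the blur radius shrinks as the search proceeds (step 7)."""
    rng = np.random.default_rng(seed)
    best = np.array(start, dtype=float)
    for i in range(steps):
        blur = 2.0 * (1 - i / steps)
        cand = np.clip(best + rng.normal(0, 0.05, best.size), 0.0, 1.0)
        if score(toy_synth(cand), reference, blur) < score(toy_synth(best), reference, blur):
            best = cand
    return best

target = toy_synth([0.8, 0.3])            # the loaded reference sample
found = hill_climb(target, start=[0.5, 0.5])
```

A real version would replace toy_synth with MIDI round-trips to the hosted synth, and would keep the top-10 list from step 8 instead of a single best candidate.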
SLH - Yes, I am a woman, deal with it.
- KVRist
- Topic Starter
- 66 posts since 18 Jan, 2017
- KVRian
- 1000 posts since 1 Dec, 2004
- KVRist
- Topic Starter
- 66 posts since 18 Jan, 2017
MadBrain wrote: ↑Thu Jul 11, 2019 8:09 pm That's more or less the idea of the Hartmann Neuron
https://www.youtube.com/watch?v=8-GrZ6NSM0E
Right! I had forgotten about that one. Thanks!
- KVRAF
- 7540 posts since 7 Aug, 2003 from San Francisco Bay Area
Richard_Synapse wrote: ↑Wed Jul 10, 2019 4:44 pm It should work for simple synths already with current machine learning algorithms; for more complex synths, actual intelligence is probably needed to program such presets.
Also you'll need a way for the computer to decide whether a given preset sounds like e.g. a trumpet or not. This is probably the easier task to solve, but not exactly trivial, either.
Richard
Well, it isn't really all that difficult to decide whether a given sound sounds like a trumpet using existing Generative Adversarial Network methods. I doubt it would come up with meaningful parameters to tweak the resulting sounds, however; we're mostly talking convolution kernels here. Anyway, I think additive resynthesis is the way to go here.
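The additive-resynthesis route can be sketched compactly: estimate each harmonic's amplitude and phase from one FFT of a reference tone, then rebuild it as a bank of sine partials. A numpy-only sketch over a made-up three-partial source; the function name and all constants are illustrative assumptions, not anything from a real product.

```python
import numpy as np

SR = 16000

def additive_resynth(audio, f0, n_partials=16, sr=SR):
    """Read amplitude and phase of each harmonic of f0 from the spectrum,
    then rebuild the tone as a sum of cosine partials."""
    window = np.hanning(len(audio))
    spectrum = np.fft.rfft(audio * window)
    freqs = np.fft.rfftfreq(len(audio), 1.0 / sr)
    t = np.arange(len(audio)) / sr
    out = np.zeros(len(audio))
    for k in range(1, n_partials + 1):
        b = np.argmin(np.abs(freqs - k * f0))   # nearest bin to harmonic k
        # peak magnitude of a Hann-windowed sinusoid is amp * sum(window) / 2
        amp = 2.0 * np.abs(spectrum[b]) / window.sum()
        out += amp * np.cos(2 * np.pi * freqs[b] * t + np.angle(spectrum[b]))
    return out

# A made-up three-partial source tone (one second at 220 Hz)
t = np.arange(SR) / SR
source = (np.sin(2 * np.pi * 220 * t)
          + 0.5 * np.sin(2 * np.pi * 440 * t)
          + 0.25 * np.sin(2 * np.pi * 660 * t))
rebuilt = additive_resynth(source, f0=220)
```

A single static FFT only captures a steady tone; a resynthesizer for real instruments would repeat this per analysis frame so the partial amplitudes can evolve over time.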
Incomplete list of my gear: 1/8" audio input jack.
- KVRist
- 61 posts since 19 Nov, 2012 from Stockholm, Sweden
Interesting question, and something I have occasionally been thinking about too. Not with the goal of emulating an existing sound, but more as a way of simplifying the interface of a complex synthesizer into a smaller number of more meaningful and expressive parameters.
Vertion's suggestion is a good start, though I have a feeling that just looking at the Fourier transform (spectrum and phase) is not going to be the best way of classifying audio. Audio can have quite different spectral images and still sound similar, and vice versa. My guess is that you would need to do the classification with features that more closely match how humans hear and identify sounds.
Also, if you take, say, 5-second audio clips in every iteration, the static Fourier transform of each clip will not tell you anything about how the timbre evolves during those 5 seconds (filter sweeps and harmonically rich attacks, for instance). So you need to use features that take that into account as well.
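One concrete option for such perceptually motivated, time-aware features is a log-compressed mel spectrogram: per-frame energies in bands spaced like human pitch perception, so filter sweeps and attacks show up as movement across frames. A numpy-only sketch; the frame and band counts are arbitrary choices, not anything from this thread.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(audio, sr=16000, n_fft=512, hop=256, n_mels=20):
    # frame the signal so each column of the output is one moment in time
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop: i * hop + n_fft] for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    # triangular mel filterbank: bands evenly spaced on the mel (pitch-like) scale
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        lo, mid, hi = bins[m], bins[m + 1], bins[m + 2]
        fbank[m, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[m, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    return np.log(power @ fbank.T + 1e-10).T     # shape: (n_mels, n_frames)

sr = 16000
t = np.arange(sr) / sr
sweep = np.sin(2 * np.pi * (200 + 400 * t) * t)  # rising pitch: timbre changes over time
M = mel_spectrogram(sweep, sr=sr)
```

Because the output keeps a time axis, a rising sweep shows its energy peak climbing through the mel bands from early frames to late ones, exactly the kind of evolution a single static FFT would average away.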
- KVRist
- Topic Starter
- 66 posts since 18 Jan, 2017
An advanced synth with simple controls sounds great to me, although I can understand those who want full control over their programming too. For those who don't mind, a rating system combined with an evolution/randomiser might yield interesting results. So, you choose which sounds you like and it creates iterations of that sound, which you then choose between again, and so on, until you are satisfied. An easy way to create presets which are unique to you.
Obviously the problem with "expressiveness" remains, but you could apply the same process to the performance aspect as to the timbre. You could even split the two, so you have performance macros made for string-like instruments, blown, bowed, etc. It wouldn't take long before you have an MPE instrument which is personalised to your unique playing style. At least in principle.
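The choose-and-iterate loop described above is essentially interactive evolution, and its skeleton is tiny. In this sketch the rate callback stands in for the user clicking a favourite (replaced here by an automatic stand-in so it runs unattended); every name and the mutation size are illustrative assumptions.

```python
import numpy as np

def evolve_presets(rate, n_params=8, pop_size=6, generations=10, seed=0):
    """rate(presets) -> index of the preset the user likes best."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(0, 1, (pop_size, n_params))   # random starting presets
    for _ in range(generations):
        chosen = pop[rate(pop)]                     # the user's favourite
        # next generation: mutated variations of the favourite...
        pop = np.clip(chosen + rng.normal(0, 0.1, (pop_size, n_params)), 0, 1)
        pop[0] = chosen                             # ...plus the favourite itself
    return pop[0]

# Stand-in 'user' that prefers presets close to some imagined ideal sound:
ideal = np.full(8, 0.7)
auto_rate = lambda presets: int(np.argmin(((presets - ideal) ** 2).sum(axis=1)))
best = evolve_presets(auto_rate, seed=1)
```

Keeping the favourite in the next generation (elitism) means the sound the user likes can never get worse between rounds; the same loop could rate performance-macro mappings instead of timbres.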
- Banned
- 12368 posts since 30 Apr, 2002 from i might peeramid
machine learning works by data training. patches are definitely available for some VSTs in sufficient quantities.
one of the interesting things is that the solution becomes parametric, revealing patterning that may not be obvious otherwise.
not my thing tho, sorting thru shit wears me down personally. i prefer proceduralism as an end rather than a means, it can reveal so much more about existence than human expression.
you come and go, you come and go. amitabha neither a follower nor a leader be tagore "where roads are made i lose my way" where there is certainty, consideration is absent.