Machine learning for creating complex presets

DSP, Plugin and Host development discussion.

Post

I'm not a programmer, I'm just musing for the sake of my own curiosity.

So, say you have an incredibly complex synthesis engine, like a 64-op FM synth or all the fundamental building blocks for physical modelling. The programming is done indirectly, by feeding a machine learning algorithm audio samples of an instrument, which it then tries to emulate using synthesis. The same principle as training it to recognise a human face.

The goal is to achieve highly expressive MPE presets, where the user might also have access to some parameters for tweaking after the fact. Personally, I'm not interested in copying acoustic instruments, but rather in finding new sounds that have the qualities and expressiveness of acoustic and electro-acoustic instruments.

Is this viable in the (hopefully not too distant) future?

Post

There is a contradiction in your vision. If the machine learning listens to existing sounds, it will either copy acoustic sounds, or it is not needed because the sound is already electronic and you therefore already know how it's made...

Post

It should work for simple synths already with current machine learning algorithms; for more complex synths, actual intelligence is probably needed to program such presets.

Also you'll need a way for the computer to decide whether a given preset sounds like e.g. a trumpet or not. This is probably the easier task to solve, but not exactly trivial, either.

Richard
Synapse Audio Software - www.synapse-audio.com

Post

Tj Shredder wrote: Wed Jul 10, 2019 4:18 pm There is a contradiction in your vision. If the machine learning listens to existing sounds, it will either copy acoustic sounds, or it is not needed because the sound is already electronic and you therefore already know how it's made...
I'm thinking about synths with hundreds (if not thousands) of parameters, where you would not be able to program it manually in any reasonable amount of time.

Post

Richard_Synapse wrote: Wed Jul 10, 2019 4:44 pm It should work for simple synths already with current machine learning algorithms; for more complex synths, actual intelligence is probably needed to program such presets.

Also you'll need a way for the computer to decide whether a given preset sounds like e.g. a trumpet or not. This is probably the easier task to solve, but not exactly trivial, either.

Richard
I am simply drawing parallels to the dream images we have all seen, where an algorithm is trained to recognise a dog and then fed back into itself to create all those dream-like images. These were created from noise.

Granted, the leap is quite a big one and I agree that it may not be trivial, but you could feed the output of the synthesis engine back into an algorithm trained to recognise a trumpet sound in a similar way... perhaps? It would try to recreate the same sound by trial and error. It might end up in a completely different place but the results could still be interesting.

I realise how this sounds - like another non-programmer with a bright idea, but I thought I'd just air it at least. As I said, I'm just curious.

Post

Muied Lumens wrote: Wed Jul 10, 2019 7:06 pm
Tj Shredder wrote: Wed Jul 10, 2019 4:18 pm There is a contradiction in your vision. If the machine learning listens to existing sounds, it will either copy acoustic sounds, or it is not needed because the sound is already electronic and you therefore already know how it's made...
I'm thinking about synths with hundreds (if not thousands) of parameters, where you would not be able to program it manually in any reasonable amount of time.
That is called additive synthesis...;-)
The problem is not the number of parameters, but how to control them expressively with a handful of macros. For that you need a boiled-down description, which is possible for known acoustic instruments.
Human creative decisions are always needed and wanted for those new instruments. A.I. won't invent them for you, and inventing is the fun I want to have; I would never give that away for convenience...

Post

Someone posted this recently in another thread.

You don't need to look for "incredibly complex" solutions.

http://www.youtube.com/watch?v=2hw4yc4xUAs
Anyone who can make you believe absurdities can make you commit atrocities.

Post

Yes, this is possible.

1. Create an effect plugin which can process incoming audio and send MIDI notes and CC data.

2. The effect can load a sample, which is then converted into two high-resolution spectral images via the Fourier transform (frequency on the y-axis, time on the x-axis): one image for phase, one for amplitude.

3. The effect sends one MIDI note and records the incoming audio. This audio is also converted into phase and amplitude spectral images via the Fourier transform.

4. Before the note is sent, the effect sends out many MIDI CC values, along with pitch bend, to adjust all available parameters of the connected synths and effects.

5. Use a standard nested hill-climbing search algorithm, or a genetic algorithm, to send one note at a time along with CC values, and record the result (a minimal sketch follows after this list).

6. During the search, after each note is sent with its CC values and the new audio is converted into an image, directly compare the loaded reference sample's spectral images against the new audio's spectral images: subtract each 'pixel' (bin value), take the absolute value, and sum it all up. The closer the total is to zero, the closer the match; zero is a perfect score, indicating no difference between the current input and the reference sample.

7. To keep the search gradual and guide it, apply a Gaussian blur to all spectral images before comparison, with a blur radius that decreases as the score improves. This will speed up the search dramatically.

8. Repeatedly search and score, keeping a top-10 list both by best score and by diversity (the parametric distance of the top-scoring results from each other).

Also, implement an internal delay parameter that adjusts when the incoming audio is processed, so that lag (earliness/lateness) is itself a searchable parameter.

Record the same length of time as the reference sample, so the comparison images have the same dimensions.
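A minimal sketch of steps 5 to 7 in Python, assuming NumPy and SciPy, with a hypothetical render(params) callback standing in for steps 3 and 4 (send the CC values and the note, record the resulting audio):

```python
import numpy as np
from scipy.signal import stft
from scipy.ndimage import gaussian_filter

def spectral_image(audio, fs=44100, nperseg=2048):
    # Steps 2-3: magnitude spectrogram, frequency on y, time on x.
    _, _, Z = stft(audio, fs=fs, nperseg=nperseg)
    return np.abs(Z)

def score(candidate, reference, sigma):
    # Steps 6-7: blur both images, then sum the absolute pixel
    # differences. Zero means a perfect match.
    a = gaussian_filter(candidate, sigma)
    b = gaussian_filter(reference, sigma)
    return np.sum(np.abs(a - b))

def hill_climb(render, reference_audio, n_params, iters=500, seed=0):
    # Step 5: mutate one CC value at a time and keep improvements.
    # render() must return the same number of samples as the
    # reference, so the spectral images have matching dimensions.
    rng = np.random.default_rng(seed)
    ref = spectral_image(reference_audio)
    params = rng.integers(0, 128, size=n_params)  # MIDI CC range 0..127
    sigma = 8.0                                   # start with a heavy blur
    best_img = spectral_image(render(params))
    best = score(best_img, ref, sigma)
    for it in range(1, iters + 1):
        if it % 50 == 0 and sigma > 0.5:
            sigma *= 0.7                          # step 7: shrink the blur
            best = score(best_img, ref, sigma)    # rescore under the new blur
        trial = params.copy()
        trial[rng.integers(n_params)] = rng.integers(0, 128)
        img = spectral_image(render(trial))
        s = score(img, ref, sigma)
        if s < best:                              # lower = closer match
            params, best, best_img = trial, s, img
    return params
```

The phase image and the top-10 diversity bookkeeping from step 8 are left out for brevity; the same score() function could drive a genetic algorithm instead.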

This can also be used to make new synths, by searching code variations instead of CC parameter variations. You would need to provide many samples though, which is achievable by symbolic regression, similar to Gepsoft GeneXproTools.
SLH - Yes, I am a woman, deal with it.

Post

Vertion wrote: Wed Jul 10, 2019 9:50 pm Yes, this is possible.
Interesting.

Thank you for breaking it down for me.

Post

That's more or less the idea behind the Hartmann Neuron:

https://www.youtube.com/watch?v=8-GrZ6NSM0E

Post

MadBrain wrote: Thu Jul 11, 2019 8:09 pm That's more or less the idea of the Hartmann Neuron

https://www.youtube.com/watch?v=8-GrZ6NSM0E
Right! I had forgotten about that one. Thanks!

Post

Richard_Synapse wrote: Wed Jul 10, 2019 4:44 pm It should work for simple synths already with current machine learning algorithms; for more complex synths, actual intelligence is probably needed to program such presets.

Also you'll need a way for the computer to decide whether a given preset sounds like e.g. a trumpet or not. This is probably the easier task to solve, but not exactly trivial, either.

Richard
Well, it isn’t really all that difficult to decide whether a given sound sounds like a trumpet using existing Generative Adversarial Network methods. I doubt it would come up with meaningful parameters to tweak the resulting sounds, however. We’re mostly talking convolution kernels here. Anyway, I think additive resynthesis is the way to go here.
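For illustration, a minimal sketch of the kind of convolutional discriminator being described, assuming PyTorch; the architecture and layer sizes are invented for this example, not taken from any particular GAN paper:

```python
import torch
import torch.nn as nn

class TimbreDiscriminator(nn.Module):
    """Binary "does this spectrogram sound like a trumpet?" classifier."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),   # collapse to one value per feature map
            nn.Flatten(),
            nn.Linear(32, 1),          # logit: trumpet-likeness
        )

    def forward(self, spectrogram):    # shape: (batch, 1, freq_bins, frames)
        return self.net(spectrogram)

# e.g. score a batch of one 128-bin x 256-frame magnitude spectrogram:
logits = TimbreDiscriminator()(torch.randn(1, 1, 128, 256))
```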
Incomplete list of my gear: 1/8" audio input jack.

Post

Interesting question, and something I have been thinking about occasionally too. Not with the goal of emulating an existing sound, but more as a way of simplifying the interface of a complex synthesizer into a smaller number of more meaningful and expressive parameters.

Vertion's suggestion is a good start, though I have a feeling that just looking at the Fourier transform (spectrum and phase) is not going to be the best way of classifying audio. Audio can have quite different spectral images and still sound similar, and vice versa. My guess is that you would need to do the classification with features that more closely match how humans hear and identify sounds.

Also, if you take, say, 5-second audio clips in every iteration, a single static Fourier transform of the whole clip will not tell you anything about how the timbre evolves during those 5 seconds (filter sweeps and harmonically rich attacks, for instance). So you need to use features that take that into account as well (see the sketch below).
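As one illustration, a minimal sketch of such a perceptually motivated, time-aware comparison, assuming the librosa library; MFCCs are a common stand-in for features closer to human hearing, and keeping the frame axis preserves how the timbre evolves over the clip:

```python
import numpy as np
import librosa

def mfcc_trajectory(audio, sr=44100, n_mfcc=13):
    # One 13-coefficient timbre snapshot per analysis frame,
    # shape (n_mfcc, frames).
    return librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)

def timbre_distance(a, b, sr=44100):
    ma = mfcc_trajectory(a, sr)
    mb = mfcc_trajectory(b, sr)
    frames = min(ma.shape[1], mb.shape[1])  # align clip lengths
    # Mean per-frame difference, so filter sweeps and evolving
    # attacks contribute to the score frame by frame.
    return np.mean(np.abs(ma[:, :frames] - mb[:, :frames]))
```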

Post

An advanced synth with simple controls sounds great to me, although I can understand those who want full control over their programming too. For those who don't mind, a rating system combined with an evolution/randomiser might yield interesting results. So, you choose which sounds you like, it creates iterations of those sounds, you choose between the iterations again, and so on, until you are satisfied. An easy way to create presets which are unique to you (a toy sketch of the idea follows below).
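A toy sketch of that rate-and-evolve loop, assuming a hypothetical audition(params) callback that plays a preset to the user and returns their rating:

```python
import numpy as np

def evolve(audition, n_params, pop_size=8, generations=10, seed=0):
    rng = np.random.default_rng(seed)
    # Start from random presets; CC-style parameters in 0..127.
    pop = rng.integers(0, 128, size=(pop_size, n_params))
    for _ in range(generations):
        ratings = np.array([audition(p) for p in pop])
        parents = pop[np.argsort(ratings)[-2:]]    # keep the two favourites
        children = []
        for _ in range(pop_size):
            mask = rng.random(n_params) < 0.5      # uniform crossover
            child = np.where(mask, parents[0], parents[1])
            mutate = rng.random(n_params) < 0.1    # mutate ~10% of values
            child[mutate] = rng.integers(0, 128, mutate.sum())
            children.append(child)
        pop = np.array(children)
    return pop                                     # final generation of presets
```

Each generation the user only has to rate a handful of sounds, so the loop stays playable rather than tedious.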

Obviously the problem of "expressiveness" remains, but you could apply the same process to the performance aspect as to the timbre. You could even split the two, so you have performance macros made for string-like instruments, blown instruments, bowed instruments, etc. It wouldn't take long before you had an MPE instrument personalised to your unique playing style. At least in principle.

Post

Machine learning works by training on data, and patches for some VSTs are definitely available in sufficient quantities.

One of the interesting things is that the solution becomes parametric, revealing patterning that might not be obvious otherwise.

Not my thing though; sorting through shit wears me down personally. I prefer proceduralism as an end rather than a means, as it can reveal so much more about existence than human expression.
you come and go, you come and go. amitabha neither a follower nor a leader be tagore "where roads are made i lose my way" where there is certainty, consideration is absent.
