Machine learning for creating complex presets
- KVRist
- Topic Starter
- 66 posts since 18 Jan, 2017
I'm not a programmer, I'm just musing for the sake of my own curiosity.
So, say you have an incredibly complex synthesis engine, like a 64 op FM synth or all the fundamental building blocks for physical modelling. The programming is done indirectly, by feeding a machine learning algorithm audio samples of an instrument, which it then tries to emulate using synthesis. Same in principle as training it to recognise a human face.
The goal is to achieve highly expressive MPE presets where the user might also have access to some parameters for tweaking after the fact. Personally I'm not interested in copying acoustic instruments, but rather finding new sounds that have the qualities and expressiveness of acoustic and electro-acoustic instruments.
Is this viable in the (hopefully not too distant) future?
- KVRAF
- 8826 posts since 6 Jan, 2017 from Outer Space
There is a contradiction in your vision. If the machine learning listens to existing sounds, it would either copy acoustic sounds, or it is not needed if the sound is already electronic, in which case you already know how it's made...
Richard_Synapse
- KVRian
- 1136 posts since 20 Dec, 2010
It should work for simple synths already with current machine learning algorithms; for more complex synths, actual intelligence is probably needed to program such presets.
Also you'll need a way for the computer to decide whether a given preset sounds like e.g. a trumpet or not. This is probably the easier task to solve, but not exactly trivial, either.
Richard
Synapse Audio Software - www.synapse-audio.com
- KVRist
- Topic Starter
- 66 posts since 18 Jan, 2017
Tj Shredder wrote: ↑Wed Jul 10, 2019 4:18 pm There is a contradiction in your vision. If the machine learning listens to existing sounds, it would either copy acoustic sounds, or it is not needed if the sound is already electronic, in which case you already know how it's made...
I'm thinking about synths with hundreds (if not thousands) of parameters, where you would not be able to program them manually in any reasonable amount of time.
- KVRist
- Topic Starter
- 66 posts since 18 Jan, 2017
Richard_Synapse wrote: ↑Wed Jul 10, 2019 4:44 pm It should work for simple synths already with current machine learning algorithms; for more complex synths, actual intelligence is probably needed to program such presets.
Also you'll need a way for the computer to decide whether a given preset sounds like e.g. a trumpet or not. This is probably the easier task to solve, but not exactly trivial, either.
Richard
I am simply drawing parallels to the dream images we have all seen, where an algorithm is trained to recognise a dog and then fed back into itself to create all those dream-like images. These were created from noise.
Granted, the leap is quite a big one and I agree that it may not be trivial, but you could feed the output of the synthesis engine back into an algorithm trained to recognise a trumpet sound in a similar way... perhaps? It would try to recreate the same sound by trial and error. It might end up in a completely different place but the results could still be interesting.
I realise how this sounds - like another non-programmer with a bright idea, but I thought I'd just air it at least. As I said, I'm just curious.
- KVRAF
- 8826 posts since 6 Jan, 2017 from Outer Space
Muied Lumens wrote: ↑Wed Jul 10, 2019 7:06 pm I'm thinking about synths with hundreds (if not thousands) of parameters, where you would not be able to program them manually in any reasonable amount of time.
That is called additive synthesis...
The problem is not the number of parameters, but how to expressively control them with a handful of macros. For that you need a boiled-down description, which is possible for known acoustic instruments.
Human creative decisions are always needed and wanted for those new instruments. A.I. won't invent them for you, and inventing is the fun I want to have; I would never give it away for convenience...
- KVRAF
- 40228 posts since 11 Aug, 2008 from clown world
Someone posted this recently in another thread.
You don't need to look for "incredibly complex" solutions.
http://www.youtube.com/watch?v=2hw4yc4xUAs
Anyone who can make you believe absurdities can make you commit atrocities.
- Banned
- 697 posts since 29 Oct, 2016
Yes, this is possible.
1. Create an effect plugin which can process incoming audio and send MIDI notes and CC data.
2. The effect can load a sample, which is then converted into two high-resolution spectral images through the Fourier transform (frequency is y, time is x): one image for phase, one for amplitude.
3. The effect will send one MIDI note and record the incoming audio. This audio is also converted into phase and amplitude spectral images via the Fourier transform.
4. Before the note is sent, the effect sends many MIDI CC parameters (along with pitch bend) to adjust all available parameters of the connected synths and effects.
5. Use a standard nested hill-climbing search algorithm, or a genetic algorithm, to send one note at a time along with CC values, and record the result.
6. During the search, after each note is sent with its CC values and the new audio is converted into an image, directly compare the loaded reference sample's spectral images against the new audio's spectral images: subtract each 'pixel'/bin value, take the absolute value, and add it all up (summing the difference). The closer the result is to zero, the closer the match; zero is a perfect score, indicating no difference between the current input and the reference sample.
7. To keep the search gradual and guide it, apply a Gaussian blur to all spectral images before comparison, with the blur radius decreasing as the score improves. This will speed up the search dramatically.
8. Repeatedly search and score, keeping a top-10 list both by score and by diversity (the parametric distance of the top-scoring results from each other).
Also, implement an internal delay parameter to adjust when the incoming audio is processed, so that lag time (earliness/lateness) becomes a searchable parameter.
Record the same length of time as the reference sample so the comparison images have the same dimensions.
This can also be used to make new synths, by searching code variations instead of CC parameter variations. You would need to provide many samples, though, which is achievable via regression, similar to Gepsoft GeneXproTools.
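If it helps to make those steps concrete, here is a toy sketch of steps 5-7 in Python. Everything in it is an illustrative assumption: toy_synth stands in for "set the CCs, send a note, record the audio", and a plain hill climber with a shrinking blur stands in for the nested search over a real plugin's parameters.

```python
import numpy as np

SR = 8000

def toy_synth(params, dur=0.5, sr=SR):
    """Stand-in for 'set CCs, send a note, record audio': two partials
    whose levels are the parameters being searched."""
    t = np.arange(int(sr * dur)) / sr
    return (params[0] * np.sin(2 * np.pi * 220 * t)
            + params[1] * np.sin(2 * np.pi * 440 * t))

def spectral_image(audio, n=256):
    """Amplitude spectrogram (frequency on y, time on x), as in step 2.
    A phase image would use np.angle instead of np.abs."""
    frames = audio[: len(audio) // n * n].reshape(-1, n) * np.hanning(n)
    return np.abs(np.fft.rfft(frames, axis=1)).T

def gaussian_blur(img, sigma):
    """Separable Gaussian blur for step 7 (numpy-only)."""
    if sigma <= 0:
        return img
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    img = np.apply_along_axis(np.convolve, 0, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 1, img, k, mode="same")

def score(candidate, reference, blur=0.0):
    """Step 6: summed absolute per-bin difference; zero is a perfect match."""
    a = gaussian_blur(spectral_image(candidate), blur)
    b = gaussian_blur(spectral_image(reference), blur)
    return np.abs(a - b).sum()

def hill_climb(reference, start, steps=200, seed=0):
    """Step 5: accept any mutation that improves the blurred score;
    the blur radius shrinks as the search proceeds (step 7)."""
    rng = np.random.default_rng(seed)
    best = np.array(start, dtype=float)
    for i in range(steps):
        blur = 2.0 * (1 - i / steps)
        cand = np.clip(best + rng.normal(0, 0.05, best.size), 0.0, 1.0)
        if score(toy_synth(cand), reference, blur) < score(toy_synth(best), reference, blur):
            best = cand
    return best

target = toy_synth([0.8, 0.3])            # the loaded reference sample
found = hill_climb(target, start=[0.5, 0.5])
```

A real version would replace toy_synth with MIDI round-trips to the hosted synth, and would keep the top-10 list from step 8 instead of a single best candidate.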
SLH - Yes, I am a woman, deal with it.
- KVRist
- Topic Starter
- 66 posts since 18 Jan, 2017
- KVRian
- 1000 posts since 1 Dec, 2004
- KVRist
- Topic Starter
- 66 posts since 18 Jan, 2017
MadBrain wrote: ↑Thu Jul 11, 2019 8:09 pm That's more or less the idea of the Hartmann Neuron
https://www.youtube.com/watch?v=8-GrZ6NSM0E
Right! I had forgotten about that one. Thanks!
- KVRAF
- 7540 posts since 7 Aug, 2003 from San Francisco Bay Area
Richard_Synapse wrote: ↑Wed Jul 10, 2019 4:44 pm It should work for simple synths already with current machine learning algorithms; for more complex synths, actual intelligence is probably needed to program such presets.
Also you'll need a way for the computer to decide whether a given preset sounds like e.g. a trumpet or not. This is probably the easier task to solve, but not exactly trivial, either.
Richard
Well, it isn't really all that difficult to decide whether a given sound sounds like a trumpet using existing Generative Adversarial Network methods. I doubt it would come up with meaningful parameters to tweak the resulting sounds, however; we're mostly talking convolution kernels here. Anyway, I think additive resynthesis is the way to go here.
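The additive-resynthesis route can be sketched compactly: estimate each harmonic's amplitude and phase from one FFT of a reference tone, then rebuild it as a bank of sine partials. A numpy-only sketch over a made-up three-partial source; the function name and all constants are illustrative assumptions, not anything from a real product.

```python
import numpy as np

SR = 16000

def additive_resynth(audio, f0, n_partials=16, sr=SR):
    """Read amplitude and phase of each harmonic of f0 from the spectrum,
    then rebuild the tone as a sum of cosine partials."""
    window = np.hanning(len(audio))
    spectrum = np.fft.rfft(audio * window)
    freqs = np.fft.rfftfreq(len(audio), 1.0 / sr)
    t = np.arange(len(audio)) / sr
    out = np.zeros(len(audio))
    for k in range(1, n_partials + 1):
        b = np.argmin(np.abs(freqs - k * f0))   # nearest bin to harmonic k
        # peak magnitude of a Hann-windowed sinusoid is amp * sum(window) / 2
        amp = 2.0 * np.abs(spectrum[b]) / window.sum()
        out += amp * np.cos(2 * np.pi * freqs[b] * t + np.angle(spectrum[b]))
    return out

# A made-up three-partial source tone (one second at 220 Hz)
t = np.arange(SR) / SR
source = (np.sin(2 * np.pi * 220 * t)
          + 0.5 * np.sin(2 * np.pi * 440 * t)
          + 0.25 * np.sin(2 * np.pi * 660 * t))
rebuilt = additive_resynth(source, f0=220)
```

A single static FFT only captures a steady tone; a resynthesizer for real instruments would repeat this per analysis frame so the partial amplitudes can evolve over time.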
Incomplete list of my gear: 1/8" audio input jack.
- KVRist
- 61 posts since 19 Nov, 2012 from Stockholm, Sweden
Interesting question, and something I have occasionally been thinking about too. Not with the goal of emulating an existing sound, but more as a way of simplifying the interface of a complex synthesizer into a smaller number of more meaningful and expressive parameters.
Vertion's suggestion is a good start, though I have a feeling that just looking at the Fourier transform (spectrum and phase) is not going to be the best way of classifying audio. Audio can have quite different spectral images and still sound similar, and vice versa. My guess is that you would need to do the classification with features that more closely match how humans hear and identify sounds.
Also, if you take, say, 5-second audio clips in every iteration, the static Fourier transform of each clip will not tell you anything about how the timbre evolves during those 5 seconds (filter sweeps and harmonically rich attacks, for instance). So you need to use features that take that into account as well.
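One concrete option for such perceptually motivated, time-aware features is a log-compressed mel spectrogram: per-frame energies in bands spaced like human pitch perception, so filter sweeps and attacks show up as movement across frames. A numpy-only sketch; the frame and band counts are arbitrary choices, not anything from this thread.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(audio, sr=16000, n_fft=512, hop=256, n_mels=20):
    # frame the signal so each column of the output is one moment in time
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop: i * hop + n_fft] for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    # triangular mel filterbank: bands evenly spaced on the mel (pitch-like) scale
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        lo, mid, hi = bins[m], bins[m + 1], bins[m + 2]
        fbank[m, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[m, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    return np.log(power @ fbank.T + 1e-10).T     # shape: (n_mels, n_frames)

sr = 16000
t = np.arange(sr) / sr
sweep = np.sin(2 * np.pi * (200 + 400 * t) * t)  # rising pitch: timbre changes over time
M = mel_spectrogram(sweep, sr=sr)
```

Because the output keeps a time axis, a rising sweep shows its energy peak climbing through the mel bands from early frames to late ones, exactly the kind of evolution a single static FFT would average away.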
- KVRist
- Topic Starter
- 66 posts since 18 Jan, 2017
An advanced synth with simple controls sounds great to me, although I can understand those who want full control over their programming too. For those who don't mind, a rating system combined with an evolution/randomiser might yield interesting results. So, you choose which sounds you like and it creates iterations of that sound, which you then choose between again, and so on, until you are satisfied. An easy way to create presets which are unique to you.
Obviously the problem with "expressiveness" remains, but you could apply the same process to the performance aspect as to the timbre. You could even split the two, so you have performance macros made for string-like instruments, blown, bowed, etc. It wouldn't take long before you have an MPE instrument which is personalised to your unique playing style. At least in principle.
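The choose-and-iterate loop described above is essentially interactive evolution, and its skeleton is tiny. In this sketch the rate callback stands in for the user clicking a favourite (replaced here by an automatic stand-in so it runs unattended); every name and the mutation size are illustrative assumptions.

```python
import numpy as np

def evolve_presets(rate, n_params=8, pop_size=6, generations=10, seed=0):
    """rate(presets) -> index of the preset the user likes best."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(0, 1, (pop_size, n_params))   # random starting presets
    for _ in range(generations):
        chosen = pop[rate(pop)]                     # the user's favourite
        # next generation: mutated variations of the favourite...
        pop = np.clip(chosen + rng.normal(0, 0.1, (pop_size, n_params)), 0, 1)
        pop[0] = chosen                             # ...plus the favourite itself
    return pop[0]

# Stand-in 'user' that prefers presets close to some imagined ideal sound:
ideal = np.full(8, 0.7)
auto_rate = lambda presets: int(np.argmin(((presets - ideal) ** 2).sum(axis=1)))
best = evolve_presets(auto_rate, seed=1)
```

Keeping the favourite in the next generation (elitism) means the sound the user likes can never get worse between rounds; the same loop could rate performance-macro mappings instead of timbres.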
- Banned
- 12368 posts since 30 Apr, 2002 from i might peeramid
machine learning works by data training. patches are definitely available for some VSTs in sufficient quantities.
one of the interesting things is that the solution becomes parametric, revealing patterning that may not be obvious otherwise.
not my thing tho, sorting thru shit wears me down personally. i prefer proceduralism as an end rather than a means, it can reveal so much more about existence than human expression.
you come and go, you come and go. amitabha neither a follower nor a leader be tagore "where roads are made i lose my way" where there is certainty, consideration is absent.