Physical modeling is dead! Google tone transfer is here ...
-
- KVRAF
- 2514 posts since 28 Sep, 2012
Thanks for clarifying.
Again, it’s not my line of thinking per se. It’s the Google team’s. I posted a quote from the project blog on the first page, I believe.
As far as my musicianship: an out-of-practice, classically trained pianist.
For sure, the system has some kinks. But I don’t see this as being a replacement for current and established ways of synthesizing sounds. Rather, it’s more like a proof of concept for being able to take one sound and change it into another. Its real novelty and impact may be more interesting in the DSP and machine learning realm (hence the Google team’s idea that the system can do things that a human wouldn’t be able to do), especially in its current state.
As far as its practicality, I could see a future where film composers find it very useful. For example, a composer whose primary instrument is the violin (or who has recorded some birdsong) could record an idea for a theme or cue, quickly “tone match” it to another instrument or sound rather than recreate it using synths and samplers, and present it to the director or lead composer. Once the theme or cue is approved, it could be fleshed out using real instruments and musicianship. At least that was my experience when I worked as an assistant to a composer (who was one of many competing with each other under a very famous composer): ideas needed to be sketched out and evaluated quickly. Absolutely, in its current state it’s not ready for prime time. But if they improve their models (and if other areas of deep learning are anything to go by, they likely will), it will get better.
- KVRAF
- 8071 posts since 9 Jan, 2003 from Saint Louis MO
KvR users: "there's not enough innovation in synthesis!"
Also KvR users: "why the hell would anyone want new technology?"
- KVRAF
- 11162 posts since 16 Mar, 2003 from Porto - Portugal
foosnark wrote: Thu Oct 22, 2020 4:00 pm
KvR users: "there's not enough innovation in synthesis!"
Also KvR users: "why the hell would anyone want new technology?"
Over-generalization fallacy is a common flaw.
Fernando (FMR)
-
- KVRAF
- 4329 posts since 26 Jun, 2004
- KVRAF
- 8071 posts since 9 Jan, 2003 from Saint Louis MO
So is not having a sense of humor.
-
- Banned
- 50 posts since 21 Oct, 2020
imrae wrote: Thu Oct 22, 2020 10:41 am
I feel that it slightly misses the point of instrumental performance to map one instrument to another. Different instruments provide different kinds of expression; the way I perform vibrato on a guitar is not the same as the way I would choose to use vibrato on a trombone. There is no equivalent to hammer-on/pull-off articulations, and the concept of "tremolo" is entirely different. How can bow-pressure be inferred from whistling?
My suspicion is that such transfer technology, like most ROMplers, does not really imitate the whole scope of an instrument, but only a particular limited way of playing the instrument. Of course, like ROMpling, this would be sufficient for many commercial purposes.
Well said! Thank you.
-
- Banned
- 50 posts since 21 Oct, 2020
Actually, what usually fails when trying to mimic an acoustic instrument is the real time variations of pitch, timbre, volume, attack articulations, phraseology, etc. that a violinist performs in real time. No AI will be able to do that because that changes from note to note, from piece to piece, etc. And the examples in the site are something to laugh about. They don't sound at all like a real instrument. They "resemble" the real thing, but much more distant than any good sample library can.
All the rest you wrote is plain mumbo jumbo. No, it's not like you say, at all. I can synthesize a realistic violin sound, or sax sound. If I use a sampler, I can get a VERY realistic sound of a violin or a sax. Problem is, it plays realistically only if I respect all the idiosyncrasies of THAT instrument. Otherwise, it will sound synthetic and artificial (as the examples in the site did).
Nicely articulated! Thank you.
- Rad Grandad
- 38041 posts since 6 Sep, 2003 from Downeast Maine
I fixed your quote Wes 
The highest form of knowledge is empathy, for it requires us to suspend our egos and live in another's world. It requires profound, purpose‐larger‐than‐the‐self kind of understanding.
-
- Banned
- 50 posts since 21 Oct, 2020
Thanks!
- addled muppet weed
- 111242 posts since 26 Jan, 2003 from through the looking glass
physical modelling is dead!
long live physical modelling!
-
- KVRAF
- 3086 posts since 4 May, 2012
perpetual3 wrote: Thu Oct 22, 2020 2:52 pm
.../... if the sound is complex enough, eventually it becomes really difficult, for nearly everyone, to synthesize a physical sound using elementary modules or DSP elements because humans have limitations (two eyes, two arms, ten fingers, difficulty paying attention to more than one thing at once, etc).
fmr wrote: Thu Oct 22, 2020 3:11 pm
That's where your line of thinking fails. Synthesizing the sound isn't "that" difficult. Actually it was done in the sixties by Jean-Claude Risset using acoustic analysis and additive synthesis via the Music language created by Max Mathews. It has nothing to do with human limitations, since we can program each element at a time, and then trigger most of them using relatively simple controls.
The task would be beyond daunting if you were to approach it as a human using this method, also noting that a human has the advantage of a complex understanding of pitch before approaching the synth. This isn't tuning an additive bank of sine waves via FFT; this is tuning each partial/oscillator by trial and error until best guesses are arrived at, whilst also tuning every other aspect of the synth and calculating how to play it so it best matches the input data.
This is why we have sustained sounds which seem to be made up of scraps of audio. The machine hasn't finished learning but that is the best match at that point.
Perpetual3 does seem to understand the process well.
If we take a Gaussian distribution: we could start with a cluster of sound, seemingly random, but as time progresses the sound converges on a single pitch and floats around that. The difference is that whilst the players in an orchestra generally understand their instruments and have a reasonable grasp of the starting and end points, the machine only knows a reality of what previously worked and what didn't.
As a slightly meta example: consider how the Fourier Transform uses correlation, except imagine the machine having no algorithm. So it prints out sample value after sample value, making comparisons to the original audio, keeping the best matches and scrapping the worst. It would be a much slower process in human terms. However, at some point, we might find that the machine learns how to generate and match sine waves. Within that data there will be some understanding. Just as with Google's audio engine, the machine will work out tones and articulations that sound realistic.
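To make the "keep the best matches, scrap the worst" idea concrete, here is a rough Python sketch. It is not how Google's system works; it is just a plain random search that tries to match one sine wave to a target by guessing frequencies from a Gaussian spread that narrows over time. The sample rate, window length, function names and search settings are all made up for illustration.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz, assumed for illustration
N = 256             # number of samples compared per guess


def sine(freq, n=N, sr=SAMPLE_RATE):
    """Return n samples of a unit-amplitude sine at freq Hz."""
    return [math.sin(2 * math.pi * freq * i / sr) for i in range(n)]


def error(a, b):
    """Mean squared difference between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def random_search(target, steps=2000, seed=0):
    """Guess frequencies, keep the best match, narrow the guesses around it."""
    rng = random.Random(seed)
    best_freq = rng.uniform(50.0, 2000.0)   # first guess is blind
    best_err = error(sine(best_freq), target)
    spread = 500.0  # standard deviation of the Gaussian guesses, in Hz
    for _ in range(steps):
        guess = abs(rng.gauss(best_freq, spread))
        err = error(sine(guess), target)
        if err < best_err:                  # keep the better match
            best_freq, best_err = guess, err
        spread = max(0.01, spread * 0.997)  # guesses cluster tighter over time
    return best_freq, best_err


if __name__ == "__main__":
    target = sine(440.0)
    freq, err = random_search(target)
    print(f"best guess: {freq:.2f} Hz, error: {err:.6f}")
```

Nothing here guarantees the search lands on the right pitch. If the guesses narrow too early it can settle on a poor match, which is roughly the "best match at that point" situation described above.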
Consider Google's offering a proof of concept, and it sounds damn good to me. Very impressive work with plenty of possible audio applications for both generating and processing.
- addled muppet weed
- 111242 posts since 26 Jan, 2003 from through the looking glass
nm.
tired
-
- KVRAF
- 3086 posts since 4 May, 2012
Aw. I just quoted what you wrote previously and was going to reply. It turned into this.
But also imagine the synth having all this data for natural variation within it. Every key press could sound varied in a natural manner. Eventually we could sample playing styles. Then mix and match.
Instant obvious application for turning one instrument into another would be only ever needing one crappy guitar that can sound like any other guitar. Maybe never even tune the thing. Have a parameter for playing style being slightly less or more drunk. Adjust the amount of THC, etc.
-
- KVRian
- Topic Starter
- 671 posts since 8 Jan, 2005 from Germany
SneakyBeats wrote: Thu Oct 22, 2020 2:22 pm
Nice idea.. But I just can't figure out any situation where I'd want to turn my fart into a bunch of violins.
And what about the other way around? Turning instruments into farts? This could be the next big thing! I'm imagining new genres like fart-step or fart-core.



