Real-time pitched audio-to-midi technology... behind the times?


Post

OK, so for a while I've been meaning to try and control VSTis with pitched audio signals, particularly human voice (and maybe guitar in the future). Today I finally managed to round up several options in VST form and I must say that I'm very unimpressed. Admittedly I've done my tests in less than optimal conditions on purpose (some ambient noise, a bad mic) to try and separate the decent from the bad outright, but I haven't found a single solution that does a half-decent job in this environment which is underwhelming... The options I've found and demoed so far are:
- knzaudio Midifier
- Numerikart Midikonv
- nuton Musiciendoz
- Sadglad Supereel
- Tazman TheExtractor
- Widi Audio-to-MIDI VST

I don't want to single out any devs as bad 'cause I'm sure their efforts have been great and there's probably some technological barrier that I'm missing, but out of the options mentioned above I haven't found a single really usable solution: lots of note misfirings, lots of pitch errors, not a single decent pitchbend/glide approach, and finally bugs in some of the plugins... In short, dynamics and triggering events are usually captured pretty decently, but pitch-to-midi still appears to be a black art (?).
I've also tried FrettedSynth's audio-controlled synths and the experience is waaay better, but I think that they just bypass midi altogether... so I assume that the bottleneck is on the midi specification?
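
For reference, here is a rough sketch of what squeezing a detected frequency into MIDI involves: a whole-number note plus a 14-bit pitch-bend value. The +/-2 semitone bend range and the A4 = 440 Hz tuning are assumptions (the bend range depends on how the receiving synth is set up), and `freq_to_note_and_bend` is just an illustrative name, not something any of the plugins above is known to use.

```python
import math

A4_HZ = 440.0        # reference tuning (assumption)
A4_NOTE = 69         # MIDI note number of A4
BEND_RANGE = 2.0     # assumed pitch-bend range in semitones (+/-), synth-dependent
BEND_CENTER = 8192   # centre of the 14-bit pitch-bend range (0..16383)


def freq_to_note_and_bend(freq_hz):
    """Approximate freq_hz as (midi_note, pitch_bend_value)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Fractional MIDI note number for this frequency.
    exact = A4_NOTE + 12.0 * math.log2(freq_hz / A4_HZ)
    note = int(round(exact))
    # Distance from the nearest semitone, roughly -0.5..+0.5.
    offset = exact - note
    bend = BEND_CENTER + int(round(offset / BEND_RANGE * 8192))
    # Clamp to the legal 14-bit range.
    return note, max(0, min(16383, bend))
```

With these assumptions, `freq_to_note_and_bend(440.0)` gives `(69, 8192)`, i.e. A4 with the bend wheel centred. A glide has to be sent as a stream of such bend values, and the receiving synth must be set to the same bend range for the pitch to come out right.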

So a couple of honest questions:
- Am I missing something? I mean, are there any options out there considerably better than the ones I mentioned above?
- Why? I mean, what's the technological barrier that doesn't allow me to control ANY synth/sampler using my voice, in 2007?


Of course this is not intended as a rant at all, I sincerely appreciate the efforts by the devs mentioned above, I'm just curious... is it just that the MIDI specification is poor for this purpose, or maybe there's not that much public interest to merit deeper research on the subject, or perhaps there's yet another reason that I'm completely missing... any thoughts are welcome :)
The mind boggles.

Post

Pitch detection on monophonic material should work.

Maybe it's your voice... Some vowels work better than others. Have you tried how it performs if you feed it fewer upper harmonics and more of the fundamental frequency? You can do that by singing "dooo" (ooo as pronounced in "moon").
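
To illustrate why a strong fundamental helps, below is a minimal sketch of textbook autocorrelation pitch detection on a single mono frame. It is not claimed to be what any of the plugins in this thread does; `detect_pitch`, the 70-1000 Hz search range and the 0.9 peak threshold are all assumptions chosen for the example.

```python
import math


def detect_pitch(samples, sample_rate, fmin=70.0, fmax=1000.0):
    """Estimate the fundamental (Hz) of a mono frame, or None if nothing is found."""
    min_lag = int(sample_rate / fmax)   # shortest period considered
    max_lag = int(sample_rate / fmin)   # longest period considered
    n = len(samples) - max_lag          # samples compared at every lag
    if n <= 0:
        raise ValueError("frame too short for fmin")

    # Autocorrelation score for every candidate lag.
    scores = [
        sum(samples[i] * samples[i + lag] for i in range(n))
        for lag in range(min_lag, max_lag + 1)
    ]
    peak = max(scores)
    if peak <= 0:
        return None  # silence or no periodicity found

    # Take the first local maximum that is nearly as strong as the best one.
    # This prefers the true period over its multiples (octave-down errors).
    for k in range(1, len(scores) - 1):
        if (scores[k] >= 0.9 * peak
                and scores[k] > scores[k - 1]
                and scores[k] >= scores[k + 1]):
            return sample_rate / (min_lag + k)
    return None
```

The frame has to hold more than one full period of the lowest note being listened for, so the frame length puts a floor under the latency. Strong upper harmonics add extra peaks that can compete with the true period, which is one plausible reason a pure "ooo" tracks better than a bright vowel.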
We are the KVR collective. Resistance is futile. You will be assimilated.
My MusicCalc is served over https!!

Post

Well I hadn't thought of that... basically I've tried all vowels and a lot of articulations but it might very well be my voice that is not up to the task... I'm no singer you know ;) Yet for basic monophonic work I'd thought it would be easier to set up... of course if I very carefully articulate just one note after another with pauses in between and nothing more, the notes sometimes get identified correctly... but not consistently...

Anyway, I have to bring the plugs to the studio and try there with isolation and decent mics... yet I feel that for live work this is out of the question for now...

Post

You're going to get pretty much the same outcome with a crappy mic or a good one IMO. I've tried everything from using high-end recordings to breaking audio down into simple fundamentals (from notes) and still got incorrect detection. A surprising thing (for WIDI anyway) is that short bursts of notes seem to be detected better than long parts for some reason.

L

Post

Well, imho from the options mentioned widi is the most promising one indeed, but still a long way from being really usable, at least for me...
So there's really nothing better for real-time? Not even stand-alone and/or more expensive stuff? Damn, I look around, see how far we've gotten in other areas (e.g. Nebula, VB3/Miles'tone, etc) and feel like banging my head against the wall :bang: :hihi:

Post

Melodyne and MAGIX Elastic Audio are very good in this field, IMO. I recently bought MAGIX Music Maker 12 mainly for the Elastic Audio function. It helps with transcribing work pretty well. But I don't need perfect detection, nor real-time processing. I have a brain and I can correct the faults of the detection. :P

Post

Cool, you just reminded me that I have a G50 midi guitar converter, and that thing can use a mono audio channel too. So I plugged a mic into it and the result is... amazing: http://homepage.hispeed.ch/mbncp/images ... ditest.mp3, too bad it didn't catch the lyrics :hihi:

Post

Lance wrote: Melodyne and MAGIX Elastic Audio are very good in this field, IMO. I recently bought MAGIX Music Maker 12 mainly for the Elastic Audio function. It helps with transcribing work pretty well. But I don't need perfect detection, nor real-time processing. I have a brain and I can correct the faults of the detection. :P
Well yeah, I also have a brain and I know about melodyne but afaik it doesn't do real-time which is the first word of the thread title and the whole point of my request :P No really, I'm looking into audio-to-midi not as a surgical off-line correction/transcribing tool but as a broad real-time creative/performance tool... I assume that elastic audio falls in the former category as well but I'll check it out anyway, thanks :)

Post

mbncp wrote: Cool, you just reminded me that I have a G50 midi guitar converter, and that thing can use a mono audio channel too. So I plugged a mic into it and the result is... amazing: http://homepage.hispeed.ch/mbncp/images ... ditest.mp3, too bad it didn't catch the lyrics :hihi:
mmm hardware... very interesting... the clip sounds grrreat indeed, now if you could post the original voice clip to compare... :hihi:

Post

I didn't record the original take, the G50 isn't even connected to my soundcard.

I tried experimenting a little more, and I guess I have more or less the same problem as you, and probably worse, as I don't sing very often and smoke far too much.

If you want I can try some of your stuff to see if it's doing any better or let me know with which system you had the better results (demo/freeware) and I could give it a try.

Also some MIDI processing could probably help.
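
As one hedged example of the kind of MIDI processing meant here: dropping very short notes (likely misfires) and merging repeats of the same pitch separated by a tiny gap. The `(start, end, pitch)` tuple layout, `clean_notes` and both thresholds are hypothetical, picked only for illustration.

```python
def clean_notes(notes, min_len=0.05, max_gap=0.05):
    """Tidy a list of (start_s, end_s, midi_pitch) tuples from a pitch tracker.

    Notes shorter than min_len seconds are dropped, then neighbouring notes
    of the same pitch are joined when the gap between them is at most max_gap.
    """
    # Step 1: discard notes too short to be intentional.
    kept = [n for n in sorted(notes) if n[1] - n[0] >= min_len]

    # Step 2: merge same-pitch notes split by a brief dropout.
    merged = []
    for start, end, pitch in kept:
        if merged and merged[-1][2] == pitch and start - merged[-1][1] <= max_gap:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), pitch)
        else:
            merged.append((start, end, pitch))
    return merged
```

In a live setup the same idea would mean holding each note-on back by `min_len` before passing it on, so it trades a little latency for fewer stray notes.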

Post

Yeah, hardware seems to work better for this... and by the way, Melodyne is monophonic but does not have real-time conversion. When you say realtime do you mean non latent? Get WIDI vst and Sp3ctr3. Run the MIC thru (MONO) your input of host thru Sp3ctr3 1st, turn up the threshold just a touch and use the HPF and LPF to narrow down accordingly. Then setup WIDI vsti and record. It should work 80% better! This works perfect for me!

HTH

L

Post

Lagrange wrote:When you say realtime do you mean non latent?
There's always gonna be some kind of latency due to the A/D/A... I say realtime as in "performing" i.e. in the same sense that you can plug a guitar through an amp/cabinet software emulation and play "realtime"... basically without temporary audio or midi files involved.
Lagrange wrote:Get WIDI vst and Sp3ctr3. Run the MIC thru (MONO) your input of host thru Sp3ctr3 1st, turn up the threshold just a touch and use the HPF and LPF to narrow down accordingly. Then setup WIDI vsti and record. It should work 80% better! This works perfect for me!
mmm, that's interesting advice... I tried briefly and removing some of the unwanted spectral content of the voice seems to help indeed... gotta experiment with this, thanks! 8)
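
A minimal sketch of the spectral-narrowing idea: pass the mic signal through a high-pass and a low-pass before it reaches the pitch detector. This uses gentle one-pole filters as a stand-in; how Sp3ctr3 filters internally is not known here, and the function names and cutoffs are assumptions.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Simple one-pole low-pass (about 6 dB/octave above the cutoff)."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y = (1.0 - a) * x + a * y   # smooth towards the input
        out.append(y)
    return out


def one_pole_highpass(samples, cutoff_hz, sample_rate):
    """High-pass built by subtracting the low-passed signal from the input."""
    low = one_pole_lowpass(samples, cutoff_hz, sample_rate)
    return [x - l for x, l in zip(samples, low)]


def narrow_band(samples, low_cut_hz, high_cut_hz, sample_rate):
    """Keep roughly low_cut_hz..high_cut_hz: HPF first, then LPF."""
    highpassed = one_pole_highpass(samples, low_cut_hz, sample_rate)
    return one_pole_lowpass(highpassed, high_cut_hz, sample_rate)
```

For example, `narrow_band(frame, 100.0, 400.0, 8000)` would keep roughly the range of a low sung fundamental while attenuating rumble and upper harmonics; the right cutoffs depend on the voice and the room.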

Post

Yes unconventional, whatever works right?

L

Post

Well, I did further testing in a somewhat more controlled environment and concluded that:

- From the options I've tried, I like the feature set of TheExtractor (which sadly seems abandonware) but I got the "best" results out of Widi

- The spectral narrowing certainly helps (nice trick!). It can be a bitch to set up for a specific voice/environment but when you hit the sweet spot the results are definitely better...

- ... but still not consistent enough (at least for me) :? I'll keep investigating though...

- On the other hand, the frettedsynth audio-controlled synths are fabulous, veeery responsive and great fun to use... their response is exactly what I had expected from a decent audio-to-midi implementation 8) So for the moment I'll just stick with them, and perhaps I'll ask the guy if his pitch/amplitude-tracking can be adapted to an audio-to-midi approach (although I vaguely recall reading that it's not so easy).

Post

Thanks Juanjo

GCBS4 is the newest of my audio-triggered synths; all the rest (string sculptor, tripp lead and phazosc) are in need of an update.
Hope to finish soon, then building some new stuff with the new Chris Kerry osc's, :-) that should be some fun synths??

Peace
Fretted Synth
