Imitone -- wow! Most embarrassing post I ever started

Ah_Dziz
KVRAF
2808 posts since 2 Jul, 2005

Post Thu Feb 13, 2020 11:30 am

Interactopia, I am aware of the issues with simple pitch tracking (I can code a simple pitch tracker with a couple of DSP analysis stages that usually has those issues). The thing is, I’m not as concerned with the technology that was around when you started this project. Plenty of people (actual professional coders) have gotten past these problems, and you don’t get octave jumps or latching onto higher harmonics in almost any of the new stuff. So it’s not like it is still a technical issue to extract the exact pitch and level of an input signal with as little as one cycle of the fundamental frequency for latency. The rest of the issue is transforming that into MIDI data. I can do that for monophonic data easily, and I just code for fun.
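Just to show what I mean by the easy part: once you already have a frequency and a level for a monophonic signal, the mapping to a MIDI note and velocity is trivial. A throwaway sketch in Python (the names and the velocity curve are made up for illustration, nobody’s shipping code):

Code:

import math

def freq_to_midi_note(freq_hz):
    # Standard 12-TET mapping: A4 = 440 Hz = MIDI note 69.
    return int(round(69 + 12 * math.log2(freq_hz / 440.0)))

def level_to_velocity(level_dbfs, floor_db=-60.0):
    # Crude linear map from a dBFS level onto MIDI velocity 1..127.
    v = int(round(127 * (level_dbfs - floor_db) / -floor_db))
    return max(1, min(127, v))

# e.g. a detected 220 Hz tone at -12 dBFS:
print(freq_to_midi_note(220.0))    # 57 (A3)
print(level_to_velocity(-12.0))    # 102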
Anyway, your plugin was a good idea. It has just been ridiculously slow and still isn’t a “finished product” in the way that almost anything I would buy would be. In the meantime I’ve found other ways of achieving the same thing that are much simpler to use. I’ll still grab new versions as they pop up and maybe you’ll get it sorted out eventually. Hopefully so.
Don't F**K with Mr. Zero.

interactopia
KVRist
33 posts since 14 May, 2014

Re: Imitone -- wow! Best live voice-to-midi I've ever seen

Post Thu Feb 13, 2020 7:28 pm

The tricky thing is achieving all of it at the same time, and under potentially adverse conditions: Low latency, accurate pitch readings, reasonable transcription... in reverberant environments, with background noise, breath noise, handling noise, low-grade microphones, automatic gain control or noise suppression, lyrical singing and potentially inexperienced voices.
Ah_Dziz wrote:
Thu Feb 13, 2020 11:30 am
So it’s not like it is still a technical issue to extract the exact pitch and level of an input signal with as little as one cycle of the fundamental frequency for latency.
I'd be interested to know what sort of algorithm you're talking about! As a general rule, recognizing pitch after less than one period requires foreknowledge of the waveform and is highly sensitive to any kind of interference. It doesn't generalize well to highly-dynamic timbres like human voice, and reliable octave resolution at these timescales is very challenging unless you're working with an extremely predictable sound (like a guitar pickup).
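To put rough numbers on that latency floor (plain back-of-the-envelope arithmetic at 44.1 kHz, nothing specific to imitone's internals):

Code:

SAMPLE_RATE = 44100.0

def one_cycle(freq_hz):
    # Samples and milliseconds needed just to observe a single full period
    # of the fundamental, before any analysis even starts.
    return SAMPLE_RATE / freq_hz, 1000.0 / freq_hz

for name, f in [("E2 (low voice)", 82.4), ("A3", 220.0), ("A4", 440.0)]:
    samples, ms = one_cycle(f)
    print(f"{name}: {samples:.0f} samples, {ms:.1f} ms")

For a low voice that is already more than 12 ms of signal, and you still have to distinguish it from the octave above.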
Ah_Dziz wrote:
Thu Feb 13, 2020 11:30 am
The rest of the issue is transforming that into MIDI data.
We talked about this a little in our post last January. The transcription is the most important area for improvement. Our pitch tracking was already some of the best available, and particularly when using our exact pitch mode, there wasn't much to complain about. But I chose to spend another 18 months researching... Pitch tracking.

While this work is leading to measurable improvements in latency, accuracy, staccato tracking and CPU load, the most important aspect of the new tech is that it produces a probability distribution over pitch. Because there's so much uncertainty in the first milliseconds after a tone is registered, that distribution is crucial for achieving a high-quality transcription while keeping latency low. Probability modeling will do a much better job than simpler mechanisms like dynamic thresholding.
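As a toy illustration of the idea (a generic sketch, not our actual model): keep a distribution over candidate notes, fold in each frame's evidence, and only commit to a note-on once one candidate clearly dominates.

Code:

import numpy as np

NOTES = np.arange(36, 84)   # candidate MIDI notes, C2..B5

def update(posterior, frame_likelihood):
    # Fold one frame's evidence (e.g. a normalized periodicity score per
    # candidate note) into the running distribution over pitch.
    posterior = posterior * frame_likelihood
    return posterior / posterior.sum()

def decide(posterior, confidence=0.9):
    # Commit to a note-on only once one candidate dominates; until then,
    # stay silent rather than guess and retract.
    best = int(np.argmax(posterior))
    return int(NOTES[best]) if posterior[best] >= confidence else None

posterior = np.full(len(NOTES), 1.0 / len(NOTES))   # flat prior at tone onset

A hard threshold has to pick a note immediately or miss the attack; the distribution lets the transcription stay honest about its uncertainty for a few milliseconds and then commit.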
imitone: transform your voice into any instrument.

Ah_Dziz
KVRAF
2808 posts since 2 Jul, 2005

Re: Imitone -- wow! Best live voice-to-midi I've ever seen

Post Sat Feb 15, 2020 2:44 pm

interactopia wrote:
Thu Feb 13, 2020 7:28 pm
I'd be interested to know what sort of algorithm you're talking about! As a general rule, recognizing pitch after less than one period requires foreknowledge of the waveform and is highly sensitive to any kind of interference.
The only things I can make on my own use filtering into zero-crossing counting, then one or two parallel FFT processes (the second for better bass resolution) to compare the output of the “real-time” stage against. Things like Waves OVox run at anywhere between 0 and 128 samples of latency and track as well as anything I’ve ever used, including your algorithm.
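Roughly the kind of homebrew thing I mean, sketched in Python (illustrative only, nowhere near a finished tracker):

Code:

import numpy as np

SR = 44100

def zero_crossing_pitch(frame):
    # Fast, crude f0 estimate: count sign changes in a (pre-filtered) frame.
    flips = np.count_nonzero(np.signbit(frame[1:]) != np.signbit(frame[:-1]))
    return flips * SR / (2.0 * len(frame))

def fft_peak_pitch(frame):
    # Slower, longer-window check: frequency of the strongest FFT bin.
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    peak_bin = int(np.argmax(spectrum[1:])) + 1    # skip DC
    return peak_bin * SR / len(frame)

def tracked_pitch(fast_frame, long_frame):
    # Trust the quick zero-crossing number, but fold octave errors toward
    # whatever the longer FFT window says.
    fast = zero_crossing_pitch(fast_frame)
    slow = fft_peak_pitch(long_frame)
    if fast <= 0.0:
        return slow
    while fast > slow * 1.5:
        fast /= 2.0
    while fast < slow / 1.5:
        fast *= 2.0
    return fast

t = np.arange(2048) / SR
sig = np.sin(2 * np.pi * 220.0 * t)
print(tracked_pitch(sig[:512], sig))   # ~215 Hz with this crude estimator

It falls apart on breathy or noisy input, which I assume is your point.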
Don't F**K with Mr. Zero.
