rock solid automatic music transcription, now what?

DSP, Plugin and Host development discussion.

Post

does anybody have advice for someone with a working prototype for an audio to midi program and no idea what to do about it? i built it in reaktor and as of just today i got it to a point where the settings i was always having to fiddle with don't even do much, as there's only one right answer. i'd like to get it out into the world but have no experience or resources to do that. what are some things to expect, or just any kind of advice, for someone trying to start out as a small-time developer preparing a commercial release?
Last edited by kamalmanzukie on Thu Nov 16, 2017 8:23 am, edited 1 time in total.

Post

kamalmanzukie wrote: does anybody have advice for someone with a working prototype for an audio to midi program and no idea what to do about it? i built it in reaktor and as of just today i got it to a point where the settings i was always having to fiddle with don't even do much, as there's only one right answer. i'd like to get it out into the world but have no experience or resources to do that. what's the next step?
Conventionally, you would just study some source code, such as:
https://pitchtracker.codeplex.com/

However, if you are looking for a top-notch way to gain precision, it involves machine learning and/or optimization algorithms. I would recommend a form of regression that directly matches the sound output, using a variety of scoring functions (aggregated thresholds) from different perspectives (relational operations). This can be accomplished with meta-optimization algorithms (in principle, a global search reaches the optimum given enough time). If you want it to search faster, simultaneously evolve a candidate selection algorithm via evolutionary programming techniques to pick the next candidate for scoring; this can speed up the search via auto-heuristics. The more access to tone and timbre the regression algorithm has, the more precise it will be. This idea is for high precision and assumes offline processing. It is also possible to approximate this processing in a forward pass with a deep convolutional neural network, plus recurrent neural networks for the other domains it needs to process.
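As a very reduced illustration of the "score candidates against the sound" idea, here is a toy sketch in Python. It is not the evolutionary or meta-optimization search described above: it brute-forces every candidate note and uses a single scoring function (how strongly the signal projects onto a sinusoid at the candidate's frequency). The sample rate, buffer length, candidate range and all function names are assumptions made up for the example.

```python
import cmath
import math

SAMPLE_RATE = 8000   # Hz, assumed for this toy example
NUM_SAMPLES = 2048   # analysis buffer length, also an assumption


def midi_to_freq(note):
    """Equal-tempered frequency of a MIDI note, with A4 (note 69) = 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


def score(signal, freq_hz):
    """Magnitude of the signal's projection onto a complex sinusoid at freq_hz."""
    step = -2j * math.pi * freq_hz / SAMPLE_RATE
    return abs(sum(x * cmath.exp(step * n) for n, x in enumerate(signal)))


def best_note(signal, candidates=range(40, 81)):
    """Score every candidate note exhaustively and return the best match."""
    return max(candidates, key=lambda note: score(signal, midi_to_freq(note)))


# Synthetic test input: a pure 220 Hz sine (A3, MIDI note 57).
target = [math.sin(2 * math.pi * 220.0 * n / SAMPLE_RATE) for n in range(NUM_SAMPLES)]
print(best_note(target))
```

A real transcriber would need many scoring functions, a handling of overtones and polyphony, and a smarter way to choose the next candidate than trying all of them.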
SLH - Yes, I am a woman, deal with it.

Post

i don't think there's much more precision to be had. i know that sounds bold, but i've been chipping away at this thing for probably a year and i think recognition is just kind of built in from trying to find as many ways as possible to get more information from the same data. it runs fine in real time also

Post

Hi kamalmanzukie

Is your audio-to-midi polyphonic or monophonic? Successful products might possibly be based on either, though polyphonic might enable a wider gamut of potential products/features.

Long ago I expected that when polyphonic audio to midi became feasible that the inventor might expect "instant riches" but a few programs have been around for a few years now and maybe have been "stealth successes" making the developer a good income, but not famous "ruling the universe" products so far as I could see. Been awhile since thinking about it and at the moment can't recollect names of early ones.

An old unoriginal thought which may still apply-- Re paper notation to midi, or audio to midi accuracy-- For instance an accuracy of 99 percent would at first seem very good indeed. But it means that if we convert 1000 notes of audio, that buried in the 1000 note haystack are 10 mistakes, 10 needles which need to be located and fixed! This could POTENTIALLY be labor-intensive for a fella who has a good ear and is only using the tool to save time of manual transcription. It might take nearly as long to chase down the mistakes as to just listen to the tune and play it in on a keyboard. On the other hand for a customer using the tool because he has not invested time to educate his ear, the poor fella might be up the creek without a paddle because if he can't transcribe the piece by ear then maybe he can't find the mistakes by ear either.
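To put numbers on the haystack argument, here is a small Python sketch. It assumes every note is equally likely to be wrong and that mistakes are independent, which is a simplification; the function name is made up for the example.

```python
def expected_errors(num_notes: int, accuracy: float) -> float:
    """Expected number of wrong notes, assuming a fixed per-note accuracy."""
    return num_notes * (1.0 - accuracy)


# How many mistakes hide in a 1000-note transcription at various accuracies?
for acc in (0.99, 0.999, 0.9999):
    mistakes = expected_errors(1000, acc)
    print(f"{acc:.2%} accurate, 1000 notes: about {mistakes:.1f} mistakes to find")
```

Under these assumptions, each extra "nine" of accuracy cuts the number of needles in the haystack by a factor of ten.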

Am not trying to be negative about the topic, just that in order to be truly labor-saving, such a tool would need to be incredibly accurate.

In addition to just selling a program, or licensing a DLL which does the audio to midi, maybe niche products "pretty easy to do once you have real-time pitch recognition" could sell better than the audio to midi program itself?

One minor niche idea, maybe this is already well exploited by somebody-- I suspect there is a market for a plugin or stompbox that lets a guitarist "naturally" play into the effect without having to learn an entirely different regimented way of playing to get good tracking. If he could play guitar in and get GOOD bass tone out, possibly lots of fellas might want to buy that. With many MIDI guitar setups, the fella has to learn to play guitar all over again just to get the MIDI guitar to track well enough to play bass parts with his guitar. If it was as natural as falling off a log, folks might buy.

Rather than using DSP to try to map the guitar tone into a bass tone, if you have good enough pitch tracking, just drive bass synths. Maybe build in a bass rompler with a few typical basses and also output midi so the user can use whatever soft or hard synth he wishes.
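Once a tracker reports a fundamental frequency, the "drive a bass synth" step is mostly arithmetic. A minimal Python sketch, assuming A4 = 440 Hz equal temperament and a one-octave drop; the function names and the default velocity are made up for the example, and the pitch tracker itself is not shown:

```python
import math

A4_HZ = 440.0    # reference tuning, assumed
A4_MIDI = 69     # MIDI note number of A4


def freq_to_midi(freq_hz: float) -> int:
    """Nearest equal-tempered MIDI note for a detected frequency."""
    return round(A4_MIDI + 12 * math.log2(freq_hz / A4_HZ))


def guitar_to_bass_note(freq_hz: float, octaves_down: int = 1) -> int:
    """Transpose the detected guitar note down and clamp to the MIDI range."""
    note = freq_to_midi(freq_hz) - 12 * octaves_down
    return max(0, min(127, note))


def note_on(note: int, velocity: int = 100, channel: int = 0) -> bytes:
    """Raw 3-byte MIDI note-on message (status 0x90 plus channel)."""
    return bytes([0x90 | channel, note, velocity])


# Open low E string on a guitar is roughly 82.41 Hz (E2, MIDI note 40).
bass_note = guitar_to_bass_note(82.41)
print(bass_note, note_on(bass_note).hex())
```

The same note number can feed a built-in rompler and be sent out as MIDI, so the user can pick either.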

Post

JCJR wrote:Hi kamalmanzukie


thanks for the reply, this gives me some perspective, which is mainly what i feel clueless about. there's a lot of different hats to wear to even think about releasing a product

as far as the algo itself, it is polyphonic, and compared to the guitar midi plugin, which i'd say represents the cutting edge of what's available, mine has the advantages of not having randomly triggered notes, and the notes don't cut in and out. also it almost never is a semitone off. basically when fed through the pizmidichord plugin, it displays the correct chords, even when used on a full mix with drums. i've been obsessed with this sort of thing for a while, and had built two functioning versions before this one, but when it really started working after the last experiment, i actually got a fair bit of anxiety, just from thinking it could be marketable but having no idea what to do with that

a disadvantage is i don't think it does quite as well with latency, and i'm not sure how easy getting it there would be.

i'll see if i can get a good representative example up on soundcloud. it probably won't be for a few days as i've added new elements to the tracking, which always throws the whole thing out of whack, and i'll be leaving to spend thanksgiving with family here shortly
