Need custom off-line pitch correction app - for pay

DSP, Plugin and Host development discussion.

Post

The usual observations apply. Cost will be somewhere between $5k and $50k+. Most so-called new ideas just don't make it, because the cost is prohibitive. A manager cracking the whip, saying "it can be done, no one has tried it my way before, try harder", usually scares people off.

Post

But people have tried this before:
https://ccrma.stanford.edu/~pdelac/154/m154paper.htm

Post

So you guys are saying this sounds like a great idea, right? :D

You guys are making valid points and I'll admit this may not work. But there are a few things I want to mention:

First, the result might not sound good, but even then, it won't sound like an accordion, because I'm not looping a single cycle. All the cycles still get played in succession, so the evolving tonal variations will still be there, at least as much as with any other tuning software. Again, I concede my method might have unacceptable artifacts, but sounding like an accordion won't be one of them.

Also, pitch detection is not an issue here, since that's input manually by the user, not the app. Figuring out a reliable algorithm for finding the cycle start points is an issue, but figuring out the cycle length (a function of pitch) is not.

As far as this being a recipe to "suck all the life out" of a vocal, I agree with that. However, my reason for wanting to try this is that I believe it will suck less life out of a vocal than traditional auto-tuning apps. That's (arrogantly) partly the point. FFT is a wonderful process, but there's an "averaging" involved in it that I'm suspicious of. Don't get me wrong: the vast majority of the time, I think Melodyne sounds fantastic. But when it doesn't, it *really* doesn't.

By the way, I feel I should mention, as a disclaimer, that I don't Melodyne or Autotune all, or even most, of our samples. I like to maintain as much organic character as possible. Most of the time I just tune in the Mapping Editor. Just sayin', in case any competitors or critics are reading this.

One other thing I want to mention is I'm not so sure this method has been tried before. Don't get me wrong, undoubtedly the big players have already thought of and discarded it. But I don't think they discarded it because it doesn't work.

I think they discarded it because all auto-tuning apps need to maintain timing (which my idea does not). It's a critical requirement. After all, who wants an AutoTune app if notes become longer or shorter and fall out of sync with the rest of the music? Maintaining the timing is a top priority, and easy as pie (albeit a very complicated pie) using FFT methods. So a method like mine, which does *not* lend itself to maintaining phrase timing, probably got discarded right off the bat.

Again, I'm not saying my method won't be a disaster, and I really do appreciate everyone's thoughts. (Please do keep them coming!)

But I'm willing to pay to find out. Really all I need is an app that can read a wave file, then run the algorithm I posted yesterday to find the cycles, then add some interpolation method for stretching (or compressing) cycles, then write the result to a new wave file. Heck, to keep things easy for testing purposes, we can even limit audio files to A440 and use a 192k sample rate.

Post

A couple of thoughts.
I understand what you want, but if you're doing everything in the time domain, i.e. not using spectral analysis for anything, then what happens when you change the period length?
If it's longer, how do you fill in the gap? I find that leaving spaces produces a very distinctive sound.
If it's shorter, information will be missing from the sound. The wave of the first period won't join up with the second one if it's moved up a little, producing discontinuities; will this sound natural or not? You might get away with it.
It may be OK for small pitch changes, but you may still lose some naturalness.

Manual period marking? *eeek* errm, OK. Perhaps in the very early stages you can use free software to grab the period pulses; something like Praat (http://www.fon.hum.uva.nl/praat/) will grab approximations for you, so you can test the sound early in your project.

Converting the audio to 44.1 kHz or even 22 kHz will still give you the salient features you need in vocals, and it will be much easier to handle and faster to process if you're doing any detection. The references you find can then be applied at the larger rates.

Post

Some fairly old papers on pitch detection reported experiments where the program tries to determine pitch by many methods (zero crossings, peaks, autocorrelation, and possibly others), and then the authors attempted to write code that would arbitrate between all the different answers and pick the best one.

I don't know whether vocals would be easier or more difficult to micro-tune in the time domain, compared to other instruments. BTW, "time domain" means dealing with the patterns of samples in time sequence (trying to find patterns with the computer, as you do with your eye looking at the wave display), versus the frequency domain, which deals with patterns of frequencies derived from the signal via the Fourier transform or other transforms.

In crude pitch detection experiments I've done over the years, I concluded that different instruments are better handled by different techniques. For instance, algorithm A might work better on guitar, but algorithm B might work better on flute. If vocals are the ONLY source you wish to work on, that would at least somewhat simplify the problem, because by the time you get it as good as it gets on vocals, it will possibly suck for guitar.

A severe problem with some instruments, which may apply to some vocals as well, is that harmonics can be out of tune with each other. If the 1st and 2nd harmonics are about the same level, and they are out of tune with each other, then which one do you fix? Fix the 1st harmonic and possibly make the 2nd even more out of tune, fix the 2nd at the possible detriment of the 1st, or straddle the fence between them?

The dynamically changing nature of the harmonic mix and harmonic tuning across cycles is one (IMO large) reason you can't just autotune at peaks or zero-crossings. As the harmonics slide against each other, the distance between peaks or zero-crossings is constantly changing, even in a relatively static natural sound.

It may be possible to micro-tune, and not having to preserve the duration may make it simpler, but the final algorithm will probably be non-obvious, or possibly a stack of patches on top of patches as each new exception is found and dealt with. :)

What do you intend re vibrato? Is all vibrato to be killed? If not, how do you decide how to micro-tune the pitch without mutilating the vibrato?

Am just trying to help, not trying to be critical of the effort.

Post

quikquak wrote:. . . if you're doing everything in the time domain, i.e. not using spectral analysis for anything, then what happens when you change the period length?
If it's longer, how do you fill in the gap? I find that leaving spaces produces a very distinctive sound.
If it's shorter, information will be missing from the sound.
This would be interpolated. For example, let's suppose we have a wave with these values for a single cycle:
(0, 40, 80, 90, 75, 35, -5, -45, -80, -90, -80, -40)

That's a cycle of 12 samples, but suppose that to have it in tune, we need to stretch it to 15 samples. Rather than simply duplicating 3 of the samples, like this (which would produce the "spaces" you described):
(0, 0, 40, 80, 90, 75, 75, 35, -5, -45, -45, -80, -90, -80, -40)

. . . the algorithm would interpolate, and all the samples would have different values, like this:
(0, 30, 65, 82, 90, 78, 48, 28, -7, -39, -55, -77, -90, -82, -41)
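In code, that stretching step is essentially one call to a linear interpolator, e.g. NumPy's `interp` (the numbers come out slightly different from the hand-worked values above, which were only illustrative):

```python
import numpy as np

def stretch(cycle, new_len):
    # Resample one cycle to new_len samples by linear interpolation,
    # using the cycle's own first sample as the wrap-around point
    # (an approximation: strictly it should be the next cycle's start).
    old_len = len(cycle)
    x = np.arange(new_len) * old_len / new_len   # new positions, in old-sample units
    xp = np.arange(old_len + 1)
    fp = np.append(cycle, cycle[0])
    return np.interp(x, xp, fp)

cycle = [0, 40, 80, 90, 75, 35, -5, -45, -80, -90, -80, -40]
stretched = stretch(cycle, 15)   # 12 samples -> 15 samples
```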

Post

JCJR wrote:In crude pitch detection experiments I've done over the years, I concluded that different instruments are better handled by different techniques. For instance, algorithm A might work better on guitar, but algorithm B might work better on flute. If vocals are the ONLY source you wish to work on, that would at least somewhat simplify the problem, because by the time you get it as good as it gets on vocals, it will possibly suck for guitar.
That makes sense.
JCJR wrote:A severe problem with some instruments, which may apply to some vocals as well, is that harmonics can be out of tune with each other. If the 1st and 2nd harmonics are about the same level, and they are out of tune with each other, then which one do you fix? Fix the 1st harmonic and possibly make the 2nd even more out of tune, fix the 2nd at the possible detriment of the 1st, or straddle the fence between them?

The dynamically changing nature of the harmonic mix and harmonic tuning across cycles is one (IMO large) reason you can't just autotune at peaks or zero-crossings. As the harmonics slide against each other, the distance between peaks or zero-crossings is constantly changing, even in a relatively static natural sound.
Oh! I assumed the harmonics would be in tune with the fundamental. If not, then that certainly introduces new issues, because if a harmonic is out of tune with the fundamental, then I want it to *stay* out of tune with the fundamental (even as the fundamental gets tuned), because I'd want that character of the sound to remain.

I don't know what the situation is with vocals and their harmonics being in tune with the fundamental, so I'll have to look into this.
JCJR wrote:What do you intend re vibrato? Is all vibrato to be killed? If not, how do you decide how to micro-tune the pitch without mutilating the vibrato?
Vibrato isn't handled at all. These samples are all without vibrato, since the process would be a nightmare if it were there.
JCJR wrote:Am just trying to help, not trying to be critical of the effort.
That's absolutely how I interpreted it. I appreciate your thoughts. 8)

Post

I don't have it installed right now so I can't check, but I think CDP (recently made free) may well be able to do exactly this. Its 'waveset' distortions use the span of a wave between two zero-crossings in the same direction as their base unit, i.e. a wavecycle, assuming a sine or other wave that doesn't cross zero more than twice per period. CDP calls these units wavesets to distinguish them from wavecycles, since a waveset may not necessarily represent the fundamental frequency of the waveform being processed.

The waveset distortions can operate on these wavesets in lots of interesting ways, and I'm almost sure that forcing each waveset to a predetermined length is one of the distortions available. Even if not, you can try a few other processes and it'll give you a good idea of the artifacts you're likely to encounter.

Looking at speech in a wave editor, I expect you'll get something along the lines of wavetable-like singing during voiced vowel sounds, interspersed with disproportionately long, rapidly modulating streams of pitched noise where the unvoiced sounds are. Your "S" sounds, for instance, will be disproportionately lengthened by this process, simply because the unvoiced noise that constitutes an S contains much higher frequencies, and thus more wavesets per second, than the more obviously periodic parts of the signal like the vowels. It could well sound very interesting, but it sure as hell won't sound natural.

I had loads of fun with the waveset transformations on vocals though. A favourite was stitching together about a minute of me intoning "aeuiouaiuoua (etc)" as close to A 220 as possible, deleting every waveset below 218 Hz and above 222 Hz, then 'morphing' from each remaining waveset to the next by generating 16 interpolated wavesets between each. Tons of fun.

Post

JCJR wrote:A severe problem with some instruments, which may apply to some vocals as well, is that harmonics can be out of tune with each other. If the 1st and 2nd harmonics are about the same level, and they are out of tune with each other, then which one do you fix? Fix the 1st harmonic and possibly make the 2nd even more out of tune, fix the 2nd at the possible detriment of the 1st, or straddle the fence between them?

The dynamically changing nature of the harmonic mix and harmonic tuning across cycles is one (IMO large) reason you can't just autotune at peaks or zero-crossings. As the harmonics slide against each other, the distance between peaks or zero-crossings is constantly changing, even in a relatively static natural sound.
Oh! I assumed the harmonics would be in tune with the fundamental. If not, then that certainly introduces new issues, because if a harmonic is out of tune with the fundamental, then I want it to *stay* out of tune with the fundamental (even as the fundamental gets tuned), because I'd want that character of the sound to remain.

I don't know what the situation is with vocals and their harmonics being in tune with the fundamental, so I'll have to look into this.
I don't know if you have ever used one, but maybe you have an old Peterson or Conn strobe tuner or equivalent. I haven't done sampling and sample tuning in many years, but I found the strobe tuner very valuable for that purpose, not to mention that I tuned a great many pianos with it over the decades.

The moving strobe wheel is illuminated by a light flashing in step with the audio input. There are many concentric bands on the wheel, each one representing an octave; in other words, harmonics 1, 2, 4, 8, etc.

If a harmonic is flat, the bright-dark pattern of dots moves left; if it is sharp, the pattern moves right; if it's in tune, the pattern stops. Almost always with percussive instruments like bass, guitar, and piano, the low bands move left and the high bands move right. Sometimes the inharmonicity is slight, sometimes drastic. It is often greater at the beginning of a note and less as the note decays.

For instance, when tuning a piano, or sometimes a guitar, it is a mistake to tune the fundamentals. If you set all the fundamentals in tune on a piano, it sounds like crap. The ear seems to judge in-tuneness relative to the midrange, so the low notes are tuned so that their mid-range harmonics agree with the middle notes, and the high notes are tuned so that their fundamentals agree with the (sharp) harmonics of the middle notes. It sounds OK to the ear, but the fundamentals of the low notes get progressively flatter and the fundamentals of the high notes get progressively sharper. High-quality big pianos are less inharmonic than low-quality small pianos, so nice pianos don't need as much stretch, and when tuned "as good as it gets" they sound more sonorous.

On a strobe tuner, the loudest harmonic has the best contrast on the dial. Often the ear likes it best if you tune against the brightest visible band rather than the lowest one (tuning to a harmonic, even if that throws the fundamental off pitch). When the ear tunes a guitar, it tunes to the most-perceived set of harmonics. When a machine measures only the fundamental of a guitar, the fundamentals might be perfectly in tune but the guitar can still sound terrible, because the ear is listening to the combined "tone center" of all the harmonics together; the higher harmonics can often be significantly sharp relative to the fundamental, and different strings have different harmonic tuning spreads.

Anyway, I'll stop blathering. Long ago, when I would tune samples, the samples were just as imperfect as real instruments, and the strobe was really good for tuning to "the best" harmonic, rather than getting the fundamental in tune but having the overall set of samples sound out of tune.

If you look at vocal tones in a wave editor (which you are more expert at than I am), you'll notice that the shape of the waveform changes over time, even if the singer is trying to hold a steady pitch and timbre. The shape is determined by the mix of harmonics and their relative phases. Even if all the harmonics are very close to in tune with each other, the wave shape can still change if the phase relationship between harmonics shifts. And the phase relationship can't change if every harmonic is EXACTLY in tune with the fundamental: for a harmonic to move against the fundamental, it has to be slightly out of tune with it, "creeping ahead" or "falling behind" a little as the wave shape changes.

====

You will need interpolation between the pitch-adjusted blocks as well as interpolation within each block. The ear is very sensitive to slope changes in the wave shape. If the slope isn't completely smooth and continuous at each transition between blocks, the ear will hear bad things, even though the eye might not see anything much wrong in the wave display.
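One cheap way to smooth those transitions is a short linear crossfade across each block boundary (just one option, sketched here; the fade length is an arbitrary choice, not a rule):

```python
import numpy as np

def join_blocks(blocks, fade=8):
    # Concatenate pitch-adjusted blocks, crossfading `fade` samples at each
    # seam so the waveform changes gradually instead of jumping.
    out = np.array(blocks[0], dtype=float)
    ramp = np.linspace(0.0, 1.0, fade)
    for b in blocks[1:]:
        b = np.asarray(b, dtype=float)
        out[-fade:] = out[-fade:] * (1.0 - ramp) + b[:fade] * ramp
        out = np.concatenate([out, b[fade:]])
    return out
```

Note that a plain crossfade removes the amplitude jump but doesn't guarantee slope continuity at the seam, so treat it as a starting point rather than a full answer to the concern above.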

Post

Cron, thank you for mentioning CDP. I didn't know about that one, so I'll give it a try: I'll manually (cycle by cycle) edit a snippet of a file and see if this method holds promise. I do have someone (an experienced guy with music apps under his belt) who has agreed to write this app for me, but before I commit his time and my expense, a quick test will be a good idea.

JCJR, this is really useful information, most of which (embarrassingly) I did not know. I always assumed harmonics were automatically in tune with their fundamentals. Yikes! This explains a lot. Thank you very much for this information.

You’re right about needing interpolation between the blocks as well.
