Implementing pitched sample playback


Post

Thank you for all the explanations, I really appreciate all of you taking the time to answer my questions!

I had hoped that there was a reasonably straightforward technique I could use to at least improve the sound quality somewhat. But right now it seems that the effort will be way too big, especially since I don't even know if the end result will be significant enough to make it worthwhile. So I think that linear interpolation will simply have to do, at least for the first release of the product. A shame, really! :(

Post

If you do decide to pursue it, and assuming that upsample/interpolate/downsample is a sensible strategy compared to "massively oversampled direct interpolation"--

A possible way to deal with the "standard intermediate samplerate" concept--

For instance, suppose the standard intermediate samplerate is chosen as a multiple of 48k, maybe 48k * 8 or whatever you measure as "good enough". For a concrete example let's use a 384k samplerate. That is, if I can avoid senior moments in the following reasoning. Maybe I'll get it bass ackwards. :)

If the sequencer or sound card happens to be running at 44.1k or a multiple thereof, and you want to play back the 384k source sample with no pitch shift, you would interpolate the upsampled stream by incrementing the fractional sample position by 48 / 44.1 = 1.088... per output sample. This would output a stream with fewer samples than the upsampled buffer (352.8k, i.e. 44.1k * 8, rather than 384k), accomplishing the shift from 48 base to 44.1 base. The first interpolation is at fractional location 0.0, the second is at 1.088..., the third is at 2.177..., etc.

OK, now if you wish to output the 384k samplerate stream pitch shifted by semitones, the fractional sample position increment would be [(2 ^ (1 / 12)) ^ NumberOfSemitones]. Or if you want to use cents, calculate it as [(2 ^ (1 / 1200)) ^ NumberOfCents]. A shift of +5 semitones would be a fractional sample increment of 1.335... And a shift of -7 semitones would be a fractional sample increment of 0.667...
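As a quick check of those two formulas, here is a minimal Python sketch. The function names are made up for illustration and are not from any library.

```python
def pitch_ratio_from_semitones(semitones: float) -> float:
    """Fractional sample increment for a shift given in semitones."""
    # (2 ^ (1 / 12)) ^ semitones, written as a single power
    return 2.0 ** (semitones / 12.0)


def pitch_ratio_from_cents(cents: float) -> float:
    """Fractional sample increment for a shift given in cents."""
    # (2 ^ (1 / 1200)) ^ cents, written as a single power
    return 2.0 ** (cents / 1200.0)


if __name__ == "__main__":
    print(pitch_ratio_from_semitones(5))   # roughly 1.335
    print(pitch_ratio_from_semitones(-7))  # roughly 0.667
    print(pitch_ratio_from_cents(500))     # same ratio as +5 semitones
```

Positive shifts give an increment above 1.0 (the read position moves faster through the source), negative shifts give an increment below 1.0.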

So if you want to adjust from 48 base to 44.1 base while also pitch shifting the source sample, merely multiply the factors together. If the 384k stream is destined for a multiple-of-48k target, then a -7 semitone shift would be 0.667... * (48 / 48) = 0.667. But if the stream is destined for a multiple-of-44.1k target, then a -7 semitone shift would be 0.667... * (48 / 44.1) = 0.726...

Therefore, after you calculate the possible 48 to 44.1 adjustment to the fractional sample increment at the "front end", the rest of the process would work exactly the same, just using a slightly different fractional sample increment.
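To make the "multiply the factors together" idea concrete, here is a rough Python sketch. The names read_increment and resample_linear are hypothetical, and plain linear interpolation is used as a stand-in for whatever interpolator you end up with. It assumes the source has already been upsampled to the intermediate rate.

```python
def read_increment(semitones: float,
                   source_base_khz: float = 48.0,
                   target_base_khz: float = 44.1) -> float:
    """Pitch ratio times the samplerate-base ratio (e.g. 48 / 44.1)."""
    pitch = 2.0 ** (semitones / 12.0)
    return pitch * (source_base_khz / target_base_khz)


def resample_linear(src, increment: float, num_out: int):
    """Read src at fractional positions 0, inc, 2*inc, ... (linear interp)."""
    out = []
    pos = 0.0
    for _ in range(num_out):
        i = int(pos)                 # integer part of the read position
        if i + 1 >= len(src):        # stop before reading past the end
            break
        frac = pos - i               # fractional part, 0.0 <= frac < 1.0
        out.append(src[i] + frac * (src[i + 1] - src[i]))
        pos += increment
    return out


if __name__ == "__main__":
    print(read_increment(-7, 48.0, 48.0))  # roughly 0.667
    print(read_increment(-7, 48.0, 44.1))  # roughly 0.726
    ramp = [float(n) for n in range(16)]
    print(resample_linear(ramp, read_increment(0), 4))
```

The read loop never needs to know whether the increment came from a pitch shift, a samplerate-base change, or both.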

Post

DNAdisaster wrote:
buddard wrote: Also, another thing I wanted to ask about is this bit from Olli's paper:
A discrete oversampling filter can increase the sampling frequency to an integer N multiple, i.e. oversample by N. Typically, the filter is a FIR filter, because using a FIR one can do "random access" on the data with no extra computational cost – a useful property if N is high, because in such cases typically only a fraction of the samples in the oversampled signal are used.
How does this work? Does it mean that I can actually skip the 0's when I process the oversampled buffer?
I am also curious what is meant by this.
I guess it's referring to polyphase filtering, which does, in essence, avoid multiplying by the zeros you have stuffed in when upsampling, or avoid generating samples you will later discard when downsampling. Or both :)
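For what it's worth, here is a small pure-Python sketch of that idea, with made-up function names and no claim that this is exactly what the paper has in mind. upsample_zero_stuff does it the slow way (stuff zeros, run the full FIR), while polyphase_sample computes any single oversampled output sample directly, touching only the filter taps that line up with real input samples.

```python
def upsample_zero_stuff(x, h, n):
    """Reference: insert n-1 zeros after each sample, then FIR filter with h."""
    stuffed = []
    for s in x:
        stuffed.append(s)
        stuffed.extend([0.0] * (n - 1))
    out = []
    for m in range(len(stuffed)):
        acc = 0.0
        for k, c in enumerate(h):
            if m - k >= 0:
                acc += c * stuffed[m - k]   # most of these products are zero
        out.append(acc)
    return out


def polyphase_sample(x, h, n, m):
    """Oversampled output sample m, computed directly ("random access")."""
    phase = m % n        # which sub-filter (every n-th tap of h) applies
    base = m // n        # newest input sample that contributes
    acc = 0.0
    for j, c in enumerate(h[phase::n]):
        idx = base - j
        if 0 <= idx < len(x):
            acc += c * x[idx]               # only non-zero products are computed
    return acc


if __name__ == "__main__":
    x = [1.0, 2.0, -1.0, 0.5]
    h = [0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25]  # toy kernel, not a good filter
    full = upsample_zero_stuff(x, h, 4)
    print(full[5], polyphase_sample(x, h, 4, 5))  # the two should agree
```

Each call to polyphase_sample costs about len(h) / n multiplies instead of len(h), and you only pay for the output samples you actually ask for.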

Post

I found and downloaded the Secret Rabbit Code, and I spent a day adapting my code to use it instead of my own interpolation. The sinc modes added a great deal of CPU usage (although I might have done something wrong with it, I'm not sure), and with no significant gains in sound quality, at least that I could perceive. Like BertKoor said earlier in the thread, it's probably so far off that no interpolation will sound good.

So in the end I decided to stick with linear interpolation. :)

Post

It may not be easy or even possible to sound "pristine" with big speed-up or slow-down. Many sources sound so weird when severely shifted, that it would sound bad even if you could shift it perfectly.

But some codes do it pretty well. So it seems "possible" anyway. There are a few "really good" time/pitch libraries which can be licensed at a price. I haven't kept up with it since maybe 2012.

One of them is Elastique, the one I was most recently familiar with, having programmed with it for several years.

Elastique doesn't perform miracles. If you pitch shift down an octave with Elastique then the sound is compromised, but the result can be "better than one might expect". Elastique is licensed by many companies. It is one of the time/pitch methods in Reaper and can be tried free via a Reaper demo.

Last time I looked, Elastique is somewhat pricey to license. Maybe what they do is simple yet cunning, though I'd be more inclined to wager both complex and cunning.

Just sayin, if you run some test transpositions thru elastique, dirac or whatever other "really good" time/pitch libraries-- The results could be a good basis for comparison.

If something like elastique sounds "X percent of perfect" then it is not likely that you will write code exceeding that "X percent of perfect" except possibly after years of hard work.

On the other hand, it could also demonstrate that "up to X percent of perfect" is at least possible! :)

Post

Have you looked into building a mipmap chain?
You load the sample and precompute successive lower-resolution versions of it, each scaled down in size by half, theoretically until you arrive at a single sample. Then at playback time you select the appropriate level of detail based on the playback rate. That way, the filtering and reconstruction only need to cope with a range of one octave.
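A bare-bones Python sketch of that, with hypothetical helper names. The [1/4, 1/2, 1/4] kernel is only a placeholder to keep the example short; a real implementation would want a much steeper half-band lowpass before dropping every second sample.

```python
import math


def halve(samples):
    """Crude lowpass ([1/4, 1/2, 1/4]) followed by keeping every 2nd sample."""
    out = []
    n = len(samples)
    for i in range(0, n, 2):
        left = samples[i - 1] if i > 0 else samples[i]
        right = samples[i + 1] if i + 1 < n else samples[i]
        out.append(0.25 * left + 0.5 * samples[i] + 0.25 * right)
    return out


def build_mipmaps(samples):
    """Level 0 is the original; each further level is half the length."""
    levels = [list(samples)]
    while len(levels[-1]) > 1:
        levels.append(halve(levels[-1]))
    return levels


def select_level(rate, num_levels):
    """Pick a level for a playback rate; return (level, residual rate)."""
    if rate <= 1.0:
        return 0, rate               # pitching down: read the original
    level = min(int(math.floor(math.log2(rate))), num_levels - 1)
    return level, rate / (2 ** level)


if __name__ == "__main__":
    levels = build_mipmaps([float(n % 4) for n in range(8)])
    print([len(lv) for lv in levels])       # lengths 8, 4, 2, 1
    print(select_level(3.0, len(levels)))   # level 1, residual rate 1.5
```

The residual rate is what the interpolator still has to handle at the chosen level, and it stays between 1.0 and 2.0 as long as the chain is deep enough.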
