About sampler output buffer oversampling

DSP, Plugin and Host development discussion.

Post

What have you guys found to be a good oversampling ratio for sampler's output buffer to take care of aliasing when playing higher notes?

Is 4x oversampling sufficient for serious work, or is it best practice to let the user select the oversampling ratio for their own purposes?

Post

Why do you need to oversample the output buffer?

When you take a sample and play it back much faster than the original, what you are effectively doing is increasing the sampling rate, then resampling the resulting signal back to whatever fixed sampling rate you are trying to output. At least assuming a slowly varying playback speed, this is a pure resampling problem and oversampling doesn't necessarily have to be part of the solution at all.

Post

If the higher frequencies in the played-back sample go above Nyquist, they alias back down. Hence the output buffer into which the samples are played is oversampled. Once all the voices of the sampler have been rendered into it, the output buffer is downsampled accordingly.

Reading between the lines, I assume you're suggesting that I use sinc interpolation for the sample interpolation and a suitable sinc function to lowpass filter the sample at resampling time? I.e. depending on what pitch needs to be played, change how much the sinc function lowpass filters the sample at the resampling stage?

Post

Kraku wrote: Fri Dec 09, 2022 2:55 pm Reading between the lines, I assume you're suggesting that I use sinc interpolation for the sample interpolation and a suitable sinc function to lowpass filter the sample at resampling time?
There is nothing that you should read between the lines, because I'm not suggesting a particular method, I'm just observing that the aliasing here happens during a resampling (or equivalently interpolation) process that you might not be thinking about as resampling, because you've already chosen the box to think inside of.

Post

I was first thinking of using sinc interpolation with a varying kernel length, but then decided against it. This is because I now have an oversampled output buffer plus a 6-sample, 5th-order interpolation for the sample itself. The amount of memory scanned and math required per output sample should be lower this way.
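Roughly the kind of per-voice inner loop I mean (just a sketch, using a 6-point Lagrange read as one possible 5th-order polynomial; the names and the edge handling are placeholders):

#include <cstddef>

// Sketch only: 6-point, 5th-order Lagrange read.
// s points at the sample two positions before the read point;
// frac in [0,1) is the fractional offset between s[2] and s[3].
inline float lagrange6(const float* s, double frac)
{
    const double t = frac + 2.0;          // position on the 0..5 index grid
    float out = 0.0f;
    for (int i = 0; i < 6; ++i)
    {
        double w = 1.0;
        for (int j = 0; j < 6; ++j)
            if (j != i)
                w *= (t - j) / double(i - j);
        out += float(w) * s[i];
    }
    return out;
}

// Render one voice into the oversampled output buffer.
// step = playbackRatio / osFactor, i.e. source samples advanced per oversampled output sample.
void renderVoice(const float* src, std::size_t srcLen,
                 float* osBuf, std::size_t osLen,
                 double& phase, double step)
{
    for (std::size_t n = 0; n < osLen; ++n)
    {
        const std::size_t idx = static_cast<std::size_t>(phase);
        if (idx < 2 || idx + 3 >= srcLen)
            break;                                          // edge handling omitted
        osBuf[n] += lagrange6(src + idx - 2, phase - idx);  // accumulate this voice
        phase += step;
    }
}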

That being said, I wonder how E-mu e6400 Ultra does its interpolation/resampling. I haven't done direct side by side comparisons, but to me it sounds like the results have much more high frequency content when pitched down an octave or so. Just like the sampler generated some musical sounding high frequencies out of thin air... I guess I really do need to test that machine properly and see if there is some DSP magic going on which I would want to copy.

Post

Here's a link to a fairly scholarly tutorial on the subject:

https://ccrma.stanford.edu/~jos/resample/resample.html

Post

Yep, that looks pretty much like what I had in mind with the sinc interpolation having the varying kernel length. I.e. you can adjust the lowpass cutoff dynamically.

The only obvious difference seems to be that the sinc functions in your URL are viewed the other way around: not one sinc per each input sample (within the window), but one sinc centred at the output sampling point. Which of course is the same thing mathematically. This in turn settles one of the things I hadn't put any time into thinking through yet: is there some scaling I would need to perform as more input samples fall inside the window? With this "other way around" view it seems obvious that no extra scaling factors are needed.
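To double-check myself, here is a rough sketch of what I mean by one tap of a variable-cutoff windowed sinc (Hann window picked arbitrarily); the cutoff-scaled sinc already carries the right amplitude, which is why no extra normalization shows up:

#include <cmath>

// Sketch: one tap of a variable-cutoff windowed sinc.
// x is the distance (in input samples) from the output sampling point,
// cutoff is the lowpass cutoff as a fraction of the input Nyquist (<= 1),
// halfWidth is the kernel half-length in input samples.
inline double sincTap(double x, double cutoff, double halfWidth)
{
    if (std::fabs(x) >= halfWidth)
        return 0.0;
    const double pi = 3.14159265358979323846;
    // The cutoff both squeezes the sinc and scales its amplitude, so the
    // kernel's DC gain stays ~1 no matter how many input samples fall
    // inside the window.
    const double s = (x == 0.0) ? cutoff : std::sin(pi * cutoff * x) / (pi * x);
    const double w = 0.5 * (1.0 + std::cos(pi * x / halfWidth));   // Hann window
    return s * w;
}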

Useful document. Thank you!

But that still leaves me with my original point:

If I used the resampling method described in the link, that is a fair amount of math per resampled sample, and it will make the higher notes eat up much more CPU. And eventually you can't make the window of input samples too large, at which point aliasing results. I don't know how quickly that limit makes itself heard in practice, though.

With polynomial interpolation the required number of input samples and amount of math is quite low, especially when done with SIMD instructions. And the results are practically perfect until the highest frequencies alias back down from Nyquist. So oversampling is a must for this method, which is why I mentioned using it.

Post

Kraku wrote: Sat Dec 10, 2022 12:03 am Yep, that looks pretty much like what I had in mind with the sinc interpolation having the varying kernel length. I.e. you can adjust the lowpass cutoff dynamically.
Can't you just transpose the whole process when pitching up? For every input sample, insert a sinc into the output buffer so that now your cutoff is relative to the output rate. This way you don't need any variable cutoffs or variable filter lengths and even though this kind of scatter is slightly slower than gathering, the difference is actually less than you might think.

Higher CPU usage with higher notes is kinda inevitable though (at least without pre-filtering), but then again even the oversampling approach really only "solves" this issue by making all the notes equally expensive to the highest ones, right?
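Something along these lines (untested sketch with a fixed kernel length and no phase tables; "ratio" here is output samples per input sample, so < 1 when pitching up, and the cutoff sits at the output Nyquist, which is the case where this direction pays off):

#include <cmath>
#include <cstddef>

// Sketch of the "scatter" direction: for each input sample, add a windowed
// sinc (cutoff at the OUTPUT Nyquist) into the output buffer at the
// fractional output position where that input sample lands.
void scatterResample(const float* in, std::size_t inLen,
                     float* out, std::size_t outLen,
                     double ratio)   // output samples per input sample
{
    const double pi = 3.14159265358979323846;
    const int half = 16;             // kernel half-length in output samples, arbitrary
    for (std::size_t i = 0; i < inLen; ++i)
    {
        const double pos = i * ratio;                       // landing position in the output
        const long center = static_cast<long>(std::floor(pos));
        for (long n = center - half; n <= center + half; ++n)
        {
            if (n < 0 || n >= static_cast<long>(outLen))
                continue;
            const double x = n - pos;                       // distance in output samples
            const double s = (x == 0.0) ? 1.0 : std::sin(pi * x) / (pi * x);
            const double w = 0.5 * (1.0 + std::cos(pi * x / (half + 1)));  // Hann window
            // The factor 'ratio' keeps the DC gain ~1, since roughly 1/ratio
            // input samples contribute to each output sample.
            out[n] += static_cast<float>(ratio * s * w * in[i]);
        }
    }
}

(When pitching down you'd want the cutoff at the mapped input Nyquist instead, but that's the gather case anyway.)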

Post

mystran wrote: Sat Dec 10, 2022 2:24 am
Kraku wrote: Sat Dec 10, 2022 12:03 am Yep, that looks pretty much like what I had in mind with the sinc interpolation having the varying kernel length. I.e. you can adjust the lowpass cutoff dynamically.
Can't you just transpose the whole process when pitching up? For every input sample, insert a sinc into the output buffer so that now your cutoff is relative to the output rate. This way you don't need any variable cutoffs or variable filter lengths and even though this kind of scatter is slightly slower than gathering, the difference is actually less than you might think.

Higher CPU usage with higher notes is kinda inevitable though (at least without pre-filtering), but then again even the oversampling approach really only "solves" this issue by making all the notes equally expensive to the highest ones, right?
I probably could. I think I'll have to experiment with all these ideas and see in practice what I think about them.

Post

Kraku wrote: Fri Dec 09, 2022 11:33 am What have you guys found to be a good oversampling ratio for sampler's output buffer to take care of aliasing when playing higher notes?
Not disagreeing with anyone regarding the need to oversample at all.

But FYI, 2x oversampling gives you 1 octave of 'headroom', 4x gives you 2 octaves, and 8x gives you 3 octaves.

So the answer depends on how high you intend to pitch-shift the sample.

Post

Jeff McClintock wrote: Mon Dec 12, 2022 8:13 pm But FYI, 2x oversampling gives you 1 octave of 'headroom', 4x gives you 2 octaves, and 8x gives you 3 octaves.
It's actually even better: you are allowed to produce aliasing at the oversampled rate as long as that aliasing does not enter your final non-oversampled Nyquist interval. So at 2x oversampling you are tripling your safe frequency range. If your final target sample rate is 48 kHz but you are running with 2x oversampling (i.e. at 96 kHz), you can produce frequencies up to 72 kHz. Yes, sure, the band between 48 and 72 kHz *will* alias at 96 kHz - but not into the frequency range below 24 kHz, rather into the 24-48 kHz range, which you will throw away (lowpass) later anyway, so you are allowed to fill the 24-48 kHz range with aliasing garbage. Similar considerations apply to higher oversampling factors as well. If I'm not mistaken, the general formula for your allowed frequency range is (2*M-1) * sampleRate/2 rather than just M * sampleRate/2 for an oversampling factor of M.
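Written out as a formula (with f_s the final target rate and M the oversampling factor):

f_{\max}(M) \;=\; (2M - 1)\,\frac{f_s}{2},
\qquad
f_{\max}(2) \;=\; 3 \cdot \frac{48\ \text{kHz}}{2} \;=\; 72\ \text{kHz}.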

Post

I recently asked a question related to this subject here, and it occurs to me there is a distinction needed in this thread as well. The original post was regarding oversampling. But the discussion has also involved resampling. If we make a distinction (which I'm far from sure is proper, though very useful) between the two, there's an important difference in technique.

If oversampling involves increasing the sample rate by some integer factor, as would be used in a distortion process to avoid aliasing, then there's no interpolation involved (as I learned in my earlier thread here). The new samples are simply inserted as zeros, and the filtering takes care of supplying the interpolated values.

Resampling, as might be needed in playing back a sample at a new pitch, is a whole different thing that does require interpolation. I'm not trying to pose as an expert on this, but I learned much in a recent discussion that revealed how resampling and oversampling techniques are actually quite different. So maybe pointing this out will clarify things a bit.

Post

dmbaer wrote: Tue Dec 13, 2022 10:22 pm Resampling, as might be needed in playing back a sample at a new pitch, is a whole different thing that does require interpolation. I'm not trying to pose as an expert on this, but I learned much in a recent discussion that revealed how resampling and oversampling techniques are actually quite different. So maybe pointing this out will clarify things a bit.
Interpolation and resampling are really just two ways to look at the same thing.

In case of upsampling by an integer factor, zero-stuffing followed by brickwall filtering gives exactly the same result as direct sinc-interpolation. Even in the integer case, a polyphase FIR (which readily generalizes to arbitrary sinc-interpolation) is more efficient than explicitly zero-stuffing first. There is one case where zero-stuffing explicitly makes sense and that's when you want to use an IIR filter that you can't decompose into a polyphase version as the brickwall lowpass.. but with FIRs it just never makes any practical sense except as a learning tool.

Even if we think of something like linear interpolation, this can be understood as reconstructing a continuous-time signal where each of the original samples becomes a scaled Dirac delta, followed by filtering with a triangular kernel, followed by sampling at the desired time-offsets. So mathematically linear interpolation is really a case of resampling with a triangular filter, even though this is not how we would typically implement it. If you are going to use a brickwall kernel for higher quality, then linear interpolation followed by brickwall filtering would be the same as resampling with a brickwall convolved with a triangle, which is much worse than the brickwall on its own; hence the conceptual zero-stuffing. Other methods of interpolation like Catmull-Rom or B-splines can also be described as filter kernels.
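Spelled out (t in input samples, f_s the input rate, H_bw the brickwall response): the triangular kernel of linear interpolation and the composite response are

h_{\text{tri}}(t) \;=\; \max(0,\, 1 - |t|),
\qquad
H_{\text{tri}}(f) \;=\; \operatorname{sinc}^2\!\left(\frac{f}{f_s}\right),
\qquad
H_{\text{lin+bw}}(f) \;=\; H_{\text{tri}}(f)\, H_{\text{bw}}(f),

i.e. the brickwall response picks up an extra sinc^2 factor compared to using the brickwall alone.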

So there really is nothing fundamentally different about any interpolation, resampling or oversampling. There are different methods to implement these things that might be more efficient depending on what is required, but mathematically it's always the same thing and the only thing that varies is the filter kernel used.

Post

mystran wrote: Tue Dec 13, 2022 11:42 pm ... There is one case where zero-stuffing explicitly makes sense and that's when you want to use an IIR filter that you can't decompose into a polyphase version as the brickwall lowpass.. but with FIRs it just never makes any practical sense except as a learning tool.
Well now I'm quite confused. The earlier thread I've been referring to is here:

viewtopic.php?p=8441127&hilit=dmbaer#p8441127

I thought that thread concluded that for oversampling (an integer multiplier on the sampling rate), zero-stuffing was the agreed-upon approach. Indeed, earlevel's blog reference (https://www.earlevel.com/main/2007/07/0 ... onversion/) in that thread states:

That’s it—to double the sample rate, we insert a zero between each sample, and low-pass filter to clear the extended part of the audio band. Any low-pass filter will do, as long as you pick one steep enough to get the job done, removing the aliased copy without removing much of the existing signal band. Most often, a linear phase FIR filter is used—performance is good at the relatively high cut-off frequency, phase is maintained, and we have good control over its characteristics.

So, earlevel says zero-stuff with FIR and mystran says that makes no sense. What are we mere students seeking enlightenment to think? :?

Post

dmbaer wrote: Wed Dec 14, 2022 9:09 pm So, earlevel says zero-stuff with FIR and mystran says that makes no sense. What are we mere students seeking enlightenment to think? :?
It makes sense mathematically, but it doesn't usually make sense in terms of efficiency.

To understand what is going on, think about the FIR filtering process after zero-stuffing to double the sampling rate. To compute one FIR output sample, you multiply each FIR tap with one of the input samples and add the results together. Since we just zero-stuffed the signal, every other input sample is known to be zero, so the multiplication by the FIR tap is also zero and we could just as well skip these computations, because we know which samples are zero and which are non-zero.

This leads to a concept known as polyphase filtering. Rather than zero-stuffing, in the 2x case we split the FIR into two "branches", where one takes all the odd-numbered taps and one takes all the even-numbered taps. Then we can filter the input with each of these shorter FIRs and interleave the results. This gives us the same result as if we'd zero-stuffed and filtered with the original FIR, but we compute half the multiply-adds.
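In code the 2x case looks roughly like this (untested sketch, naive boundary handling; the result matches zero-stuffing followed by filtering with 'fir'):

#include <cstddef>
#include <vector>

// Sketch: 2x upsampling with a polyphase split of a FIR designed at the 2x rate.
std::vector<float> upsample2x(const std::vector<float>& in,
                              const std::vector<float>& fir)
{
    // Split the taps into two branches: even-indexed and odd-indexed.
    std::vector<float> even, odd;
    for (std::size_t k = 0; k < fir.size(); ++k)
        (k % 2 == 0 ? even : odd).push_back(fir[k]);

    std::vector<float> out(in.size() * 2, 0.0f);
    for (std::size_t n = 0; n < in.size(); ++n)
    {
        float a = 0.0f, b = 0.0f;
        for (std::size_t k = 0; k < even.size(); ++k)
            if (n >= k) a += even[k] * in[n - k];
        for (std::size_t k = 0; k < odd.size(); ++k)
            if (n >= k) b += odd[k] * in[n - k];
        out[2 * n]     = a;   // even output phase
        out[2 * n + 1] = b;   // odd output phase, interleaved
    }
    return out;
}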

This polyphase concept can be applied to any oversampling factor, not just the 2x case. For 3x you'd split into 3 branches, for 4x into 4 branches, and so on. But we can go further: for an infinite factor you'd split into infinitely many branches, and now we essentially have a continuous signal. We can't really design a FIR for infinite oversampling, but we can design a FIR for some "high" oversampling factor (say 256x) and then apply linear interpolation to those branches to pretend we have infinitely many.

Obviously we can't actually compute infinite oversampling either, but suppose we want some "weird" ratio, like 3.14159? What we can do is pretend that we had an infinite number of samples, figure out which finite set of them we would keep if we were to downsample back to 3.14159 times the original rate, and then compute only those polyphase branches. Conceptually we are still (at least sort of) zero-stuffing, filtering and then decimating, but in terms of implementation what we really have is sinc-interpolation.
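As a sketch of what the implementation then looks like (a windowed sinc tabulated at, say, 256 phases per input sample, with linear interpolation between neighbouring phases; table construction and edge handling omitted):

#include <cstddef>

constexpr int PHASES = 256;   // tabulated sub-positions per input sample
constexpr int TAPS   = 16;    // taps per phase

// Sketch: read the input at an arbitrary fractional position 'pos' using a
// tabulated kernel; 'table' has PHASES+1 rows so we can always blend rows p0 and p0+1.
// 'pos' is assumed to stay at least TAPS/2 samples away from either end of src.
float readInterpolated(const float* src,
                       const float table[PHASES + 1][TAPS],
                       double pos)
{
    const std::size_t    idx  = static_cast<std::size_t>(pos);
    const std::ptrdiff_t base = static_cast<std::ptrdiff_t>(idx) - TAPS / 2 + 1;
    const double frac  = pos - idx;                 // [0,1) between src[idx] and src[idx+1]
    const double p     = frac * PHASES;             // which pair of phases to blend
    const int    p0    = static_cast<int>(p);
    const float  blend = static_cast<float>(p - p0);

    float out = 0.0f;
    for (int k = 0; k < TAPS; ++k)
    {
        // Linear interpolation between the two neighbouring polyphase branches.
        const float tap = table[p0][k] + blend * (table[p0 + 1][k] - table[p0][k]);
        out += tap * src[base + k];
    }
    return out;
}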

So zero-stuffing and filtering is not wrong, but it can be implemented more efficiently (polyphase), and that more efficient implementation further generalizes to arbitrary sinc-interpolation, which in the special case of interpolating to a sampling rate that is an integer multiple of the original gives the same results as zero-stuffing and filtering. It's really the same thing, we're just looking at it from a different viewpoint.

Does that make sense now?
