About sampler output buffer oversampling

Kraku
 KVRAF
 Topic Starter
 1516 posts since 13 Oct, 2003 from Oulu, Finland
What have you guys found to be a good oversampling ratio for a sampler's output buffer to take care of aliasing when playing higher notes?
Is 4x oversampling sufficient for serious purposes, or is the best practice to let the user select the oversampling ratio for their own purposes?

mystran
 KVRAF
 7331 posts since 12 Feb, 2006 from Helsinki, Finland
Why do you need to oversample the output buffer?
When you take a sample and play it back much faster than the original, what you are effectively doing is increasing the sampling rate, then resampling the resulting signal back to whatever fixed sampling rate you are trying to output. At least assuming a slowly varying playback speed, this is a pure resampling problem and oversampling doesn't necessarily have to be part of the solution at all.
Seeking asylum in any country willing to acknowledge my right to exist.

Kraku
 KVRAF
 Topic Starter
 1516 posts since 13 Oct, 2003 from Oulu, Finland
If the higher frequencies in the played-back sample go above Nyquist, they bounce back down. Hence the output buffer where the samples are played back would be oversampled. Once all the voices of the sampler have been played there, the output buffer will be downsampled accordingly.
Reading between the lines, I assume you're suggesting that I use sinc interpolation for the sample interpolation and use a suitable sinc function to lowpass filter the sample at resampling time? I.e. depending on what pitch needs to be played, change how much the sinc function lowpass filters the sample at the resampling stage?

mystran
 KVRAF
 7331 posts since 12 Feb, 2006 from Helsinki, Finland
There is nothing that you should read between the lines, because I'm not suggesting a particular method, I'm just observing that the aliasing here happens during a resampling (or equivalently interpolation) process that you might not be thinking about as resampling, because you've already chosen the box to think inside of.
Seeking asylum in any country willing to acknowledge my right to exist.

Kraku
 KVRAF
 Topic Starter
 1516 posts since 13 Oct, 2003 from Oulu, Finland
I was first thinking of using sinc interpolation with a varying kernel length, but then decided against it. This is because now I have an oversampled output buffer + 6-sample 5th-order interpolation for the sample itself. The amount of memory scanned and math required per output sample should be less this way.
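For reference, one common form of a 6-sample, 5th-order interpolator is the plain Lagrange polynomial through the six samples around the read point. This is just an illustrative sketch (the function name is made up, and it's not necessarily the exact coefficient scheme I use):

```python
def lagrange6(y, t):
    """Evaluate the 5th-order Lagrange polynomial through the six
    samples y[0..5] (taken at integer positions 0..5) at position t.
    For sample playback, t is usually kept between the two centre
    samples, i.e. in [2, 3]."""
    out = 0.0
    for i in range(6):
        num = 1.0
        den = 1.0
        for j in range(6):
            if j != i:
                num *= t - j
                den *= i - j
        out += y[i] * num / den
    return out
```

Because it's a degree-5 polynomial fit, it reproduces any signal that is locally a polynomial of degree 5 or less exactly, and it passes through the original samples at integer positions.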
That being said, I wonder how the E-mu e6400 Ultra does its interpolation/resampling. I haven't done direct side-by-side comparisons, but to me it sounds like the results have much more high-frequency content when pitched down an octave or so. Just like the sampler generated some musical-sounding high frequencies out of thin air... I guess I really do need to test that machine properly and see if there is some DSP magic going on that I would want to copy.

dmbaer
 KVRAF
 1592 posts since 11 Nov, 2009 from Northern CA
Here's a link to a fairly scholarly tutorial on the subject:
https://ccrma.stanford.edu/~jos/resample/resample.html

Kraku
 KVRAF
 Topic Starter
 1516 posts since 13 Oct, 2003 from Oulu, Finland
Yep, that looks pretty much like what I had in mind with the sinc interpolation having the varying kernel length. I.e. you can adjust the lowpass cutoff dynamically.
The only obvious difference seems to be that the sinc functions in your URL are thought of the other way around: not one sinc per input sample (within the window) but one sinc at the signal sampling point. Which of course is the same thing mathematically. This in turn solves one of the things I hadn't put any time into thinking through yet: is there some scaling I would need to perform as more input samples fall inside the window? With this "other way around" thinking it seems obvious that no other scaling factors are needed.
Useful document. Thank you!
But that still leaves me with my original point:
If I used the resampling method described in the link, that is a fair amount of math per resampled sample, and it will make the higher notes eat up much more CPU. And eventually you can't make the window of input samples arbitrarily large, which will result in aliasing at that point. I do not know how quickly that limit makes itself heard in practice, though.
With polynomial interpolation the required amount of input samples and math is quite low, especially when done with SIMD instructions. And the results are practically perfect until the highest frequencies bounce back down from Nyquist. So oversampling is a must for this method, which is why I mentioned using it with this method.

mystran
 KVRAF
 7331 posts since 12 Feb, 2006 from Helsinki, Finland
Can't you just transpose the whole process when pitching up? For every input sample, insert a sinc into the output buffer so that now your cutoff is relative to the output rate. This way you don't need any variable cutoffs or variable filter lengths and even though this kind of scatter is slightly slower than gathering, the difference is actually less than you might think.
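As a rough illustration of the scatter idea (the function names, the Hann window and the tap count here are arbitrary choices for the sketch, not a tuned design):

```python
import math

def hann_sinc(x, half_width):
    """Hann-windowed sinc with cutoff at the OUTPUT Nyquist
    (x is measured in output samples)."""
    if abs(x) >= half_width:
        return 0.0
    w = 0.5 + 0.5 * math.cos(math.pi * x / half_width)  # Hann window
    if x == 0.0:
        return w
    return w * math.sin(math.pi * x) / (math.pi * x)

def scatter_resample(inp, ratio, half_width=16):
    """Pitch up by `ratio` (> 1): every input sample scatters a
    windowed sinc into the output buffer at position n / ratio.
    The 1/ratio gain keeps unity DC level, because the input
    impulses land `ratio` times denser than the output samples."""
    out_len = int(len(inp) / ratio)
    out = [0.0] * out_len
    for n, s in enumerate(inp):
        pos = n / ratio
        k0 = max(0, int(math.ceil(pos - half_width)))
        k1 = min(out_len - 1, int(math.floor(pos + half_width)))
        for k in range(k0, k1 + 1):
            out[k] += s * hann_sinc(k - pos, half_width) / ratio
    return out
```

Note how the sinc argument is in output samples, so the cutoff stays fixed at the output rate regardless of the playback ratio, which is exactly the point of transposing the process.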
Higher CPU usage with higher notes is kinda inevitable though (at least without prefiltering), but then again even the oversampling approach really only "solves" this issue by making all the notes equally expensive to the highest ones, right?
Seeking asylum in any country willing to acknowledge my right to exist.

Kraku
 KVRAF
 Topic Starter
 1516 posts since 13 Oct, 2003 from Oulu, Finland
mystran wrote: ↑Fri Dec 09, 2022 6:24 pm
Can't you just transpose the whole process when pitching up? For every input sample, insert a sinc into the output buffer so that now your cutoff is relative to the output rate. This way you don't need any variable cutoffs or variable filter lengths and even though this kind of scatter is slightly slower than gathering, the difference is actually less than you might think.
Higher CPU usage with higher notes is kinda inevitable though (at least without prefiltering), but then again even the oversampling approach really only "solves" this issue by making all the notes equally expensive to the highest ones, right?
I probably could. I think I'll have to experiment with all these ideas and see in practice what I think about them.

Jeff McClintock
 KVRist
 393 posts since 30 Jan, 2005 from New Zealand
Not disagreeing with anyone regarding the need to oversample at all.
but FYI, 2x oversampling gives you one octave of 'headroom', 4x gives you two octaves, and 8x gives you three octaves.
So the answer depends on how high you intend to pitchshift the sample.

Music Engineer
 KVRAF
 4180 posts since 8 Mar, 2004 from Berlin, Germany
Jeff McClintock wrote: ↑Mon Dec 12, 2022 12:13 pm
but FYI, 2x oversampling gives you one octave of 'headroom', 4x gives you two octaves, and 8x gives you three octaves.
It's actually even better: you are allowed to produce aliasing at the oversampled rate as long as that aliasing does not enter your final non-oversampled Nyquist interval. So, at 2x oversampling, you are tripling your safe frequency range. If your final target sample rate is 48 kHz but you are running with 2x oversampling (i.e. at 96 kHz), you can produce frequencies up to 72 kHz. Yes, sure, the band between 48 and 72 kHz *will* produce aliasing at 96 kHz, but not into the frequency range below 24 kHz; rather into the 24–48 kHz range, which you will throw away (lowpass) later anyway, so you are allowed to fill the 24–48 kHz range with aliasing garbage. And similar considerations apply to higher oversampling factors as well. ...If I'm not mistaken, I think the general formula for your allowed frequency range is (2*M−1) * sampleRate/2 rather than just M * sampleRate/2 for an oversampling factor of M.
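As a quick numeric sanity check of that folding claim (values chosen to match the 48 kHz example; variable names are mine): sampled at 96 kHz, a 60 kHz tone is indistinguishable from a phase-inverted 36 kHz tone, i.e. it folds into the 24–48 kHz band that the decimation filter removes, never below 24 kHz.

```python
import math

fs = 96_000          # 2x oversampled rate for a 48 kHz target
f_hi = 60_000        # above fs/2 = 48 kHz, below the (2M-1) limit of 72 kHz
f_alias = fs - f_hi  # 36 kHz: the frequency the tone folds down to

# The two sample sequences are numerically identical:
# sin(2*pi*(fs - f)*k/fs) == -sin(2*pi*f*k/fs) for integer k.
hi = [math.sin(2 * math.pi * f_hi * k / fs) for k in range(64)]
fold = [-math.sin(2 * math.pi * f_alias * k / fs) for k in range(64)]
```

Since 36 kHz is above the final 24 kHz Nyquist, the garbage lives entirely in the band that gets thrown away at the downsampling stage.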

dmbaer
 KVRAF
 1592 posts since 11 Nov, 2009 from Northern CA
I recently asked a question related to this subject here, and it occurs to me that a distinction is needed in this thread as well. The original post was about oversampling, but the discussion has also involved resampling. If we make a distinction between the two (which I'm far from sure is proper, though it is very useful), there's an important difference in technique.
If oversampling involves increasing the sample rate by some integer factor, as would be used in a distortion process to avoid aliasing, then there's no interpolation involved (as I learned in my earlier thread here). The new samples are just zeroed out and the filtering takes care of supplying the interpolated values.
Resampling, as might be needed in playing back a sample at a new pitch, is a whole different thing that does require interpolation. I'm not trying to pose as an expert on this, but I learned much in a recent discussion that revealed how resampling and oversampling techniques are actually quite different. So maybe pointing out this will clarify things a bit.

mystran
 KVRAF
 7331 posts since 12 Feb, 2006 from Helsinki, Finland
dmbaer wrote: ↑Tue Dec 13, 2022 2:22 pm
Resampling, as might be needed in playing back a sample at a new pitch, is a whole different thing that does require interpolation. I'm not trying to pose as an expert on this, but I learned much in a recent discussion that revealed how resampling and oversampling techniques are actually quite different. So maybe pointing out this will clarify things a bit.
Interpolation and resampling are really just two ways to look at the same thing.
In the case of upsampling by an integer factor, zero-stuffing followed by brickwall filtering gives exactly the same result as direct sinc interpolation. Even in the integer case, a polyphase FIR (which readily generalizes to arbitrary sinc interpolation) is more efficient than explicitly zero-stuffing first. There is one case where zero-stuffing explicitly makes sense, and that's when you want to use an IIR filter that you can't decompose into a polyphase version as the brickwall lowpass... but with FIRs it just never makes any practical sense except as a learning tool.
Even if we think of something like linear interpolation, this can be understood as reconstructing a continuous-time signal where each of the original samples becomes a scaled Dirac delta, followed by filtering with a triangular kernel, followed by sampling at the desired time offsets. So mathematically linear interpolation is really a case of resampling with a triangular filter, even though this is not how we would typically implement it. If you are going to use a brickwall kernel for higher quality, then linear interpolation followed by brickwall filtering would be the same as resampling with a brickwall convolved with a triangle, which is much worse than the brickwall on its own; hence the conceptual zero-stuffing. Other methods of interpolation like Catmull-Rom or B-splines can also be described as filter kernels.
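To make the triangle-kernel view concrete, here's a tiny sketch (the function name is made up): 2x zero-stuffing followed by convolution with the triangular kernel [0.5, 1, 0.5] lands the original samples on the even output slots and linear-interpolated midpoints on the odd ones.

```python
def upsample2_triangle(x):
    """2x upsample: zero-stuff, then convolve with the triangular
    kernel [0.5, 1.0, 0.5] -- i.e. 'resampling with a triangle'."""
    stuffed = []
    for s in x:
        stuffed += [s, 0.0]
    kernel = [0.5, 1.0, 0.5]
    out = [0.0] * len(stuffed)
    for n in range(len(stuffed)):
        for k, c in enumerate(kernel):
            i = n - (k - 1)          # kernel centred on tap k = 1
            if 0 <= i < len(stuffed):
                out[n] += c * stuffed[i]
    return out
```

The output for [0, 2, 4] is [0, 1, 2, 3, 4, 2]: exactly what per-sample linear interpolation would give, just expressed as a filter.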
So there really is nothing fundamentally different about any interpolation, resampling or oversampling. There are different methods to implement these things that might be more efficient depending on what is required, but mathematically it's always the same thing and the only thing that varies is the filter kernel used.
Seeking asylum in any country willing to acknowledge my right to exist.

dmbaer
 KVRAF
 1592 posts since 11 Nov, 2009 from Northern CA
mystran wrote: ↑Tue Dec 13, 2022 3:42 pm
... There is one case where zero-stuffing explicitly makes sense and that's when you want to use an IIR filter that you can't decompose into a polyphase version as the brickwall lowpass... but with FIRs it just never makes any practical sense except as a learning tool.
Well now I'm quite confused. The earlier thread I've been referring to is here:
viewtopic.php?p=8441127&hilit=dmbaer#p8441127
I thought that concluded: for oversampling (integer multiplier on sampling rate), zero-stuffing was the agreed-upon approach. Indeed, earlevel's blog reference (https://www.earlevel.com/main/2007/07/0 ... onversion/) in that thread states:
That’s it—to double the sample rate, we insert a zero between each sample, and lowpass filter to clear the extended part of the audio band. Any lowpass filter will do, as long as you pick one steep enough to get the job done, removing the aliased copy without removing much of the existing signal band. Most often, a linear phase FIR filter is used—performance is good at the relatively high cutoff frequency, phase is maintained, and we have good control over its characteristics.
So, earlevel says zero-stuff with FIR and mystran says that makes no sense. What are we mere students seeking enlightenment to think?

mystran
 KVRAF
 7331 posts since 12 Feb, 2006 from Helsinki, Finland
It makes sense mathematically, but it doesn't usually make sense in terms of efficiency.
To understand what is going on, think about the FIR filtering process after zero-stuffing to double the sampling rate. To compute one FIR output sample, you multiply each FIR tap with one of the input samples and add the results together. Since we just zero-stuffed the signal, every other input sample is known to be zero, so the multiplication by the FIR tap is also zero, and we could just as well skip these computations, because we know which samples are zero and which are non-zero.
This leads to a concept known as polyphase filtering. Rather than zero-stuffing, in the 2x case we split the FIR into two "branches", where one takes all the odd-numbered taps and one takes all the even-numbered taps. Then we can filter the input with each of these shorter FIRs and interleave the results. This gives us the same result as if we'd zero-stuffed and filtered with the original FIR, but we compute half the multiply-adds.
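For the 2x case, the equivalence is easy to verify numerically. A sketch (helper names are made up):

```python
def fir(x, h):
    # plain FIR: y[n] = sum_k h[k] * x[n-k]
    y = [0.0] * len(x)
    for n in range(len(x)):
        for k, c in enumerate(h):
            if n - k >= 0:
                y[n] += c * x[n - k]
    return y

def upsample2_direct(x, h):
    # reference path: zero-stuff to 2x, then filter with the full FIR
    stuffed = []
    for s in x:
        stuffed += [s, 0.0]
    return fir(stuffed, h)

def upsample2_polyphase(x, h):
    # split h into even-tap and odd-tap branches, filter at the LOW
    # rate, then interleave: half the multiply-adds, identical output
    h_even, h_odd = h[0::2], h[1::2]
    y_even, y_odd = fir(x, h_even), fir(x, h_odd)
    out = []
    for a, b in zip(y_even, y_odd):
        out += [a, b]
    return out
```

Both paths produce the same samples; the polyphase path just never touches the taps that would have multiplied a stuffed zero.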
This polyphase concept can be applied to any oversampling factor, not just the 2x case. For 3x you'd split into 3 branches, for 4x into 4 branches, and so on. But we can go further: for an infinite factor you'd split into infinitely many branches, and now we essentially have a continuous signal. We can't really design a FIR for infinite oversampling, but we can design a FIR for some "high" oversampling factor (say 256x) and then apply linear interpolation to those branches to pretend we have infinitely many.
Obviously we can't actually compute infinite oversampling either, but suppose we want some "weird" ratio, like 3.14159? What we can do is pretend that we had an infinite number of samples, then figure out which finite set of these infinitely many samples we would keep if we were to downsample back to 3.14159 times the original rate, and then compute those polyphase branches only. Conceptually we are still (at least sort of) zero-stuffing, filtering and then decimating, but in terms of implementation what we really have is sinc interpolation.
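A sketch of that table-plus-interpolation trick (the sizes here are toy values for illustration, far smaller than a quality-tuned design, and all names are made up): precompute a polyphase table of windowed-sinc branches, then for each fractional read position pick the two nearest branches and blend them linearly.

```python
import math

PHASES = 64   # the pretend-"high" oversampling factor
HALF = 8      # kernel half-width, in input samples

def wsinc(t):
    # Hann-windowed sinc, cutoff at the input Nyquist
    if abs(t) >= HALF:
        return 0.0
    w = 0.5 + 0.5 * math.cos(math.pi * t / HALF)
    return w if t == 0.0 else w * math.sin(math.pi * t) / (math.pi * t)

# polyphase table: TABLE[p][j] = kernel tap j of branch p
TABLE = [[wsinc(j - HALF + p / PHASES) for j in range(2 * HALF)]
         for p in range(PHASES + 1)]   # +1 row so branch p+1 always exists

def interp(x, pos):
    """Sinc-interpolate x at fractional position pos by linearly
    blending the two polyphase branches nearest to the fraction."""
    n = int(pos)
    frac = pos - n
    p = frac * PHASES
    p0 = int(p)
    a = p - p0
    out = 0.0
    for j in range(2 * HALF):
        tap = (1.0 - a) * TABLE[p0][j] + a * TABLE[p0 + 1][j]
        i = n + HALF - j
        if 0 <= i < len(x):
            out += x[i] * tap
    return out
```

At integer positions the blended branch collapses to a unit impulse, so the original samples come back exactly; in between you get windowed-sinc interpolation at whatever ratio you like.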
So zero-stuffing and filtering is not wrong, but it can be implemented more efficiently (polyphase), and that more efficient implementation further generalizes to arbitrary sinc interpolation, which in the special case of interpolating to a sampling rate that's an integer factor of the original sampling rate gives the same results as zero-stuffing and filtering. It's really the same thing, we're just looking at it from a different viewpoint.
Does that make sense now?
Seeking asylum in any country willing to acknowledge my right to exist.