About sampler output buffer oversampling
-
- KVRAF
- Topic Starter
- 1609 posts since 13 Oct, 2003 from Oulu, Finland
What have you guys found to be a good oversampling ratio for a sampler's output buffer, to take care of aliasing when playing higher notes?
Is 4x oversampling sufficient for serious purposes, or is the best practice to let the user select the oversampling ratio for their own purposes?
- KVRAF
- 7890 posts since 12 Feb, 2006 from Helsinki, Finland
Why do you need to oversample the output buffer?
When you take a sample and play it back much faster than the original, what you are effectively doing is increasing the sampling rate, then resampling the resulting signal back to whatever fixed sampling rate you are trying to output. At least assuming a slowly varying playback speed, this is a pure resampling problem and oversampling doesn't necessarily have to be part of the solution at all.
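To put rough numbers on this resampling view, here is a minimal Python sketch. It is not from the thread: the function names are made up for illustration, and it assumes the sample was recorded at the same rate as the output. It computes the playback ratio for a transposition in semitones and the highest source frequency that still lands below the output Nyquist after the shift.

```python
def playback_ratio(semitones):
    """Speed factor for an equal-tempered transposition (assumed tuning)."""
    return 2.0 ** (semitones / 12.0)


def alias_free_bandwidth_hz(output_rate_hz, semitones):
    """Highest source frequency that stays below the output Nyquist.

    Assumes the source was recorded at output_rate_hz. Pitching down
    (ratio < 1) never pushes content above Nyquist, so the ratio is
    clamped at 1.
    """
    ratio = max(1.0, playback_ratio(semitones))
    return (output_rate_hz / 2.0) / ratio


# One octave up at 48 kHz: only the lowest 12 kHz of the source survives
# without aliasing; everything above must be filtered out by the resampler.
limit = alias_free_bandwidth_hz(48000, 12)
```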
-
- KVRAF
- Topic Starter
- 1609 posts since 13 Oct, 2003 from Oulu, Finland
If the higher frequencies in the played back sample go over Nyquist, they bounce back down. Hence the output buffer where the samples are played back would be oversampled. Once all the voices of the sampler have been rendered there, the output buffer is downsampled accordingly.
Reading between the lines, I assume you're suggesting that I use sinc interpolation for the sample interpolation and use a suitable sinc function to lowpass filter the sample at resampling time? I.e. depending on what pitch needs to be played, change how much the sinc function lowpass filters the sample at the resampling stage?
- KVRAF
- 7890 posts since 12 Feb, 2006 from Helsinki, Finland
There is nothing that you should read between the lines, because I'm not suggesting a particular method, I'm just observing that the aliasing here happens during a resampling (or equivalently interpolation) process that you might not be thinking about as resampling, because you've already chosen the box to think inside of.
-
- KVRAF
- Topic Starter
- 1609 posts since 13 Oct, 2003 from Oulu, Finland
I was first thinking of using sinc interpolation with varying kernel length, but then decided against it. This is because I now have an oversampled output buffer plus 6-point, 5th-order interpolation for the sample itself. The amount of memory scanned and math required per output sample should be less this way.
That being said, I wonder how the E-mu e6400 Ultra does its interpolation/resampling. I haven't done direct side by side comparisons, but to me it sounds like the results have much more high frequency content when pitched down an octave or so. As if the sampler generated some musical sounding high frequencies out of thin air... I guess I really do need to test that machine properly and see if there is some DSP magic going on which I would want to copy.
-
- KVRAF
- 1668 posts since 11 Nov, 2009 from Northern CA
Here's a link to a fairly scholarly tutorial on the subject:
https://ccrma.stanford.edu/~jos/resample/resample.html
-
- KVRAF
- Topic Starter
- 1609 posts since 13 Oct, 2003 from Oulu, Finland
Yep, that looks pretty much like what I had in mind with the sinc interpolation having the varying kernel length. I.e. you can adjust the lowpass cutoff dynamically.
The only obvious difference seems to be that the sinc functions in your URL are thought of the other way around: not one sinc per input sample (within the window), but a single sinc centered at the point where the signal is sampled. That is of course the same thing mathematically. It also settles something I hadn't put any time into thinking through yet: is there some scaling I would need to perform as more input samples fall inside the window? With this "other way around" thinking it seems obvious that no other scaling factors are needed.
Useful document. Thank you!
But that still leaves me with my original point:
If I used the resampling method described in the link, that is a fair amount of math per resampled sample, and it will make the higher notes eat up much more CPU. And eventually you can't make the window of input samples any larger, which will result in aliasing from that point on. I do not know how quickly that limit makes itself heard in practice, though.
With polynomial interpolation the required number of input samples and the amount of math are quite low, especially when done with SIMD instructions. And the results are practically perfect until the highest frequencies bounce back down from Nyquist. So oversampling is a must for this method, which is why I mentioned using the two together.
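For readers who want to see what a 6-point, 5th-order polynomial interpolator can look like, here is a minimal Python sketch using Lagrange interpolation. This is an assumption for illustration only: the post does not say which 5th-order polynomial is actually used, and the function name `lagrange6` is made up. It has no anti-aliasing of its own, which is why it is paired with an oversampled output buffer.

```python
import math


def lagrange6(samples, position):
    """5th-order Lagrange interpolation through the 6 samples around position.

    The nodes sit at offsets -2..3 relative to floor(position), so the caller
    must keep 2 <= floor(position) and floor(position) + 3 < len(samples).
    """
    base = int(math.floor(position))
    frac = position - base
    result = 0.0
    for i in range(-2, 4):
        # Lagrange basis polynomial for node i, evaluated at frac.
        weight = 1.0
        for j in range(-2, 4):
            if j != i:
                weight *= (frac - j) / (i - j)
        result += weight * samples[base + i]
    return result


# A degree-5 polynomial is reproduced exactly (up to rounding), which is a
# quick sanity check for the weights.
poly = [float(n ** 5 - 3 * n ** 2 + 1) for n in range(10)]
value = lagrange6(poly, 4.3)
```

A real-time version would precompute or expand the weights and vectorize the inner loop; the nested loops here are only for readability.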
- KVRAF
- 7890 posts since 12 Feb, 2006 from Helsinki, Finland
Can't you just transpose the whole process when pitching up? For every input sample, insert a sinc into the output buffer so that now your cutoff is relative to the output rate. This way you don't need any variable cutoffs or variable filter lengths and even though this kind of scatter is slightly slower than gathering, the difference is actually less than you might think.
Higher CPU usage with higher notes is kinda inevitable though (at least without pre-filtering), but then again even the oversampling approach really only "solves" this issue by making all the notes equally expensive to the highest ones, right?
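The transposed ("scatter") idea can be sketched as follows in Python. This is a rough illustration under stated assumptions, not code from the thread: the Hann window, the kernel half-width of 16 and the function names are all made up, and the 1/ratio gain is added here because `ratio` input samples land on each output sample when pitching up. The kernel cutoff is fixed at the output Nyquist, so nothing about the filter changes with pitch.

```python
import math


def hann_sinc(t, half_width):
    """Hann-windowed sinc with its cutoff at the output Nyquist."""
    if abs(t) >= half_width:
        return 0.0
    if t == 0.0:
        return 1.0
    window = 0.5 * (1.0 + math.cos(math.pi * t / half_width))
    return window * math.sin(math.pi * t) / (math.pi * t)


def scatter_resample(samples, ratio, half_width=16):
    """Pitch up by ratio (>= 1): add one kernel per input sample to the output."""
    out_len = int(math.ceil(len(samples) / ratio))
    out = [0.0] * out_len
    gain = 1.0 / ratio  # compensates for the denser input samples
    for n, x in enumerate(samples):
        center = n / ratio  # where input sample n lands in output time
        first = max(0, int(math.ceil(center - half_width)))
        last = min(out_len - 1, int(math.floor(center + half_width)))
        for m in range(first, last + 1):
            out[m] += gain * x * hann_sinc(m - center, half_width)
    return out


# One octave up: 400 input samples become 200 output samples.
octave_up = scatter_resample([1.0] * 400, 2.0)
```

Each input sample touches about 2 * half_width output samples regardless of pitch, so the cost per input sample is constant and the cost per output sample grows with the ratio, as described above.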
-
- KVRAF
- Topic Starter
- 1609 posts since 13 Oct, 2003 from Oulu, Finland
mystran wrote: ↑Sat Dec 10, 2022 2:24 am
Can't you just transpose the whole process when pitching up? For every input sample, insert a sinc into the output buffer so that now your cutoff is relative to the output rate. This way you don't need any variable cutoffs or variable filter lengths and even though this kind of scatter is slightly slower than gathering, the difference is actually less than you might think.
Higher CPU usage with higher notes is kinda inevitable though (at least without pre-filtering), but then again even the oversampling approach really only "solves" this issue by making all the notes equally expensive to the highest ones, right?
I probably could. I think I'll have to experiment with all these ideas and see in practice what I think about them.
- Jeff McClintock
- KVRist
- 413 posts since 30 Jan, 2005 from New Zealand
Not disagreeing with anyone regarding the need to oversample at all.
But FYI, 2x oversampling gives you 1 octave of 'headroom', 4x gives you 2 octaves, and 8x gives you 3 octaves.
So the answer depends on how high you intend to pitch-shift the sample.
- Music Engineer
- KVRAF
- 4285 posts since 8 Mar, 2004 from Berlin, Germany
Jeff McClintock wrote: ↑Mon Dec 12, 2022 8:13 pm
but FYI, 2x oversampling gives you 1 octave of 'headroom', 4x gives you two octaves, and 8x gives you 3 octaves
It's actually even better: you are allowed to produce aliasing at the oversampled rate as long as that aliasing does not enter your final non-oversampled Nyquist interval. So, at 2x oversampling, you are tripling your safe frequency range. If your final target sample rate is 48 kHz but you are running with 2x oversampling (i.e. at 96 kHz), you can produce frequencies up to 72 kHz. Yes, sure, the band between 48 and 72 kHz *will* produce aliasing at 96 kHz, but not into the frequency range below 24 kHz. It lands in the 24-48 kHz range, which you will throw away (lowpass) later anyway, so you are allowed to fill the 24-48 kHz range with aliasing garbage.
Similar considerations apply to higher oversampling factors as well. If I'm not mistaken, the general formula for your allowed frequency range is (2*M-1) * sampleRate/2 rather than just M * sampleRate/2 for an oversampling factor of M, where sampleRate is the final target rate.
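The folding argument can be checked with a few lines of Python. This is a sketch added for illustration, with made-up function names. It assumes ideal sampling of pure sinusoids, and `headroom_octaves` further assumes the source content reaches all the way up to the target Nyquist.

```python
import math


def folded_frequency(freq_hz, rate_hz):
    """Frequency at which a sinusoid of freq_hz appears when sampled at rate_hz."""
    f = freq_hz % rate_hz
    return min(f, rate_hz - f)


def safe_limit_hz(target_rate_hz, factor):
    """Highest frequency that folds no lower than the target Nyquist.

    Implements (2*M - 1) * sampleRate / 2 for an oversampling factor M.
    """
    return (2 * factor - 1) * target_rate_hz / 2


def headroom_octaves(factor):
    """Pitch-up headroom relative to the target Nyquist, in octaves."""
    return math.log2(2 * factor - 1)


# 48 kHz target, 2x oversampling (96 kHz): a 60 kHz component folds to 36 kHz,
# inside the 24-48 kHz band that the final lowpass removes.
folded = folded_frequency(60000, 96000)
```

Under these assumptions 2x oversampling gives log2(3), roughly 1.58 octaves, rather than exactly one octave.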
-
- KVRAF
- 1668 posts since 11 Nov, 2009 from Northern CA
I recently asked a question related to this subject here, and it occurs to me there is a distinction needed in this thread as well. The original post was regarding oversampling. But the discussion has also involved resampling. If we make a distinction (which I'm far from sure is proper, though very useful) between the two, there's an important difference in technique.
If oversampling involves increasing the sample rate by some integer factor, as would be used in a distortion process to avoid aliasing, then there's no interpolation involved (as I learned in my earlier thread here). The new samples are just zeroed out and the filtering takes care of supplying the interpolated values.
Resampling, as might be needed in playing back a sample at a new pitch, is a whole different thing that does require interpolation. I'm not trying to pose as an expert on this, but I learned much in a recent discussion that revealed how resampling and oversampling techniques are actually quite different. So maybe pointing out this will clarify things a bit.
- KVRAF
- 7890 posts since 12 Feb, 2006 from Helsinki, Finland
dmbaer wrote: ↑Tue Dec 13, 2022 10:22 pm
Resampling, as might be needed in playing back a sample at a new pitch, is a whole different thing that does require interpolation. I'm not trying to pose as an expert on this, but I learned much in a recent discussion that revealed how resampling and oversampling techniques are actually quite different. So maybe pointing out this will clarify things a bit.
Interpolation and resampling are really just two ways to look at the same thing.
In case of upsampling by an integer factor, zero-stuffing followed by brickwall filtering gives exactly the same result as direct sinc-interpolation. Even in the integer case, a polyphase FIR (which readily generalizes to arbitrary sinc-interpolation) is more efficient than explicitly zero-stuffing first. There is one case where zero-stuffing explicitly makes sense and that's when you want to use an IIR filter that you can't decompose into a polyphase version as the brickwall lowpass, but with FIRs it just never makes any practical sense except as a learning tool.
Even if we think of something like linear interpolation, this can be understood as reconstructing a continuous time signal where each of the original samples becomes a scaled Dirac delta, followed by filtering with a triangular kernel, followed by sampling at the desired time-offsets. So mathematically linear interpolation is really a case of resampling with a triangular filter, even though this is not how we would typically implement it. If you are going to use a brickwall kernel for higher quality, then linear interpolation followed by brickwall filtering would be the same as resampling with a brickwall convolved with a triangle, which is much worse than the brickwall on its own; hence the conceptual zero-stuffing. Other methods of interpolation like Catmull-Rom or B-splines can also be described as filter kernels.
So there really is nothing fundamentally different about any interpolation, resampling or oversampling. There are different methods to implement these things that might be more efficient depending on what is required, but mathematically it's always the same thing and the only thing that varies is the filter kernel used.
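The linear interpolation case is easy to verify numerically. The Python sketch below is added for illustration (the function names are made up): it evaluates a generic "sum of samples times kernel" resampler with a triangular kernel and compares it against the usual two-point linear interpolation formula.

```python
import math


def triangle(t):
    """Triangular kernel: 1 at t = 0, falling linearly to 0 at |t| = 1."""
    return max(0.0, 1.0 - abs(t))


def resample_with_kernel(samples, position, kernel, half_width):
    """Evaluate sum_n samples[n] * kernel(position - n) near position."""
    first = max(0, int(math.floor(position - half_width)))
    last = min(len(samples) - 1, int(math.ceil(position + half_width)))
    return sum(samples[n] * kernel(position - n) for n in range(first, last + 1))


def linear_interp(samples, position):
    """The usual implementation: blend the two neighbouring samples."""
    i = int(math.floor(position))
    frac = position - i
    return (1.0 - frac) * samples[i] + frac * samples[i + 1]


data = [0.0, 1.0, 4.0, 9.0, 16.0]
via_kernel = resample_with_kernel(data, 2.25, triangle, 1.0)
via_blend = linear_interp(data, 2.25)  # both give 5.25
```

Swapping `triangle` for a windowed sinc (with a larger `half_width`) turns the same routine into sinc interpolation, which is the sense in which only the kernel changes.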
-
- KVRAF
- 1668 posts since 11 Nov, 2009 from Northern CA
mystran wrote: ↑Tue Dec 13, 2022 11:42 pm
... There is one case where zero-stuffing explicitly makes sense and that's when you want to use an IIR filter that you can't decompose into a polyphase version as the brickwall lowpass.. but with FIRs it just never makes any practical sense except as a learning tool.
Well now I'm quite confused. The earlier thread I've been referring to is here:
viewtopic.php?p=8441127&hilit=dmbaer#p8441127
I thought that concluded: for oversampling (integer multiplier on sampling rate), that zero-stuffing was the agreed-upon approach. Indeed, earlevel's blog reference (https://www.earlevel.com/main/2007/07/0 ... onversion/) in that thread states:
That’s it—to double the sample rate, we insert a zero between each sample, and low-pass filter to clear the extended part of the audio band. Any low-pass filter will do, as long as you pick one steep enough to get the job done, removing the aliased copy without removing much of the existing signal band. Most often, a linear phase FIR filter is used—performance is good at the relatively high cut-off frequency, phase is maintained, and we have good control over its characteristics.
So, earlevel says zero-stuff with FIR and mystran says that makes no sense. What are we mere students seeking enlightenment to think?
- KVRAF
- 7890 posts since 12 Feb, 2006 from Helsinki, Finland
It makes sense mathematically, but it doesn't usually make sense in terms of efficiency.
To understand what is going on, think about the FIR filtering process after zero-stuffing to double the sampling rate. To compute one FIR output sample, you multiply each FIR tap with one of the input samples and add the results together. Since we just zero-stuffed the signal, every other input sample is known to be zero, so the multiplication by the FIR tap is also zero and we could just as well skip these computations, because we know which samples are zero and which are non-zero.
This leads to a concept known as polyphase filtering. Rather than zero-stuffing, in the 2x case we split the FIR into two "branches" where one takes all the odd numbered taps and one takes all the even numbered taps. Then we can filter the input with each of these shorter FIRs and interleave the results. This gives us the same result as if we'd zero-stuffed and filtered with the original FIR, but we compute half the multiply-adds.
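The equivalence for the 2x case can be demonstrated with a small Python sketch. It is added for illustration only: the function names are made up and the taps are arbitrary numbers, not a designed lowpass. Both paths use the same causal FIR convention, y[n] = sum over k of taps[k] * x[n - k].

```python
def fir(signal, taps):
    """Causal FIR filter; samples before the start of signal count as zero."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if 0 <= n - k < len(signal):
                acc += tap * signal[n - k]
        out.append(acc)
    return out


def upsample2_zero_stuff(x, taps):
    """Insert a zero after every sample, then run the full FIR."""
    stuffed = []
    for value in x:
        stuffed.extend([value, 0.0])
    return fir(stuffed, taps)


def upsample2_polyphase(x, taps):
    """Run two half-length FIRs at the input rate and interleave the results."""
    even = fir(x, taps[0::2])  # taps 0, 2, 4, ... produce even outputs
    odd = fir(x, taps[1::2])   # taps 1, 3, 5, ... produce odd outputs
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out


taps = [0.05, 0.2, 0.5, 0.2, 0.05, -0.01]  # arbitrary illustrative values
x = [1.0, -2.0, 0.5, 3.0, 0.0, -1.0]
a = upsample2_zero_stuff(x, taps)
b = upsample2_polyphase(x, taps)  # matches a, with half the multiply-adds
```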
This polyphase concept can be applied to any oversampling factor, not just 2x case. For 3x you'd split into 3 branches and for 4x you'd split into 4 branches and so on. But we can go further: for infinite factor you'd split into infinite branches and now we essentially have a continuous signal. We can't really design a FIR for infinite oversampling, but we can design a FIR for some "high" oversampling factor (say 256x) and then apply linear interpolation to those branches to pretend we have infinitely many.
Obviously we can't actually compute infinite oversampling either, but suppose we want some "weird" ratio, like 3.14159? What we can do is pretend that we had infinite number of samples, then figure out which finite set of these infinite number of samples we would keep if we were to downsample back to 3.14159 times the original rate and then compute those polyphase branches only. Conceptually we are still (at least sort-of) zero-stuffing, filtering and then decimating, but in terms of implementation what we really have is sinc-interpolation.
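As a rough illustration of the last two paragraphs, here is a Python sketch of a table of 256 polyphase branches with linear interpolation between neighbouring branches, evaluated at arbitrary fractional positions. Everything specific is an assumption made for the example: the Hann-windowed sinc, the half-width of 8 taps and the function names. The kernel cutoff sits at the input Nyquist, so this version is only meant for ratios of 1 or more (output rate at or above the input rate).

```python
import math

PHASES = 256     # branches per input sample (assumed, as in the text)
HALF_WIDTH = 8   # kernel half-width in input samples (assumed)
OFFSETS = range(-HALF_WIDTH + 1, HALF_WIDTH + 1)


def windowed_sinc(t):
    """Hann-windowed sinc, zero outside +/- HALF_WIDTH."""
    if abs(t) >= HALF_WIDTH:
        return 0.0
    if t == 0.0:
        return 1.0
    window = 0.5 * (1.0 + math.cos(math.pi * t / HALF_WIDTH))
    return window * math.sin(math.pi * t) / (math.pi * t)


def build_branches():
    """Branch p holds the taps for fractional offset p / PHASES.

    One extra branch (p == PHASES) keeps the blend below free of wrap-around.
    """
    return [[windowed_sinc(k - p / PHASES) for k in OFFSETS]
            for p in range(PHASES + 1)]


def interpolate(samples, position, branches):
    """Evaluate the signal at a fractional position using blended branches."""
    base = int(math.floor(position))
    scaled = (position - base) * PHASES
    p = int(scaled)
    blend = scaled - p
    lo, hi = branches[p], branches[p + 1]
    acc = 0.0
    for i, k in enumerate(OFFSETS):
        n = base + k
        if 0 <= n < len(samples):
            acc += (lo[i] + blend * (hi[i] - lo[i])) * samples[n]
    return acc


def resample(samples, ratio, branches):
    """Produce ratio output samples per input sample (ratio >= 1 assumed)."""
    count = int(len(samples) * ratio)
    return [interpolate(samples, m / ratio, branches) for m in range(count)]


branches = build_branches()
sine = [math.sin(2.0 * math.pi * 0.05 * n) for n in range(200)]
out = resample(sine, 3.14159, branches)
```

Only the branches that are actually needed for each output sample are touched, so the "infinite oversampling" never has to be computed.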
So zero-stuffing and filtering is not wrong, but it can be implemented more efficiently (polyphase) and that more efficient implementation further generalizes to arbitrary sinc-interpolation, which in the special case of interpolating to a sampling rate that's an integer factor of the original sampling rate gives the same results as zero-stuffing and filtering. It's really the same thing, we're just looking at it from a different viewpoint.
Does that make sense now?