KVR Audio

sault · Post by **sault** » Fri Mar 17, 2017 5:24 am

Okay, so this is where I'm at with my understanding of this process.

Upsampling is zero-stuffing then lowpassing. Based on what I'm seeing on the spectrogram, I believe this is exactly what is happening with both sinc kernels and interpolation algorithms. Zero-stuffing causes mirrored frequencies, though, so when I upsampled with both sinc kernels and interpolation polynomials I saw varying degrees of mirroring in the upper frequencies. The higher the quality of polynomial, the longer the kernel, the more said frequencies are attenuated.
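A small sketch of the zero-stuffing step on its own may make the mirroring concrete. It is not from the thread: the helper names, the 64-sample test sine and the naive DFT are assumptions chosen for illustration, and no lowpass is applied, so the image is left at full strength.

```python
import cmath
import math

def zero_stuff(x, factor):
    """Insert factor-1 zeros after every input sample."""
    out = []
    for sample in x:
        out.append(sample)
        out.extend([0.0] * (factor - 1))
    return out

def dft_magnitudes(x):
    """Naive DFT magnitudes for bins 0 .. len(x)/2 (slow, but dependency-free)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

n = 64
k0 = 5                                    # sine sits exactly on bin 5 of the 64-point input
x = [math.sin(2 * math.pi * k0 * t / n) for t in range(n)]
y = zero_stuff(x, 2)                      # 128 samples at twice the rate, no filtering yet
mag = dft_magnitudes(y)
peaks = [k for k, m in enumerate(mag) if m > 1.0]
```

In the 128-point spectrum the old Nyquist sits at bin 32, so `peaks` should hold the original component plus its mirror image on the other side of that bin. Removing the second entry is the job of the lowpass that follows.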

What surprised me was that there was mirroring even when downsampling by a non-integer amount. These mirrored frequencies were pretty low in amplitude, even with linear interpolation, but they were definitely there. There were more of them as well: downsampling by 9/5 seemed to produce about 8 sets of mirrored frequencies. Very unexpected.

This tells me why Olli (deip.pdf) emphasized working with oversampled data. Well, two or three reasons really, but the first is that the mirrored frequencies would be outside of the audible band and thus disappear when you downsample (but they would also alias if you don't use a decent filter, too!). It also tells me that even if an interpolation algorithm has a rotten passband, the rolloff would start above audible frequencies and therefore not affect them.

In other words, at 16x it doesn't really matter if you're working with linear interpolation because your high frequencies are in that first 1x and they won't be attenuated by any meaningful amount. And of course, the mirrored frequencies are high enough that they will all be filtered out when you downsample.

What does it all mean? It means that if I was resampling data I would seriously consider 2x upsampling first using a quality sinc kernel, use a decent polynomial for the resampling, then bandlimit with whatever is suitable (e.g. a windowed sinc with a bunch of taps and a cutoff around 0.55, linear or minimum phase, tweak to taste).

Probably a more appropriate document than the polynomial one is Laurent's "Quest for a Perfect Resampler". Staring at a spectrogram has helped me understand some of what he's saying a little better.

https://www.mp3-tech.org/programmer/docs/resampler.pdf

sault · Post by **sault** » Fri Mar 17, 2017 6:17 am

I think they [the polynomials in deip.pdf] are supposed to approximate a windowed sinc function.

I think most of them naturally trend in that direction, except for his "optimal" designs, which he states converge on linear interpolation. I don't like the "optimal" designs: he states that the interpolated curve doesn't necessarily pass through the original sample points. Run some tests and compare them against Hermite or Trilinear or Lagrange, you'll see what I mean. That's a deal breaker for me personally, though I'm sure people smarter than me might know how to put them to better use.

For instance cheap polyphase 4X upsampling, maybe three sets of 8 point FIR coffs.

I humbly suggest that 8 points is nowhere near enough to properly 4x upsample if you want to keep your top end at least. I've put together what I consider to be a "minimum required" 4x oversampler based around a 31-pt kernel, but if I'd thought about it a little more I would've gone for at least 43. Both would have a passband starting around 15 kHz (assuming a samplerate of 44.1 kHz), but the 43 tap would have 70 dB attenuation vs the 31-tap's 60.

EDIT - It also bears mention that the 31-tap does nowhere near enough to attenuate those mirrored frequencies. If I was looking to eliminate them I would aim for at least 90 dB of attenuation, probably more. Lotsa taps needed.

JCJR · Post by **JCJR** » Fri Mar 17, 2017 10:28 pm

sault wrote: I humbly suggest that 8 points is nowhere near enough to properly 4x upsample if you want to keep your top end at least. I've put together what I consider to be a "minimum required" 4x oversampler based around a 31-pt kernel, but if I'd thought about it a little more I would've gone for at least 43. Both would have a passband starting around 15 kHz (assuming a samplerate of 44.1 kHz), but the 43 tap would have 70 dB attenuation vs the 31-tap's 60.

EDIT - It also bears mention that the 31-tap does nowhere near enough to attenuate those mirrored frequencies. If I was looking to eliminate them I would aim for at least 90 dB of attenuation, probably more. Lotsa taps needed.

Thanks Sault

Yes 8 points are probably inadequate for many purposes. I'm permanently ignorant of these topics but intend to experiment with it some more sometime.

Last year was playing with lanczos interpolation, as explained here-- https://en.wikipedia.org/wiki/Lanczos_resampling

For some non-audio interpolation purposes, lanczos is supposedly considered "about as good as it gets" in trade-off with the calculation expense. But I don't know how true that would be.

Lanczos is basically FIR filtering using a sinc-windowed sinc. The coefficient derivation is simple/fast, and works "about the same" as calculating an arbitrary point between samples using linear, hermite or trilinear or whatever. Given a desired inter-sample location, you do fairly simple math to calc that interpolated value based on surrounding sample values.

Surely upsampling with this sinc-windowed sinc would be the equivalent of zero-stuffing and filtering, but procedurally there would be no "actual" zero-stuffing operation. One would decide on the order of the filter. An 8 point lanczos would consider 8 samples on each side of the target interpolated location. So for 4X upsampling, one could calc a table of 16 coffs for the 25% interpolated location, and two other tables of 16 coffs for the 50% and 75% interpolated locations. Making the assumption that the input signal is already band-limited and the input samples don't need to be filtered.

So I "think" the above may be similar to a polyphase FIR with a total of 3 * 16 = 48 taps, using three sinc-windowed sinc kernels. Maybe audio-wise it would be bad-performing compared to designs requiring more brain-sweat. Dunno. The advantage for my testing was simplicity. Get higher upsampling ratios merely by adding more fractional-location kernels, and get "more precise" interpolation of each point merely by using bigger kernels, all simple-calculated with the same simple formula.
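The table-per-fraction idea described above can be sketched as follows. This is only an illustration under assumptions: the function names are made up, and normalising each table to unity DC gain is an added choice that the post does not mention.

```python
import math

def sinc(x):
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def lanczos(x, a):
    """Sinc windowed by a wider sinc; zero outside |x| < a."""
    return sinc(x) * sinc(x / a) if abs(x) < a else 0.0

def lanczos_table(frac, a):
    """2*a coefficients for a point `frac` of the way from sample 0 to sample 1.
    Tap i multiplies the input sample at offset i - a + 1 (that is, -a+1 .. a)."""
    coeffs = [lanczos(frac - (i - a + 1), a) for i in range(2 * a)]
    total = sum(coeffs)
    return [c / total for c in coeffs]    # assumption: normalise to unity DC gain

def interpolate(x, index, frac, a=8):
    """Estimate the value at position index + frac from the 2*a surrounding samples."""
    table = lanczos_table(frac, a)
    return sum(c * x[index - a + 1 + i] for i, c in enumerate(table))

# Three precomputed tables give 4x upsampling; the original samples fill the fourth slot.
tables = {frac: lanczos_table(frac, 8) for frac in (0.25, 0.5, 0.75)}

signal = [math.sin(2 * math.pi * 0.05 * n) for n in range(64)]
estimate = interpolate(signal, 20, 0.25)
exact = math.sin(2 * math.pi * 0.05 * 20.25)
```

Because the kernel is symmetric, the 75% table is just the 25% table reversed, so only half of the tables actually need computing.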

Haven't studied on possible differences of the above compared to conventional FIR filtering. For instance in the above case, there are 48 coefficients but they only span 8 samples on each side of the interpolated location. Whereas a conventional 48 sample FIR would span 24 samples on each side of the interpolated location. But if applying the 48 point FIR on zero-stuffed data, only a third of the 48 coffs would be used for each output sample, and the 48 coffs would only span 8 "non zero-stuffed" original input samples on each side of the target output sample. So maybe it's about the same or maybe not. Dunno.
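The question of whether the two views match can be checked numerically. The sketch below is an assumption-laden illustration (arbitrary 48 coefficients, hypothetical function names): it filters zero-stuffed data directly, then computes the same outputs by giving each output phase only the taps that line up with real input samples. With strict 4x zero-stuffing every fourth tap is hit, so each output uses 12 of the 48 coefficients.

```python
import math

def zero_stuff(x, ratio):
    out = [0.0] * (len(x) * ratio)
    out[::ratio] = x
    return out

def fir(x, h):
    """Direct-form FIR: y[n] = sum over k of h[k] * x[n - k], with x[<0] taken as zero."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def polyphase_upsample(x, h, ratio):
    """Same result as fir(zero_stuff(x, ratio), h), never multiplying by a stuffed zero."""
    out = []
    for m in range(len(x)):
        for phase in range(ratio):
            acc = 0.0
            # Output index m*ratio + phase only meets taps h[phase], h[phase+ratio], ...
            for j, k in enumerate(range(phase, len(h), ratio)):
                if m - j >= 0:
                    acc += h[k] * x[m - j]
            out.append(acc)
    return out

x = [math.sin(0.3 * n) for n in range(20)]
h = [1.0 / (k + 1) for k in range(48)]    # arbitrary coefficients, not a designed filter
reference = fir(zero_stuff(x, 4), h)
fast = polyphase_upsample(x, h, 4)
max_difference = max(abs(a - b) for a, b in zip(reference, fast))
```

`max_difference` should be at rounding-error level, which suggests the "interpolation tables" view and the "zero-stuff then FIR" view describe the same computation.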

I used it with "pretty good" success as a 4X upsample for a peak limiter, but in that case the upsample is only on the envelope follower code. The 4X audio never gets to the outputs. It is just there to track "true peaks" at the original samplerate.

Sometime would like to actually listen to the audio and find out if it is good for anything in the actual audio path.

mystran · Post by **mystran** » Fri Mar 17, 2017 10:30 pm

sault wrote: What surprised me was that there was mirroring even when downsampling by a non-integer amount. These mirrored frequencies were pretty low in amplitude, even with linear interpolation, but they were definitely there. There were more of them as well: downsampling by 9/5 seemed to produce about 8 sets of mirrored frequencies. Very unexpected.

For M/N rational fraction, it's useful to think about it as upsampling by M followed by downsampling by N. The filtering goes in between these two operations and I suspect that looking at it like this will probably explain the results you are getting.
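A conceptual version of that upsample, filter, downsample chain might look like the sketch below. Everything specific in it is an assumption made for illustration: the Hann window, the 161-tap length, the function names and the test tone. It is written for clarity, not speed.

```python
import math

def hann_sinc(num_taps, cutoff):
    """Hann-windowed sinc lowpass; cutoff in cycles/sample, roughly unity DC gain."""
    centre = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - centre
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * (n + 1) / (num_taps + 1))
        taps.append(ideal * window)
    return taps

def resample(x, up, down, num_taps=161):
    """Zero-stuff by `up`, lowpass, keep every `down`-th sample (num_taps must be odd)."""
    cutoff = 0.5 / max(up, down)          # has to respect both the old and the new Nyquist
    h = [up * c for c in hann_sinc(num_taps, cutoff)]  # gain of `up` offsets the stuffed zeros
    centre = (num_taps - 1) // 2
    stuffed = [0.0] * (len(x) * up)
    stuffed[::up] = x
    out = []
    for i in range(0, len(stuffed), down):  # only the samples that survive are computed
        acc = 0.0
        for k, hk in enumerate(h):
            j = i + centre - k              # centre offset cancels the filter delay
            if 0 <= j < len(stuffed):
                acc += hk * stuffed[j]
        out.append(acc)
    return out

# "Downsampling by 9/5": up by 5, down by 9.
x = [math.sin(2 * math.pi * 0.02 * m) for m in range(200)]
y = resample(x, 5, 9)
worst = max(abs(y[n] - math.sin(2 * math.pi * 0.02 * n * 9 / 5)) for n in range(20, 90))
```

Seen this way, the intermediate 5x signal carries images around each multiple of the original sample rate, and anything the single lowpass leaves behind gets folded by the 9x decimation. That would be one plausible reason for seeing several sets of mirrored frequencies.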

mystran · Post by **mystran** » Fri Mar 17, 2017 11:03 pm

JCJR wrote:For some non-audio interpolation purposes, lanczos is supposedly considered "about as good as it gets" in trade-off with the calculation expense. But I don't know how true that would be.

These are applications like image processing where the signal information is encoded mostly in the spatial domain (whereas in audio, the important information is mostly in the spectral domain). Specifically, the use of another sinc as a window doesn't really make sense if you are trying to get good stop-band attenuation and sharp transition (which is what you want in audio), but the impulse response shape is nice if you are mostly concerned with preserving the shape of the signal itself (which is what our eyes judge in resampled images).

JCJR · Post by **JCJR** » Sat Mar 18, 2017 8:56 pm

mystran wrote:
JCJR wrote:For some non-audio interpolation purposes, lanczos is supposedly considered "about as good as it gets" in trade-off with the calculation expense. But I don't know how true that would be.
These are applications like image processing where the signal information is encoded mostly in the spatial domain (whereas in audio, the important information is mostly in the spectral domain). Specifically, the use of another sinc as a window doesn't really make sense if you are trying to get good stop-band attenuation and sharp transition (which is what you want in audio), but the impulse response shape is nice if you are mostly concerned with preserving the shape of the signal itself (which is what our eyes judge in resampled images).

Thanks mystran. Your expertise is always greatly appreciated.

In the case of upsampling, on the assumption that source audio is properly bandlimited, is it a bad thing to preserve the shape of the signal? What kind of damage or bad symptoms might one encounter? Naively I would have thought that for upsampling, that interpolating additional points preserving the signal shape would not add any out-of-band harmonics, if the source is bandlimited?

Anyway, even if that is not the case, maybe I can add different windowing functions to the little resampling object based on lanczos. As said before, it just seemed an easy-to-understand (for me) way to do it. Adjust upsampling ratio (in integer increments) by setting the number of interpolation tables, and adjust interpolation accuracy by setting the size of each table. Presumably the same code would still work about the same if I use a different window than sinc.

One thing though-- I wasn't filtering the original input samples, only the interpolated samples. Am guessing that if the window shape and kernel length has a "sloppy" filter cutoff, attenuating a good bit in the top octave (at original nyquist) then the original input samples would also need filtering? Otherwise, it might be nasty if the interpolated samples have a different high-frequency rolloff than the original samples interspersed in the stream?

sault · Post by **sault** » Sun Mar 19, 2017 3:35 am

In the case of upsampling, on the assumption that source audio is properly bandlimited, is it a bad thing to preserve the shape of the signal?

Depends what your design goal is. If you want to keep the shape as preserved as possible you want linear phase with high taps, but not so high that it causes audible ringing. A different design goal would be to reduce group latency, which means a minimum phase kernel, which does alter the shape of the waveform. E.g. at 44.1 kHz an 87 pt linear phase sinc kernel causes almost 2 ms of latency (counting both upsampling and downsampling). Moving the downsampler to minimum phase can reduce that by a good 30%. If you're shooting for real-time performance, that's important.
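The latency figure can be reproduced with a couple of lines. This is a sketch under the assumption that the post counts the delay of each 87-tap kernel in samples at 44.1 kHz; the variable names are made up for the example.

```python
def linear_phase_delay(num_taps):
    """Group delay in samples of a symmetric (linear-phase) FIR with an odd tap count."""
    return (num_taps - 1) / 2

sample_rate = 44100.0
taps = 87
# One linear-phase kernel for upsampling plus one for downsampling.
total_delay_samples = 2 * linear_phase_delay(taps)
total_delay_ms = 1000.0 * total_delay_samples / sample_rate
```

That comes to 86 samples, or a little under 2 ms, which matches the number quoted above.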

Naively I would have thought that for upsampling, that interpolating additional points preserving the signal shape would not add any out-of-band harmonics, if the source is bandlimited?

Interpolation approximates the perfect resampling, and the more perfect the approximation, the fewer out-of-band harmonics are created (I spoke about this a few posts earlier). So a poor approximation (linear) creates lots, while a better algorithm (6-point, 3rd-order Hermite) creates far fewer, and a properly windowed sinc kernel of decent length creates almost none.

I mean, technically you're right. Perfect upsampling shouldn't cause any harmonics. That said, nothing is perfect.

Am guessing that if the window shape and kernel length has a "sloppy" filter cutoff, attenuating a good bit in the top octave (at original nyquist) then the original input samples would also need filtering?

If I understand your question... a kernel is a lowpass filter. A bad lowpass filter (the drooping passband and poor attenuation that I mentioned above) dulls the sound. This is the sound you end up with, just at a higher samplerate. Something Olli talks about in the deip.pdf is pre-emphasis and "pinking" filters, which is basically high-frequency boosting to try and counteract the high-frequency loss the interpolating algorithms introduce, the same high-frequency loss that a poor sinc filter would introduce. So that's an option you can explore.

JCJR · Post by **JCJR** » Sun Mar 19, 2017 9:12 am

sault wrote:
Am guessing that if the window shape and kernel length has a "sloppy" filter cutoff, attenuating a good bit in the top octave (at original nyquist) then the original input samples would also need filtering?
If I understand your question... a kernel is a lowpass filter. A bad lowpass filter (the drooping passband and poor attenuation that I mentioned above) dulls the sound. This is the sound you end up with, just at a higher samplerate. Something Olli talks about in the deip.pdf is pre-emphasis and "pinking" filters, which is basically high-frequency boosting to try and counteract the high-frequency loss the interpolating algorithms introduce, the same high-frequency loss that a poor sinc filter would introduce. So that's an option you can explore.

Thanks sault

What I was getting at-- If upsampling by keeping the original samples and inserting interpolated samples in-between-- For instance if 4X interpolation is performed with a linear-phase filter perhaps -3 dB at 12 kHz and -72 dB at 20 kHz or whatever--

1 out of 4 output samples (the un-modified original source samples) would have a "flat" frequency response and the other 3 samples in each group would be heavier lowpass filtered. That probably wouldn't be a good thing.

mystran · Post by **mystran** » Sun Mar 19, 2017 9:43 am

JCJR wrote: 1 out of 4 output samples (the un-modified original source samples) would have a "flat" frequency response and the other 3 samples in each group would be heavier lowpass filtered. That probably wouldn't be a good thing.

You need to (ideally) use the same response for all samples with only the delay varying (which is strictly speaking impossible for finite-length kernels, but you should try anyway). For windowed-sinc designs, if you arrange the time-center of the kernel (ie. t=0 of the zero-phase kernel) to align with the sampling grid and set the cutoff to exactly Nyquist then that particular branch will be zero-valued at other samples and you can skip processing if you feel like it.. although this will then lead to somewhat inconvenient design if the up-sampling ratio is even, as you would need the kernel to be odd-length (and you then need different alignment for down-sampling if you want the total latency to add into an integer at the lower rate).

sault · Post by **sault** » Mon Mar 20, 2017 6:51 pm

1 out of 4 output samples (the un-modified original source samples) would have a "flat" frequency response and the other 3 samples in each group would be heavier lowpass filtered. That probably wouldn't be a good thing.

Sometimes it's the case that values are passed through without alteration, but it depends upon what method you're using.

Symmetric sinc kernel (this implies linear phase) with an odd number of taps set with a 1/integer cut-off (implying that some coefficients are zero)? Yes. Certain polynomials (linear, trilinear, Hermite, Lagrange)? Yes.
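The first case can be seen directly in the coefficients. The sketch below is illustrative only: the Hann window, the 31-tap length and the function name are assumptions. It builds an odd-length sinc with its cutoff at the original Nyquist for 4x upsampling and looks at the taps that line up with the original sample grid.

```python
import math

def nyquist_kernel(half_len, ratio):
    """Odd-length Hann-windowed sinc, cutoff at the original Nyquist, for `ratio`x upsampling."""
    taps = []
    for n in range(2 * half_len + 1):
        t = n - half_len                  # t = 0 is the time-centre of the kernel
        x = t / ratio
        ideal = 1.0 if t == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.5 + 0.5 * math.cos(math.pi * t / (half_len + 1))
        taps.append(ideal * window)
    return taps

ratio = 4
half_len = 15                             # 31 taps, chosen only for illustration
h = nyquist_kernel(half_len, ratio)
# Taps that coincide with original (non-stuffed) samples when the centre sits on one.
on_grid = [h[half_len + k * ratio] for k in range(-3, 4)]
```

Apart from the centre tap, every entry of `on_grid` should be zero to within rounding, so that branch of the filter is a pure delay and the original samples come through unaltered. Shifting the cutoff or using an asymmetric kernel removes those zeros.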

A minimum phase kernel (non-symmetric)? Nope. A symmetric sinc kernel with a non 1/integer cutoff? Nope. B-splines or the "optimal" polynomials presented in deip.pdf (I believe Lanczos falls under here as well)? (EDIT - Lanczos kernel is linear phase, my bad) Nope. Any type of IIR algorithm, whether a 12th order Butterworth lowpass or those nifty allpass based resamplers? Not really.

JCJR · Post by **JCJR** » Sat Mar 25, 2017 9:28 pm

sault wrote:
1 out of 4 output samples (the un-modified original source samples) would have a "flat" frequency response and the other 3 samples in each group would be heavier lowpass filtered. That probably wouldn't be a good thing.
Sometimes it's the case that values are passed through without alteration, but it depends upon what method you're using.

Symmetric sinc kernel (this implies linear phase) with an odd number of taps set with a 1/integer cut-off (implying that some coefficients are zero)? Yes. Certain polynomials (linear, trilinear, Hermite, Lagrange)? Yes.

A minimum phase kernel (non-symmetric)? Nope. A symmetric sinc kernel with a non 1/integer cutoff? Nope. B-splines or the "optimal" polynomials presented in deip.pdf (I believe Lanczos falls under here as well)? Nope. Any type of IIR algorithm, whether a 12th order Butterworth lowpass or those nifty allpass based resamplers? Not really.

Thanks sault and mystran. Apologies thread-drifting away from Matt's original question.

I had read about FIR filtering long ago but hadn't used it much until the last couple of years. The potential cpu load seemed unappealing, though nowadays computers are faster.

In the past I wanted as low aliasing feasible and as flat frequency response feasible, considering the cpu load. Did not care about phase behavior in the top octave. Made a resampling object with various selectable quality levels, borrowing ideas of Olli Niemitalo and Laurent de Soras and others on musicdsp mailing list. It was in asm for "as fast possible". The "highest quality" resampling was not as fancy as Olli's fancier schemes. But it seemed to work OK. So far as I know, there were few or no customer complaints about the resampling quality in the apps it was used, though maybe even the highest quality settings would be unacceptable in some uses, for some customers.

According to quality setting, it would do fractional interpolation with either linear interp, allpass filter interp, or hermite interp. Filtering was done with combinations of IIR 2nd order Lowpass cascades and/or polyphase IIR halfband filters (which are cascades of IIR allpass filters). I did "pre-emphasis" to somewhat ameliorate top-octave rolloff by carefully adjusting the frequency and Q of IIR lowpass filters to boost the top octave a little bit while also lowpass filtering the signal. That avoided having to apply a separate stage of pre-emphasis/de-emphasis. The RBJ lowpass filters naturally roll off "really heavy" nearing nyquist which may be an advantage for anti-aliasing or maybe not. I suspected it would be of most use if the risk of aliasing is fairly minor, with reflected alias frequencies restricted perhaps to the top half or top third of an octave below nyquist. Dunno.

Up and down sampling was always a factor of 2. For instance if upsampling X8, it would do three asm tight loops upsampling X2 on each iteration. If fractional resampling was needed, it would upsample the source, do linear or hermite fractional resampling, then downsample X2 as many times necessary to reach the target samplerate. All the routines worked on buffers rather than "a sample at a time".

All the methods would filter all the samples at each intermediate samplerate-- Which was a disadvantage compared to smarter methods which can sometimes ignore intermediate values that won't affect the final output. But the dumb way was a built-in idiot-programmer protection against "keeping the original sample values" when it would be inappropriate to do so.

Was just trying to get something that works good enough and fast enough and I didn't know a great deal of theory then or now.

[continued]

JCJR · Post by **JCJR** » Sat Mar 25, 2017 9:33 pm

Lately after retirement was playing a bit with sinc filtering with the sinc tuned to nyquist, so am ignorant of many details, though I had studied on it many years ago.

So far as I can determine, that lanczos interpolation tunes the sinc to nyquist and would be linear phase. Haven't tested, but I don't see a reason that the exact same code wouldn't work the same if the kernel window is changed to blackman, kaiser or whatever other window. Dunno if changing the window would make it work better or worse for upsampling. Maybe there would be better/worse performing windows for downsampling, dunno. Mystran says so and he knows a great deal about it.

That "lanczos approach" might be an inefficient way to do non-integer resampling, because one could not use pre-computed kernels for the majority of arbitrary fractional rates. For instance, 44.1 k to 48 k, the sample fraction is different for each new value though maybe after a long time the sequence of fractions would repeat. So the kernel coefficients would have to be recalculated for each new sample.
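How long it takes for the fractions to repeat can be worked out exactly. The sketch below uses exact rational arithmetic from the standard library; the variable names are assumptions for illustration.

```python
from fractions import Fraction

step = Fraction(44100, 48000)        # input samples advanced per output sample (147/160)
position = Fraction(0)
phases = []
for _ in range(480):
    phases.append(position % 1)      # fractional part selects the inter-sample kernel
    position += step

distinct_phases = len(set(phases))
period = next(n for n in range(1, len(phases)) if phases[n] == phases[0])
```

For this particular pair of rates the sequence repeats after 160 output samples and only 160 distinct fractions ever occur, so a precomputed table per fraction would be possible in principle, at the cost of 160 kernels. Ratios that don't reduce to small integers are where recalculating (or interpolating between table entries) becomes the practical route.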

On the other hand, the coefficient calculations for such as higher order hermites are not trivial, so maybe the same-order lanczos wouldn't really be at significant disadvantage in cpu load if you have to recalc the coefficients for each sample (as in non-integer resampling). Maybe sometime I'll test that.

In my ignorance, the even / odd length of filter kernels is interesting-- So far as I understand, if you view it as a "filtering task"-- Zero-stuff the data and use an odd length kernel, with the center coefficient aligned on the target sample.

However, if you view the same thing as an "interpolation task"-- You use an odd length kernel, but the center coefficient of the kernel is the unknown interpolation location which we need to calculate. Therefore we don't use the center coefficient. Assuming the fractional sample location is always between 0.0 and 1.0, we use an even number of coefficients, half of them before the target location and the other half after the target location. But it is still an odd-length kernel. We just don't use the center coff. So far as I can figure it. Probably a Captain Obvious observation to someone who already knows all about it.

sault · Post by **sault** » Sun Mar 26, 2017 7:21 am

However, if you view the same thing as an "interpolation task"-- You use an odd length kernel, but the center coefficient of the kernel is the unknown interpolation location which we need to calculate. Therefore we don't use the center coefficient.

But an even-length kernel is not the same as an odd-length kernel with the center coefficient missing. I don't think this is accurate either way; the input value for the center coefficient matters too.

Imagine I have a 7-tap filter that is using a sinc kernel to delay the input signal by 0.1 samples. d is the center coefficient, but you absolutely still need the input value that corresponds to it to calculate the shifted output, i.e.

0 1 2 3 4 5 6 7 8 9
a b c d e f g            = new "3" value
  a b c d e f g          = new "4" value etc
    a b c d e f g

Imagine you are upsampling by 2... it doesn't work like this

0 0 1 0 2 0 3 0 4 0 5 0
a b c   e f g
  a b c   e f g
    a b c   e f g

it does look like this, though

0 0 1 0 2 0 3 0 4 0 5 0
a b c d e f g                = upsampled value between 1 and 2
  a b c d e f g              = upsampled "2" value
    a b c d e f g

sault · Post by **sault** » Sun Mar 26, 2017 7:31 am

And yes, I made a mistake earlier. Lanczos is sinc-based, it's symmetric and linear phase. It has better spectral characteristics than some windows, but markedly poorer ones than high-alpha Kaiser or Blackman-Harris.

mystran · Post by **mystran** » Sun Mar 26, 2017 10:01 pm

JCJR wrote: So far as I can determine, that lanczos interpolation tunes the sinc to nyquist and would be linear phase. Haven't tested, but I don't see a reason that the exact same code wouldn't work the same if the kernel window is changed to blackman, kaiser or whatever other window. Dunno if changing the window would make it work better or worse for upsampling. Maybe there would be better/worse performing windows for downsampling, dunno. Mystran says so and he knows a great deal about it.

See the plots of various common window functions here: https://en.wikipedia.org/wiki/Window_function

When you look at those plots, you notice that they all have a different shape in the frequency domain. When you multiply the ideal (infinite) sinc with one of these, what happens is that the brickwall spectrum is convolved (in the spectral domain) with the spectrum of the chosen window function to get the final filter response.

Now, scaling the window length in time-domain scales its response in the frequency domain in inverse (ie. shorter time-domain window becomes wider in frequency domain and the other way around), but other than time/frequency scaling it has little to no effect on the spectrum shape (for reasonable filter lengths at least; very short kernels might be somewhat worse, but let's not go into that), so essentially the choice of window directly controls the filter's attenuation and transition shape, while the choice of kernel length simply allows you to scale the transition region.

As such, you should generally approach windowed sinc design (for audio anyway) by first choosing a window that quickly decays to whatever attenuation you want and then increase the number of taps until the transition becomes narrow enough. While the number of taps doesn't have much direct effect on the attenuation at all, windows with more attenuation generally have wider "main lobe" and therefore require more taps for similar transition width, so ultimately you usually have to iterate a bit to figure out where your trade-off is.. but the point is, it's the window that is ultimately the most important choice you make.
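The "window sets the attenuation, length sets the transition" point can be probed with a rough measurement. The sketch below is only an illustration under assumptions: the Hann and Blackman windows, the 65/129-tap lengths, the 0.2 cycles/sample cutoff and the crude "walk down to the first null, then take the highest side lobe" measurement are all choices made for the example.

```python
import math

def hann(n, size):
    return 0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1))

def blackman(n, size):
    a = 2 * math.pi * n / (size - 1)
    return 0.42 - 0.5 * math.cos(a) + 0.08 * math.cos(2 * a)

def lowpass(size, cutoff, window):
    """Windowed-sinc lowpass; cutoff in cycles/sample."""
    centre = (size - 1) / 2
    taps = []
    for n in range(size):
        t = n - centre
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        taps.append(ideal * window(n, size))
    return taps

def stopband_peak_db(h, cutoff, grid=1024):
    """Highest side lobe (dB) beyond the first null above the cutoff."""
    centre = (len(h) - 1) / 2
    freqs = [cutoff + (0.5 - cutoff) * i / grid for i in range(grid + 1)]
    mag = [abs(sum(hk * math.cos(2 * math.pi * f * (k - centre)) for k, hk in enumerate(h)))
           for f in freqs]
    i = 0
    while i + 1 < len(mag) and mag[i + 1] < mag[i]:
        i += 1                            # walk down the transition to the first null
    return 20 * math.log10(max(mag[i:]))

results = {(name, size): stopband_peak_db(lowpass(size, 0.2, win), 0.2)
           for name, win in (("hann", hann), ("blackman", blackman))
           for size in (65, 129)}
```

Doubling the length should leave each window's figure roughly where it was, while switching from Hann to Blackman should move it by tens of dB. The price of the deeper stopband is the wider main lobe, which is what the extra taps then have to buy back.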

If you scroll down the page, you will find Lanczos plotted as well and notice that the side lobes decay very slowly and eventually level off somewhere around -80dB or a little below that. In fact it is only slightly better than a triangular window. This means that you need a very long kernel to really get something that looks like a "brickwall" transition and even then your maximum attenuation is severely limited when compared to windows that optimize the frequency domain performance.

Like I said, if you are doing image processing then this might be a reasonable choice, but the constraints are very different: we are looking at the spatial images directly, aliasing suppression (or a little bit of blurring for that matter) is not that critical and fairly small amounts of ringing tend to be much more objectionable. Audio in comparison is predominantly perceived in frequency domain and time-domain considerations (well, ignoring latency anyway) only become important once the filter is long enough that ringing (eg. pre-ringing in linear-phase kernels) exceeds the normal temporal masking limits or smearing of transients (with minimum-phase filters) becomes an issue.

Designing sinc interpolator for non-integer ratio down sampling of band limited signal