Non-B-Spline PolyBLEPs

DSP, Plugin and Host development discussion.

Post

The original PolyBLEP paper outlines two techniques for approximating the ideal bandlimited step function. I've seen code for the B-spline approach in a few places online. It's easy to implement and has some nice properties, but it's not really an unbiased approximation of the bandlimited step function, right? As the degree increases, the function tends towards a Gaussian, which has a Gaussian frequency response. I found that while increasing the degree improves aliasing suppression, the sound also gets a lot duller. I haven't tried the Lagrange approach yet, and I think there may be many more methods to try (cubic spline interpolation, Chebyshev polynomials, ...). Has anyone here made attempts at any non-B-spline PolyBLEPs?

Post

philstem wrote: Mon Nov 29, 2021 9:17 pm The original PolyBLEP paper outlines two techniques for approximating the ideal bandlimited step function. I've seen code for the B-spline approach in a few places online. It's easy to implement and has some nice properties but it's not really a non-biased approximation for the bandlimited step function right? As we add more degrees, the function tends towards a Gaussian which has a Gaussian frequency response. I found that while increasing the degree improves aliasing, the sound also gets a lot duller. I haven't tried the Lagrange approach yet and I think there may be many more methods to try (cubic spline interpolation, Chebyshev polynomials, ...). Has anyone here made attempts at any non-B-spline PolyBLEPs?
My old PolyBLEP tutorial on this very forum (viewtopic.php?t=398553) I think uses a pair of "smoothstep" functions (ie. basically C1 cubic Hermite splines) to approximate a band-limited impulse (which is then integrated as usual into a step). In that thread you can also find other approximations (with plots of spectra, if they are still alive; sorry, didn't check).

So yeah, you can use basically anything. Lagrange interpolators are "optimal" in terms of flatness around DC, but their discontinuities lead to poor behaviour near Nyquist (eg. spurious harmonics). B-splines I guess are optimal in terms of analytic smoothness, but tend to attenuate high frequencies quite a bit, so while the harmonics decay fast, they also decay early.

That said, if you're trying to fit a BLEP over just two samples, then there's not much point going to higher orders (ie. two cubic pieces of some sort for the base impulse is probably ideal), and if you're willing to use a longer kernel, then something like windowed-sinc FIR BLEPs fairly quickly becomes more attractive than PolyBLEPs, as the LUT approach becomes faster than computing the polynomials on the fly (and the quality is better as well).
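For reference, a minimal sketch of the widely-circulated two-sample quadratic PolyBLEP (not necessarily the smoothstep variant from the tutorial above; the function names and the conventions — phase t in [0,1), dt = phase increment per sample — are my own):

```c
/* Residual of the 2-sample quadratic polyBLEP: zero except within one
   sample of the phase wrap, where it smooths the discontinuity. */
static float poly_blep(float t, float dt) {
    if (t < dt) {              /* just after the wrap */
        float u = t / dt;
        return u + u - u * u - 1.0f;
    }
    if (t > 1.0f - dt) {       /* just before the wrap */
        float u = (t - 1.0f) / dt;
        return u * u + u + u + 1.0f;
    }
    return 0.0f;
}

/* Naive saw in [-1, 1] with the polyBLEP residual subtracted at the jump. */
static float saw_polyblep(float t, float dt) {
    return 2.0f * t - 1.0f - poly_blep(t, dt);
}
```

At t = 0 the naive saw jumps from +1 to -1; the residual is -1 at t = 0 and +1 just below t = 1, so subtracting it cancels the discontinuity.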

Post

mystran wrote: Mon Nov 29, 2021 10:06 pm My old PolyBLEP tutorial on this very forum (viewtopic.php?t=398553) I think uses a pair of "smoothstep" functions (ie. basically C1 cubic hermite splines) to approximate a band-limited impulse (which is then integrated as usual into a step). In that thread you can also find other approximations (with plots of spectra if they are still alive, sorry didn't check).
Thank you! Your post history is impressive.
Last edited by philstem on Sun Dec 05, 2021 9:32 am, edited 1 time in total.

Post

I did a bit more research and there seems to be a family of functions f_n(x) = x^{n+1} \sum_{k=0}^n \binom{n+k}{k} \binom{2n+1}{n-k} (-x)^k of which the ramp function (half of the triangular function) and the smoothstep function are members (as f_0 and f_1). I plotted the first three functions plus the boxcar function, and their frequency responses up to twice the Nyquist frequency.

https://en.wikipedia.org/wiki/Smoothstep
anti_aliasing_plots.png

Post

philstem wrote: Tue Nov 30, 2021 7:52 pm I did a bit more research and there seems to be a family of functions f_n(x) = x^{n+1} \sum_{k=0}^n \binom{n+k}{k} \binom{2n+1}{n-k} (-x)^k of which the ramp function (half of the triangular function) and the smoothstep function are a part of (as f_0 and f_1). I plotted the first three functions plus the boxcar function and their frequency response up to 2*nyquist_freq.
Right. What we can sort of see in these plots is that for a given fixed length it's essentially a tradeoff between high-frequency response (ie. ideally the frequency-domain plot is a flat horizontal line at 0dB all the way to Nyquist) vs. aliasing attenuation (ie. ideally the response drops all the way to zero right after Nyquist). That's not specific to these kernels (or even to BLEPs, but rather to any FIR interpolator) .. but something to keep in mind is that when you integrate the response to a step (eg. for a saw wave), the effective aliasing at low frequencies is actually a bit lower than what the raw impulse plot suggests.

One can experiment with all kinds of choices for a specific polynomial kernel, but at the end of the day it's the length of the kernel that's the main limiting factor, and in practice any "reasonable" kernel gives fairly similar results, with all of them (even the boxcar approximation of an impulse) much better than the naive approach, even with the minimal 2-sample window... but at the same time the quality just won't compete with precomputed windowed-sinc kernels with something like 32+ taps per branch (which for all intents and purposes start to approach "perfect" results if done well, and which are still quite reasonable in terms of performance, on the desktop at least).

If you're oversampling (eg. by 2x) then it might make sense to prefer a kernel with slightly better aliasing attenuation (as much of the high-frequency loss is in the oversampled band), whereas if you're trying to run at 1x for performance over quality, then it might be preferable to pick a kernel that's a bit flatter in the passband (at the cost of slightly more aliasing) to avoid a "dull" sound.

Post

The kernel I ended up implementing uses four samples. Its frequency response up to the Nyquist frequency is better than the triangle kernel's (-3dB vs -8dB at Nyquist). The attenuation of aliased frequencies folding from Nyquist back down to 0 is just slightly worse, and from there back up to Nyquist it's the same or better compared to the triangle kernel.

I plotted the kernels, their frequency responses, and the first harmonics of a 1000 Hz sawtooth wave at a 44100 Hz sample rate convolved with these kernels (orange is non-aliased, blue is aliased). In the first image there is no anti-aliasing, in the second the triangular kernel is used, and in the third the kernel I've chosen. For the fourth image I used an 8-tap windowed-sinc filter with a Kaiser window with parameter 14, just to show how much of a difference these longer kernels make.

Below is the kernel function integrated and with the unit step subtracted, so that it can be inserted at a jump using addition.

Code:

float my_polyBLEP(float x) {
    // Integrated 4-sample kernel with the unit step subtracted;
    // identically zero outside the support [-2, 2].
    if (x < -2.0f || x > 2.0f) return 0.0f;
    if (x < -1.0f) return (0.2f*x + 0.6f)*x + 0.4f;
    if (x < 0.0f)  return (-1.6f*x - 2.6f)*x - 1.0f;
    if (x < 1.0f)  return (1.6f*x - 2.6f)*x + 1.0f;
    return (-0.2f*x + 0.6f)*x - 0.4f;
}
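As a quick sanity check (the function is reproduced here so the snippet stands alone), this residual should vanish at the edges of its [-2, 2] support, be continuous at the interior knots x = ±1, and jump from -1 to +1 across x = 0 where the step was subtracted:

```c
#include <math.h>

/* The posted 4-sample residual: integrated kernel minus the step. */
static float my_polyBLEP(float x) {
    if (x < -2.0f || x > 2.0f) return 0.0f;
    if (x < -1.0f) return (0.2f*x + 0.6f)*x + 0.4f;
    if (x < 0.0f)  return (-1.6f*x - 2.6f)*x - 1.0f;
    if (x < 1.0f)  return (1.6f*x - 2.6f)*x + 1.0f;
    return (-0.2f*x + 0.6f)*x - 0.4f;
}
```

Evaluating at the knots confirms the pieces meet: both quadratics on either side of x = -1 (and of x = +1) evaluate to 0 there, and the residual fades to 0 at x = ±2.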
anti_aliasing_plots.png
saw_freq_resp.png

Post

philstem wrote: Sat Dec 04, 2021 9:57 pmFor the fourth image I used an 8 tap windowed sinc filter with the kaiser window with parameter 14, just to show how much of a difference these longer kernels make.
16 taps in single precision will fit in a single 64-byte cache line on x86... and since the bulk of the cost for a precomputed BLEP is going to be cache misses, I'd usually just go for a multiple of 16 (and align each branch to cache lines).
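A sketch of what that layout might look like in C11 (the branch count, names, and the zero-initialized table are placeholders; the point is just that 16 floats per branch, aligned to 64 bytes, keeps each branch inside one cache line):

```c
#include <stdint.h>

#define BLEP_TAPS     16   /* 16 floats = 64 bytes = one cache line */
#define BLEP_BRANCHES 64   /* fractional-phase branches (placeholder) */

/* _Alignas(64) puts the table on a cache-line boundary; since each
   branch is exactly 64 bytes, every branch then occupies one line. */
static _Alignas(64) float blep_table[BLEP_BRANCHES][BLEP_TAPS];

/* Add one branch of the residual, scaled by the step height, into the
   output buffer starting at the sample of the discontinuity. */
static void blep_add(float *out, float height, int branch) {
    for (int i = 0; i < BLEP_TAPS; ++i)
        out[i] += height * blep_table[branch][i];
}
```

With this layout, applying a BLEP touches a single cache line per branch, which is what makes the LUT approach cheap in practice.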

Post

mystran wrote: Sat Dec 04, 2021 11:12 pm 16 taps in single-precision will fit in a single 64-byte cache line on x86... and since the bulk of the cost for a precomputed BLEP is going to be cache misses, I'd usually just go for a multiple 16 (and align each branch to cache lines).
That makes sense. On x86 there should be no significant performance penalty going from 8 to 16 taps. Is it worth it to go beyond 16 taps though? At that size, almost all frequencies above Nyquist fall below the 24 bit noise floor (at least for the sawtooth wave) and I doubt many people could hear the difference when working with a sample rate of 44100 Hz or higher.

Post

philstem wrote: Sun Dec 05, 2021 12:16 am
mystran wrote: Sat Dec 04, 2021 11:12 pm 16 taps in single-precision will fit in a single 64-byte cache line on x86... and since the bulk of the cost for a precomputed BLEP is going to be cache misses, I'd usually just go for a multiple 16 (and align each branch to cache lines).
That makes sense. On x86 there should be no significant performance penalty going from 8 to 16 taps. Is it worth it to go beyond 16 taps though? At that size, almost all frequencies above Nyquist fall below the 24 bit noise floor (at least for the sawtooth wave) and I doubt many people could hear the difference when working with a sample rate of 44100 Hz or higher.
I'm not entirely convinced your plots are correct (perhaps a gain-axis scaling issue)?

edit: As for "is it worth it": these questions always depend on the tradeoffs you're willing to make. I find that with 32 taps and 2x oversampling I can get the quality I want (where the 2x is more about allowing a gradual transition in order to reduce IMD issues with further processing), but you could use more or less, depending on how much you value performance vs. quality.

Post

mystran wrote: Sun Dec 05, 2021 12:39 am I'm not entirely convinced your plots are correct (perhaps an gain-axis scaling issue)?
Right, here is a plot of my windowed-sinc kernel using the Kaiser window with parameter 14. The window function is scaled in such a way that it affects 8 samples to the left and 8 samples to the right of a jump. That is what 16 taps means in this context, right? At 130% of Nyquist we get -135 dB (where dB means 20*log10(amplitude)), meaning that at or below 70% of Nyquist (15.4 kHz at a 44.1 kHz sample rate), aliased harmonics can be at most -135 dB (assuming no harmonic is louder than 0 dB).

If my plots are wrong, do you see an error in my thinking here?

The assumption here is that a perfect interpolation method is used, which of course can never quite be met.
anti_aliasing_plots.png

Post

mystran wrote: Sun Dec 05, 2021 12:39 am edit: As for "is it worth to" these questions always depend on the tradeoffs you're willing to make. I find that with 32-taps and 2x oversampling I can get the quality I want (where the 2x is more for allowing gradual transition in order to reduce IMD issues with further processing), but you could use more or you could use less, depending how much you value performance vs. quality.
Thank you for sharing! Did you find 16 taps insufficient?

Post

philstem wrote: Sun Dec 05, 2021 1:11 am
mystran wrote: Sun Dec 05, 2021 12:39 am I'm not entirely convinced your plots are correct (perhaps an gain-axis scaling issue)?
Right, here is a plot of my windowed sinc kernel using the kasier window with parameter 14.
Ok, so there are two ways to parameterize Kaiser: one of them uses alpha and the other uses beta = pi*alpha... I'd guess we have beta = 14 here?
meaning that at 70% of the Nyquist (15.4 kHz at 44.1 kHz sample rate) or below, aliased harmonics can be at most -135 dB (assuming no harmonic is louder than 0 dB).
The trouble is.. you'd really want your passband to be flat (eg. to within ~1dB) out to at least 18kHz (or so) in order to avoid significantly dulling the high frequencies... and then you'd really want that ~100dB attenuation at 22kHz(!), because even if aliasing from the region immediately above Nyquist is not typically that audible (well, depends on the individual), the moment you put it through some non-linear function those aliased harmonics will produce IMD nonsense products lower down.

Now, I'm not saying you have to worry about this.. and even 2-sample PolyBLEP might be the answer in some situations, but there are reasons why you might want to. :)

Post

philstem wrote: Sun Dec 05, 2021 1:49 am
mystran wrote: Sun Dec 05, 2021 12:39 am edit: As for "is it worth to" these questions always depend on the tradeoffs you're willing to make. I find that with 32-taps and 2x oversampling I can get the quality I want (where the 2x is more for allowing gradual transition in order to reduce IMD issues with further processing), but you could use more or you could use less, depending how much you value performance vs. quality.
Thank you for sharing! Did you find 16 taps insufficient?
Well, there is an additional complication when it comes to higher-order BLEPs: every additional round of integration puts additional demands on the flatness of the kernel (at low frequencies)... and I find that with cubics it's hard to get anything sensible out of 16 taps.

.. but yeah, in general I find that with 16 taps you need to allow for a wider transition (or, alternatively, poorer attenuation) than what I'd usually like.

Post

mystran wrote: Sun Dec 05, 2021 1:52 am The trouble is.. you'd really want your passband to be flat (eg. to within ~1dB) to at least 18kHz (or so) in order to avoid significantly dulling the high frequencies... and then you'd really want that ~100dB attenuation at 22kHz(!), because even if aliasing from the region immediately above is not typically that audible (well, depends on the individual), the moment you put it through some non-linear function those aliased harmonics will produce IMD non-sense products lower down.

Now, I'm not saying you have to worry about this.. and even 2-sample PolyBLEP might be the answer in some situations, but there are reasons why you might want to. :)
Perfect explanation, thank you :)

Post

That said, I really want to emphasize that I'm not trying to say "it's garbage if you don't do this", but rather that there are reasons to choose different trade-offs.. and ultimately one should look at the whole signal path, because there's really no point using 64 taps for a BLEP kernel if the next thing you do with the signal is put it through a high-gain waveshaper with little to no oversampling. So it's more a matter of "how good do I have to make this so it's no longer the limiting factor of my signal path" rather than "how good should I make this on its own"... sometimes it makes sense to go for performance first, other times for quality at all costs, and sometimes something in between.

edit: ..and while I have some code that uses 32-taps with 2x oversampling, I also have some code (in a different application with different trade-offs) that uses 2-sample PolyBLEPs. :)
