Fractional Resampling artifacts occur when upsampling (lowering pitch)


Post

This is by no means optimized; I just read about windowed sincs and fractional windowed sincs in dspguide. Most resamplers I have tried have artifacts when the pitch is really low: soxr, "The Quest for the Perfect Resampler", and some others like the Lagrange interpolator in JUCE. My question is: do I need to interpolate further to handle low-frequency resampling artifacts? I want to go about 8 octaves down. How do I set the cutoff of the windowed sinc appropriately? Maybe that's what I am missing.


Code:

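    #include <cmath>    // std::abs, sin, cos
    #include <utility>  // std::pair, std::make_pair

    // Computes one stereo output sample with an M-tap Blackman-windowed sinc.
    // cutoff is a fraction of the sample rate (0..0.5); M2 is M/2.
    std::pair<double,double> get_next_sample() {

        double outL = 0.0;
        double outR = 0.0;

        double sum = 0.0;
        const int base = (int)sample_position;
        const double fraction = sample_position - base;
        for(int i = 0; i < (int)M; i++) {
            const double t = i + fraction;  // position inside the window, 0..M
            const double x = t - M2;        // distance from the kernel centre
            // Blackman window: depends only on t/M, never on the cutoff.
            const double w = 0.42 - 0.5*cos(TWO_PI*t/M) + 0.08*cos(2.0*TWO_PI*t/M);
            if(std::abs(x) <= 1e-9){
                // Limit of sin(TWO_PI*cutoff*x)/x as x goes to 0.
                SINC[i] = w*TWO_PI*cutoff;
            }else{
                SINC[i] = w*(sin(TWO_PI*cutoff*x)/x);
            }
            sum += SINC[i];
        }

        for(int i = 0; i < (int)M; i++){
            const int idx = base - i;
            if(idx < 0) break;               // nothing before the start of the buffer
            const double norm = SINC[i]/sum; // normalize to unity gain at DC
            outL += inL[idx] * norm;
            outR += inR[idx] * norm;
        }

        sample_position += pitch_ratio;
        if(sample_position >= len) {
            sample_position = 0;
        }

        return std::make_pair(outL, outR);
    }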
    

Post

Extreme rate change one way or another is expensive if you want to maintain the same quality (no easy way around it really, other than perhaps precomputing), because the work basically doubles for each octave. If you go 8 octaves down that's basically 2^8=256 times the work [edit: see below, I'm being an idiot here], unless you are willing to degrade quality at least somewhat. The reason is that filter quality is roughly(!) a function of the filter length divided by the wavelength at cutoff and for each octave we go down we need the cutoff to go down by an octave too.. so to maintain the same quality the filter must become twice as long (though the problem can be transposed into scatters with fixed filter I think, but that's the same amount of total theoretical work and perhaps a bit slower in practice).

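With your code there is at least one obvious issue: the window should NOT depend on the cutoff. Also be careful with the case where (i+fraction-M2) is approximately zero: the limit value for the cardinal sine sin(x)/x at zero is simply 1.0, so the centre tap has to match however the other taps are scaled. With sin(TWO_PI*cutoff*x)/x as written that works out to TWO_PI*cutoff; with a normalized sinc it is just 1.0.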

Also computing the kernel on the fly like this is REALLY REALLY SLOW and in practice what you normally do is compute it once for some "high" number of "branches" (eg. 256 fractional positions is typically quite reasonable) and then interpolate those to approximate a continuous kernel.
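
The precomputed "branches" idea above could be sketched roughly like this. This is only a minimal sketch, not anyone's actual implementation: the `SincTable` name, its members and the per-branch normalization are assumptions made for illustration, and the cutoff is taken as a fraction of the sample rate (0..0.5) as in the code in the first post.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: a Blackman-windowed sinc tabulated at `branches`
// fractional positions, with linear interpolation between adjacent branches.
struct SincTable {
    int taps;                  // kernel length M
    int branches;              // number of tabulated fractional positions
    std::vector<double> data;  // (branches + 1) rows of `taps` coefficients

    SincTable(int taps_, int branches_, double cutoff)
        : taps(taps_), branches(branches_),
          data(static_cast<std::size_t>(branches_ + 1) * taps_) {
        const double pi = 3.14159265358979323846;
        const double half = taps / 2.0;
        for (int b = 0; b <= branches; ++b) {
            const double frac = static_cast<double>(b) / branches;
            double* row = &data[static_cast<std::size_t>(b) * taps];
            double sum = 0.0;
            for (int i = 0; i < taps; ++i) {
                const double t = i + frac;  // position inside the window
                const double x = t - half;  // distance from the kernel centre
                const double w = 0.42 - 0.5 * std::cos(2.0 * pi * t / taps)
                                      + 0.08 * std::cos(4.0 * pi * t / taps);
                const double s = std::abs(x) < 1e-9
                                     ? 2.0 * pi * cutoff
                                     : std::sin(2.0 * pi * cutoff * x) / x;
                row[i] = w * s;
                sum += row[i];
            }
            for (int i = 0; i < taps; ++i) row[i] /= sum;  // unity gain at DC
        }
    }

    // Coefficient for tap i at fractional offset frac in [0, 1).
    double coeff(int i, double frac) const {
        const double pos = frac * branches;
        const int b = static_cast<int>(pos);
        const double t = pos - b;  // blend factor between branch b and b + 1
        const std::size_t lo = static_cast<std::size_t>(b) * taps + i;
        return (1.0 - t) * data[lo] + t * data[lo + taps];
    }
};
```

The table is built once (for example `SincTable table(32, 256, 0.5);`), and the per-sample loop then only reads `table.coeff(i, fraction)` instead of evaluating sin and cos for every tap.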
Last edited by mystran on Sat May 18, 2024 4:29 pm, edited 3 times in total.

Post

Thank you, that makes sense. Yes, I saw the issues; I now check it like this instead: if(isnan(si) || isinf(si)){. So doing sample_position += pitch_ratio, then flooring the sample position and convolving with a branch doesn't make a difference?

Basically:

loop to M: inL[(int)sample_pos] * fraction_kernel is okay? Do you usually interpolate any further?

Post

mystran wrote: Sat May 18, 2024 2:25 pm Extreme rate change one way or another is expensive if you want to maintain the same quality (no easy way around it really, other than perhaps precomputing), because the work basically doubles for each octave. If you go 8 octaves down that's basically 2^8=256 times the work [edit: this might actually be a bit conservative, 'cos it sort of assumes single pass and we could probably use cheaper filter half way and then cleanup with a better one, etc...
Actually hold on.. I'm being an idiot here (perhaps because saturday evening), somehow having my brain all upside down.

Pitching down is not an issue 'cos that's equivalent to upsampling with cutoff relative to the source rate Nyquist. You just interpolate the fractional positions and cutoff remains fixed. It's going the other way that's problematic, trying to play a sample faster than recorded, 'cos then it's the target rate Nyquist that you need to honor.
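
To make the cutoff rule above concrete, here is a small sketch. The function name `kernel_cutoff` is made up for illustration, the cutoff is expressed as a fraction of the source sample rate, and playback is assumed to happen at the same rate the sample was recorded at. In practice you would probably set the cutoff slightly below these values to leave room for the transition band.

```cpp
// Hypothetical helper. pitch_ratio is the number of input samples advanced
// per output sample (0.5 = one octave down, 2.0 = one octave up).
// Returns the kernel cutoff as a fraction of the source sample rate.
inline double kernel_cutoff(double pitch_ratio) {
    const double nyquist = 0.5;
    // Pitching down (ratio <= 1): the source Nyquist is the limit, so the
    // cutoff stays fixed no matter how far down you go.
    // Pitching up (ratio > 1): the target Nyquist is the limit, so the
    // cutoff drops by an octave for each octave up.
    return pitch_ratio <= 1.0 ? nyquist : nyquist / pitch_ratio;
}
```

So 8 octaves down (`kernel_cutoff(1.0 / 256.0)`) still gives 0.5, while one octave up (`kernel_cutoff(2.0)`) gives 0.25.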

I guess 8 octaves down might put some pressure on the quality of the kernel interpolation, so perhaps you need more than the usual 256 (or so) branches to avoid numerical noise becoming dominant, but like .. idk. Anything using polynomial kernels though will have issues here, you do need a proper sinc interpolator to avoid imaging.

Post

Going down an octave being up-sampling is confusing. I am going to try libresamplerate; it seems like an interesting library. What really blew me away is the sound of the Rossum Electro-Music Assimil8or. I don't know if you have had a chance to try it, but it sounds amazing, especially at low frequencies, and that's kind of my goal. For now I just want to have something working, preferably with floating point math; fixed point I have never learned.

Post

One thing to keep in mind is that at half the original sampling rate there is a sharp (ideally brickwall) cutoff. Sharp cutoffs are audible as "ringing", though this isn't much of an issue at 20kHz+ 'cos our ears aren't terribly sensitive there (even before age sets in and basically everyone loses those frequencies even if they had no "clinically significant" high frequency hearing loss) and the timescales of the ringing are short ('cos short wavelength).. but pitching down brings the ringing to much more sensitive regions and stretches out the timescales, so it will probably be audible if the original sample had frequency content all the way to half the sampling rate.

To combat this, you basically can (1) use an EQ-filter or something to round out the transition so it's not quite a brickwall; a more gradual transition will be much less audible and (2) perhaps try to record anything that is supposed to be pitched down at the highest reasonable sampling rate (though there's some caveats here; some audio interfaces might be able to technically produce a 192kHz recording, but might only have something like 40-60kHz worth of "clean" bandwidth).
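
As one possible stand-in for the "EQ-filter or something" in option (1), a plain one-pole lowpass run over the source sample before pitching it down gives a gentle 6 dB/octave rolloff. This is only a sketch under that assumption: the `OnePoleLowpass` name and the choice of filter are illustrative, and a shelving EQ or anything else with a gradual slope would serve the same purpose.

```cpp
#include <cmath>

// Hypothetical helper: one-pole lowpass, y[n] = y[n-1] + a * (x[n] - y[n-1]).
struct OnePoleLowpass {
    double a = 1.0;  // smoothing coefficient (1.0 = no filtering)
    double z = 0.0;  // previous output

    // cutoff_hz is the approximate -3 dB point at the source sample rate.
    void set_cutoff(double cutoff_hz, double sample_rate) {
        const double pi = 3.14159265358979323846;
        a = 1.0 - std::exp(-2.0 * pi * cutoff_hz / sample_rate);
    }

    double process(double x) {
        z += a * (x - z);
        return z;
    }
};
```

Used once over the whole source buffer (one instance per channel), this softens the top of the spectrum so that the edge at the original Nyquist is less abrupt once it is transposed down into the audible range.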

As for floating vs. fixed point math: on modern CPUs with fast floating point units it's usually slower to do things in fixed point except perhaps in some special cases. If you're on an embedded without hardware floating point or something, then fixed point makes more sense, but it's not terribly difficult: adds and subs work as usual, with multiply/divide you need to shift to readjust your decimal point... and that's basically all there is to it.
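
The "shift to readjust your decimal point" part can be illustrated with a Q16.16 format (16 integer bits, 16 fractional bits). The type and function names here are made up for the example, and it assumes the usual arithmetic right shift for negative values (guaranteed from C++20, what compilers do in practice before that).

```cpp
#include <cstdint>

// Hypothetical Q16.16 fixed-point helpers: value = raw / 65536.
using q16 = std::int32_t;
constexpr int kFracBits = 16;
constexpr std::int64_t kOne = std::int64_t{1} << kFracBits;

constexpr q16 to_q16(double x)   { return static_cast<q16>(x * kOne); }
constexpr double from_q16(q16 x) { return static_cast<double>(x) / kOne; }

// Add and subtract work directly on the raw integers: a + b, a - b.

// Multiply: the raw product has 32 fractional bits, so shift 16 back out.
constexpr q16 q16_mul(q16 a, q16 b) {
    return static_cast<q16>((static_cast<std::int64_t>(a) * b) >> kFracBits);
}

// Divide: pre-scale the numerator so the quotient keeps 16 fractional bits.
constexpr q16 q16_div(q16 a, q16 b) {
    return static_cast<q16>((static_cast<std::int64_t>(a) * kOne) / b);
}
```

For example, `q16_mul(to_q16(1.5), to_q16(2.25))` converts back to 3.375. The 64-bit intermediate is what keeps the multiply from overflowing.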

Post

Voxengo's r8brain is another excellent library

Post

Cubic Hermite spline sounds amazing, and it's much less work too.

Post

Marvinh wrote: Wed May 22, 2024 7:57 am cubic hermite spline sounds amazing much less work too
A cubic hermite (eg. Catmull-Rom) spline can be thought of as a 4-tap FIR kernel and if we restrict ourselves to 4 tap kernels, it's arguably one of the better choices for interpolation: the frequency response is much flatter than say a B-spline, but it's C1 continuous at the end-points (unlike say Lagrange) and happens to have zeroes (notches) almost right in the middle of the sidebands..

It might not be "optimal" in any given sense (this is classic: https://yehar.com/blog/wp-content/uploa ... 8/deip.pdf), but if we restrict ourselves to 4 tap kernels it is a really nice trade-off overall.

You can do objectively much better with longer kernels and that's where windowed sincs come into play.. but if Catmull-Rom is good enough for you (and I emphasize: it is already WAY better than say simple linear interpolation) then .. go for it.
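
For reference, the 4-tap Catmull-Rom interpolator discussed above can be written as below. This is the standard Catmull-Rom formula in Horner form; the function name and argument layout are just for illustration.

```cpp
// Catmull-Rom (cubic Hermite) interpolation between y1 and y2.
// y0..y3 are four consecutive input samples, t in [0, 1) is the fractional
// position between y1 (t = 0) and y2 (t = 1).
inline double catmull_rom(double y0, double y1, double y2, double y3, double t) {
    const double c0 = y1;
    const double c1 = 0.5 * (y2 - y0);
    const double c2 = y0 - 2.5 * y1 + 2.0 * y2 - 0.5 * y3;
    const double c3 = 0.5 * (y3 - y0) + 1.5 * (y1 - y2);
    return ((c3 * t + c2) * t + c1) * t + c0;  // Horner evaluation of the cubic
}
```

In a sampler loop this would be called with the four samples around floor(sample_position) and t equal to the fractional part, so it is a drop-in replacement for the M-tap sinc loop when 4 taps are enough.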
