Arbitrary ratio resampling

DSP, Plugin and Host development discussion.

Post

Hi guys,
I'm going crazy trying to get an arbitrary ratio resampler to work.
Let's say I want to convert from the current sample rate (e.g. 44100) to 48000.

What I'm doing is:

1) Calculate the ratio: original sr / dest sr (44100/48000 = 0.91875)
2) 4x Upsampling: 44100 -> 176400
3) Interpolation @ ratio: 176400 -> 192000
4) 4x Downsampling: 192000 -> 48000

The problem is that I'm not always getting the expected number of samples for the block, and even if I try a destination sample rate that returns the exact number of samples, I still get glitches.

Anyone tried something similar?

Thanks,
Luca

Post

Yes, I have a plugin which runs at a fixed rate and has to resample to the host rate in realtime. You have to introduce a delay somewhere.

In my case, I had 3 different processing functions depending on the host sampling rate:
1) host rate below synth rate
2) host rate above synth rate
3) perfect match (no resampling needed)

Post

Audiority wrote: The problem is that I'm not always getting the expected number of samples for the block,
How are you calculating the expected number of samples per block? Filters and interpolators used in resampling will ring, producing more samples. These extra tail samples need to overlap into adjacent blocks. Maybe you already know this, but without seeing the code, or even what kind of upsampling, downsampling and interpolation procedures you're using, it's hard to know where the problem is.

Post

Audiority wrote: 1) Calculate the ratio: original sr / dest sr (44100/48000 = 0.91875)
2) 4x Upsampling: 44100 -> 176400
3) Interpolation @ ratio: 176400 -> 192000
4) 4x Downsampling: 192000 -> 48000
This is sort of inefficient most of the time, and you might get better performance by using sinc interpolation from one rate to the other directly... but that's sort of orthogonal to your problem.
Audiority wrote: The problem is that I'm not always getting the expected number of samples for the block, and even if I try a destination sample rate that returns the exact number of samples, I still get glitches.
Right.. so every time you have two clocks, you will find that keeping them exactly in sync is impossible (although you can track the drift and try to adapt and/or adjust one of the clocks). Even floating point counters running at different rates will end up drifting over time, thanks to rounding differences.

The other thing to remember is that if you have an integer number of input samples, you will usually have a fractional number of output samples. So if you're processing in blocks, the next block needs to pick up the output sample position (= fractional value) from the end of the previous block, or the block boundaries will be obvious.
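Something like this is the minimal version of that carry-over (a sketch only, with linear interpolation for brevity; the important part is that pos and the carried sample survive between calls):

    #include <vector>

    // Fractional read position and the last input sample persist across
    // blocks, so block boundaries stay seamless. Linear interpolation is
    // purely for illustration; the first block blends against an initial 0.
    struct BlockResampler {
        double pos = 0.0;    // fractional output position, carried over
        double step;         // inRate / outRate
        float last = 0.0f;   // final sample of the previous block

        BlockResampler(double inRate, double outRate) : step(inRate / outRate) {}

        // Consumes n input samples, appends however many outputs they yield.
        void process(const float* in, int n, std::vector<float>& out) {
            // position 0 means 'last'; positions 1..n mean in[0]..in[n-1]
            while (pos < n) {
                int i = (int)pos;
                double f = pos - i;
                float a = (i == 0) ? last : in[i - 1];
                out.push_back((float)(a + f * (in[i] - a)));
                pos += step;
            }
            pos -= n;          // re-base for the next block
            last = in[n - 1];  // carry the hinge sample forward
        }
    };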

Anyway, with regards to multiple clocks, your options are:

1. pick a master clock (e.g. input) and let the other one run at "whatever rate" (e.g. you produce output when your fractional output sample counter happens to cross into the next actual sample); if you do this you don't know in advance how many samples you'll produce, but it works fine for batch-converting stuff... or a synth could just produce enough samples that the output buffer is full

2. if you can control both clocks and your conversion rate is rational, you can use exact (e.g. rational integer) arithmetic to count samples (see the sketch after this list); just remember the number of samples you'll produce is still rational (and not integer)

3. if you can't control the two clocks (e.g. converting between two devices or APIs), then you need to implement some form of variable oversampling, add some buffering and adjust the conversion factor slightly on the fly in order to counter the drift (which in general can vary); basically you try to keep a roughly constant number of samples in your buffer and then adjust the rate depending on whether you have too many or too few (using option 1 to produce the buffer contents)
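For option 2, the exact bookkeeping can be as simple as this sketch (integer rates assumed; names are just for illustration). At 44100 -> 48000 with 1024-sample input blocks it alternates between 1114 and 1115 outputs, which is that "rational, not integer" count in action:

    #include <cstdint>

    struct RationalCounter {
        int64_t inRate, outRate;
        int64_t rem = 0;  // (total input * outRate) mod inRate, carried over

        RationalCounter(int64_t in, int64_t out) : inRate(in), outRate(out) {}

        // Exact number of output samples completed by the next n inputs.
        int64_t outputsFor(int64_t n) {
            int64_t t = rem + n * outRate;
            int64_t produced = t / inRate;
            rem = t % inRate;  // exact remainder: no float drift, ever
            return produced;
        }
    };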

Either way, multi-rate stuff with non-integer factors is just generally "very nasty" to get right and even once you get it working it will still drive you nuts every time you have to touch it. :)

Post

Maybe this is a trivial side-issue, but I noticed this when looking in the past at some nice time/pitch-stretch libs (same strategic issues as arbitrary ratio resampling), and at some of the older resampling libs.

I'd have to go back and study to remember all the details, but it seemed different devs had many ways of skinning the cat. You might have "push" resampling or stretching, versus "pull" resampling/stretching, or other arrangements.

You might set it up so you tell the resampler "here are 4096 samples at the original samplerate, so give me however many complete samples at the new samplerate you can manage". For non-integer ratios, the output would vary by a sample or two at the very least, because sometimes the "fractional sample pointer" ends the input buffer with a small leftover value and sometimes with a larger one. Sometimes you can get "one last extra sample out of the input" and sometimes you can't.
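In code, that push-style accounting might look something like this hypothetical helper (step = source rate / dest rate); the +/-1 wobble in the return value from block to block is exactly the fractional-pointer effect just described:

    #include <cmath>

    // How many complete output samples can n new input samples yield, given
    // the current fractional read pointer and step = srcRate / dstRate?
    int completeOutputs(double phase, double step, int n) {
        if (phase >= n) return 0;
        return (int)std::ceil((n - phase) / step);  // count of positions < n
    }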

Some other "pull"-type arrangements could sometimes be a puzzle to implement in realtime as well, but they were based on a hard-and-fast output buffer size. I was often doing this for on-the-fly stretch/shift/SRC playback, where the ASIO driver tells me exactly how many bytes it needs, so I need to deliver that many bytes at the sample-rate-converted, shifted and/or stretched rate.

It might go something like this, after you have initialized the resampler or shifter/stretcher to the parms you want--

If the ASIO buffer wants 512 sample frames, I tell the resampler "give me 512 sample frames". The resampler replies with an error code meaning something like, "No can do. Feed me at least 1825 new sample frames and then ask me again". So then you send the SRC or stretcher however many samples it thinks it needs, and generally the second time you ask for 512 sample frames of output it succeeds. Maybe rarely something wacky might happen and the silly resampler might come back with, "No can do. Feed me at least 2 new sample frames and then ask me again". But if it did that more than very rarely it would need a rewrite or debug.

When I would write my resamplers based on a model "kinda like that", I would just calculate the exact number of input samples it ought to need, then add a few extra for slop to make sure I always had enough, then ask for that many input samples. It wasn't dangerous if I had asked for 2011 samples but really only needed to use 2004 of them-- I still have the left-over 7 samples in my resampler's FIFO buffer, so it just means next time I don't have to ask for quite as many input samples, and it all works out.
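The sizing arithmetic was basically this (a sketch, with my own names):

    #include <cmath>

    // Input samples to request for outWanted output samples, given what is
    // already sitting in the input FIFO, plus a little slop for safety.
    // Over-asking is harmless: leftovers stay in the FIFO for next time.
    int inputNeeded(int outWanted, double step, int buffered, int slop = 8) {
        int exact = (int)std::ceil(outWanted * step) - buffered;
        return exact > 0 ? exact + slop : 0;
    }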

This was especially useful with some of the stretch/pitch libs. If you change the stretch/pitch parms, sometimes the first couple of buffer requests would want LOTS more input samples than after the ball gets rolling. Right after a stretcher parm change the buffer-to-buffer input might be drastic-- maybe 12000 samples input the first time, 7000 samples input the second time, then it steadies down to a fairly constant input buffer size request thereafter.

Post

Apologies, further beating the horse. As best I recall, my resampler object was about the same as described above, but slightly rearranged so it was easier for me to get my head around it, using it at low level in a playback/rendering situation. For all I know a computer scientist might say, "No, these are two entirely different things", but I'm not a computer scientist and they seemed to me just variations of the same thing, one of them perhaps a little easier to understand how to use, maybe a little faster executing.

It was arranged as a Query followed by a Demand.

The Query might ask the resampler object the equivalent of, "Based on your current settings and the number of samples that might be sitting in your input and output FIFO buffers, how many more input samples do you need in order to give me 512 output samples?"

The Query always has to be answered and it always has to be correct. For instance, if the exact number is 1000 new samples, then an answer in the ballpark of 1000, 1001, 1002-- a little slop but not much-- is a correct answer. An answer less than 1000 would be wrong. An answer wildly in excess of what the resampler actually needs would also be wrong. If a program is possibly dealing with MIDI input, user input, and automation beyond the scope of low-level playback, then it does not seem a good idea to render farther ahead of realtime than is absolutely necessary.

Then I go scrounge up 1002 samples or whatever and issue to the resampler object the Demand, which also must always succeed-- something like, "Here is a pointer to 1002 new input samples. Return me a pointer to 512 output samples, and make it snappy!" :)
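As an interface it might look something like this sketch (invented names, not my actual old code):

    class StretchResampler {
    public:
        virtual ~StretchResampler() = default;

        // The Query: given current settings and whatever already sits in the
        // internal FIFOs, how many new input frames are needed to produce
        // outFrames output frames? Must never under-report, and should not
        // wildly over-report either.
        virtual int framesNeeded(int outFrames) const = 0;

        // The Demand: must succeed whenever at least framesNeeded(outFrames)
        // new input frames are supplied. Unused input stays buffered.
        virtual void render(const float* in, int inFrames,
                            float* out, int outFrames) = 0;
    };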

Post

Thanks guys! That async sample rate conversion paper is excellent!

Some details about the methodology I'm currently using. Let's say I have a buffer of 1024 samples at 44.1kHz.
I have a ratio calculated as current sample rate / new sample rate.

1) A 4x oversampler using a Hermite interpolator upsamples the signal, giving me exactly 4096 samples.
2) I calculate the expected samples for the next interpolator as those 4096 samples / ratio. The estimated samples are below 4096.
3) I feed a Lagrange interpolator with the ratio and the expected samples. The function returns pretty much the same number of output samples as the upsampled input (this feels very wrong).
4) I calculate the samples expected by the downsampler as the original number of samples (1024) / ratio.
5) I feed another Hermite interpolator with the new data and I get the decimated buffer.

The final number of samples looks right (although the number is rounded), but obviously something is going wrong with the Lagrange interpolator.
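For reference, this is roughly the loop structure I expected the fractional stage to have, with the output count falling out of the phase accumulator instead of being passed in (a sketch only; a 4-point Hermite stands in for the Lagrange kernel, and block-edge history handling is omitted). With step = 176400/192000 = 0.91875 and 4096 inputs it produces roughly 4096 / 0.91875, about 4458 samples:

    // 4-point, 3rd-order Hermite (Catmull-Rom) kernel
    static float hermite4(float xm1, float x0, float x1, float x2, double t) {
        double c1 = 0.5 * (x1 - xm1);
        double c2 = xm1 - 2.5 * x0 + 2.0 * x1 - 0.5 * x2;
        double c3 = 0.5 * (x2 - xm1) + 1.5 * (x0 - x1);
        return (float)(((c3 * t + c2) * t + c1) * t + x0);
    }

    // step = srcRate / dstRate at this stage; phase persists across blocks
    // and must be >= 1 on entry so in[i-1] is valid.
    int fractionalStage(const float* in, int n, float* out,
                        double step, double& phase) {
        int m = 0;
        while (phase < n - 2) {  // kernel reads up to in[i+2]
            int i = (int)phase;
            double f = phase - i;
            out[m++] = hermite4(in[i-1], in[i], in[i+1], in[i+2], f);
            phase += step;
        }
        return m;  // the count falls out of the loop; it is not an input
    }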

Post

Hi Audiority, dunno if this is helpful or just "painfully too obvious to mention"--

Many years ago I used a similar strategy. I had scrounged up a few cpu-inexpensive ways to 2x upsample and downsample, and a moderately expensive better way, a polyphase IIR halfband filter. I tested a bunch of combinations of those, measuring execution time and the amount of aliasing on a slow-sweep test sine wave file. Some combinations had the worst of both worlds, slow execution time and fairly heavy aliasing; other combinations ran faster with the same or less aliasing; some ran slower but much cleaner, etc. I tried to pick the best 5 out of all the tested combinations to give the user some choices between "fast quality" and "best quality".

Anyway, however one is doing it, maybe we oversample 8x, then fractionally interpolate through the 8x data, then downsample by powers of 2 however many times necessary. Depending on source and dest samplerates, the upsample might turn out to be 8x but the downsample only 4x, or whatever it turns out to need. Going from something like 192k to 22.05k, maybe upsample 2x or not at all, fractionally interpolate, then downsample 2x several times.

One detail: say your original input buffer has N samples in the input array, zero-indexed-- InSamp[0], InSamp[1], ..., InSamp[N-2], InSamp[N-1]. InSamp[N] is considered the top of the current data, and it points to the empty slot right above the last valid sample in the buffer.

You can get fractional-interpolated outputs all the way up to somewhere between InSamp[N-2] and InSamp[N-1]. If just doing integer up/downsampling 44.1 to 22.05 or whatever, maybe you could use that very last InSamp as well.

But for me it seemed the cleanest procedure, to avoid occasional bugs, to stop interpolating somewhere below InSamp[N-1], the last input sample in the buffer. Possibly with some ratios one might need to stop even sooner; I can't recall.

So after outputting interpolated samples from InSamp[0] up to but not including the last sample InSamp[N-1], one of the last cleanup chores before returning from the function would be to copy our current InSamp[N-1] down to InSamp[0], because we need that last sample from the current buffer to continue resampling when we start on the next buffer.

Also, when we start our current buffer, maybe the fractional read pointer index = 0.xxx, and it increases until we are done and the fractional index is some value around [N-2].yyy. So along with copying our last sample down to the first sample in preparation for the next buffer, we would subtract (N-1) from the fractional index, so that next time the fractional index = 0.yyy or whatever, and we would continue the interpolation over the gaps between input buffers without burps, accidentally dropped or added samples, or whatever.

Post

Trying to explain more clearly (if it even makes a difference)--

Say you plan to always input 4096-sample buffers. The resampler input buffer would be declared with a size of 4097, going from InSamp[0] to InSamp[4096].

On each new input buffer load, you would load 4096 new samples starting at InSamp[1], and then interpolate from InSamp[0] up to but not including InSamp[4096]. InSamp[4096] is the "top hinge point" for interpolation and we get as close as possible without hitting it.

Then on exit, preparing for the next time, we copy InSamp[4096] to InSamp[0]. Next time we again load 4096 samples starting at InSamp[1] and start resampling from InSamp[0], which is the last buffer's final input sample that "we didn't quite reach" last time.
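In code, that exact scheme might look like this (a sketch assuming linear interpolation, so the single carried sample InSamp[0] is enough; a higher-order kernel would need to carry a few more hinge samples):

    #include <vector>

    struct CarryResampler {
        static const int N = 4096;
        float InSamp[N + 1];     // [0] = carried sample, [1..N] = new samples
        double idx = 0.0;        // fractional read index
        double step;             // srcRate / dstRate

        CarryResampler(double srcRate, double dstRate)
            : step(srcRate / dstRate) { InSamp[0] = 0.0f; }

        void process(const float* in, std::vector<float>& out) {
            for (int i = 0; i < N; ++i) InSamp[i + 1] = in[i];
            while (idx < N) {    // up to, but not including, InSamp[N]
                int i = (int)idx;
                double f = idx - i;
                out.push_back((float)(InSamp[i] + f * (InSamp[i+1] - InSamp[i])));
                idx += step;
            }
            idx -= N;               // the 0.yyy carry-over described above
            InSamp[0] = InSamp[N];  // top hinge becomes next buffer's start
        }
    };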

Post

On a practical note, why use a polynomial interpolator for the fixed-ratio resampling? All this is doing is applying a short filter kernel to the 4x up/down resampling. Attenuation of content above Nyquist will be poor, resulting in aliasing. Why not just use a FIR filter for that? There are free online FIR filter designers you can use that will plot the frequency responses, so you can make an informed choice on the trade-off between desired attenuation and kernel length.
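There's no magic in those design tools either; a minimal Blackman-windowed-sinc generator is just this (a sketch; for a 4x stage you'd put the cutoff somewhat below 1/8 of the oversampled rate to leave a transition band):

    #include <cmath>
    #include <vector>

    // Lowpass FIR kernel: ideal sinc at 'cutoff' (fraction of the sampling
    // rate, 0..0.5) shaped by a Blackman window. 'taps' should be odd.
    std::vector<double> windowedSinc(int taps, double cutoff) {
        const double PI = 3.14159265358979323846;
        std::vector<double> h(taps);
        int M = taps - 1;
        for (int n = 0; n < taps; ++n) {
            double x = n - M / 2.0;
            double s = (x == 0.0) ? 2.0 * cutoff
                                  : std::sin(2.0 * PI * cutoff * x) / (PI * x);
            double w = 0.42 - 0.5 * std::cos(2.0 * PI * n / M)
                            + 0.08 * std::cos(4.0 * PI * n / M);
            h[n] = s * w;
        }
        return h;  // longer kernels buy more attenuation, as discussed
    }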

Post

Also, rather than trying to troubleshoot multiple interpolation schemes at once, I'd suggest working on a single fractional-ratio interpolator, just going directly from 44.1 to 192, for example. It's simpler to figure out one step at a time and then put it all together once the individual components are working correctly.

Post

I agree with matt42. Unless you need something quick'n'dirty and don't have a lot of time to experiment with stuff, it may be worth learning "good quality, generic, but possibly slow" methods to have in the toolshed when needed, even if you decide to use something cheaper.

In this old book, around page 522, see "The Table Method of Interpolation"--
http://sites.music.columbia.edu/cmc/cou ... berlin.pdf
The text some pages beforehand may also be useful, or maybe you already know it and don't need the info. I think I recall reading of similar/identical methods somewhere on JOS' pages and elsewhere, and Nigel Redmon and mystran and others have discussed the method. Probably somewhere or other there is sample code, but I haven't looked lately.

Honestly, at some time over the years I should have coded up an object based on this just to have it in the weapons locker, but every time I looked at it, it just seemed to want so many math operations that I did not see how it could be very fast. But computers are much faster nowadays and there is something positive to be said about something that "just works" and "works purt dern good", even if maybe it chews CPU cycles like candy. :) Or maybe, coded the right way on modern processors, it would be lots faster than I'd expect.

Post

JCJR wrote: I agree with matt42. Unless you need something quick'n'dirty and don't have a lot of time to experiment with stuff, it may be worth learning "good quality, generic, but possibly slow" methods to have in the toolshed when needed, even if you decide to use something cheaper.
Indeed, but even if you decide to go cheap, it still doesn't make sense to use a polynomial interpolator at a fixed ratio. If you really want the Hermite kernel then you could just derive the coefficients at the fixed points and dispense with the polynomial calculations. Or just design a short FIR (I suppose every windowed-sinc FIR is actually a look-up table of a windowed sinc interpolator from this perspective).
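To illustrate "derive the coefficients at the fixed points": at a fixed 4x ratio the fractional position only ever takes the values 0, 1/4, 2/4 and 3/4, so the Hermite polynomial collapses into four precomputed 4-tap FIRs. A sketch, expanding the usual Catmull-Rom form:

    #include <array>

    // weights[p][k]: tap k of the 4-tap kernel for fractional position p/4.
    // Each output sample then costs one plain 4-tap dot product.
    std::array<std::array<double, 4>, 4> hermitePhaseTable() {
        std::array<std::array<double, 4>, 4> w{};
        for (int p = 0; p < 4; ++p) {
            double t = p / 4.0, t2 = t * t, t3 = t2 * t;
            w[p] = {{ -0.5*t +     t2 - 0.5*t3,    // x[n-1]
                       1.0 - 2.5*t2 + 1.5*t3,      // x[n]
                       0.5*t + 2.0*t2 - 1.5*t3,    // x[n+1]
                              -0.5*t2 + 0.5*t3 }}; // x[n+2]
        }
        return w;
    }

At t = 1/2, for example, this yields the familiar (-1/16, 9/16, 9/16, -1/16) weights.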
JCJR wrote: just seemed to want so many math operations that I did not see how it could be very fast. But computers are much faster nowadays and there is something positive to be said about something that "just works" and "works purt dern good", even if maybe it chews CPU cycles like candy. :) Or maybe, coded the right way on modern processors, it would be lots faster than I'd expect.
The table is fast, surely faster than calculating a windowed sinc on the fly, and possibly faster than a polynomial. It's just that this design is often used for high quality kernels, so a kernel of 70 or so samples in length is of course going to be a bit expensive. Perhaps using a large table size, large enough that fractional table positions don't need interpolating, would be the most efficient table method?
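For concreteness, a sketch of the table design being discussed (sizes and names are illustrative; the lowpass/window math matches the FIR designer sketched above). With a large enough phase count you could drop the inner blend in apply() and just round to the nearest phase row, as suggested:

    #include <cmath>
    #include <vector>

    struct SincTable {
        int taps, phases;          // e.g. 16 taps, 512 phases
        std::vector<float> w;      // w[p * taps + k], rows 0..phases inclusive

        SincTable(int taps_, int phases_, double cutoff /*0..0.5*/)
            : taps(taps_), phases(phases_),
              w((size_t)(phases_ + 1) * taps_) {
            const double PI = 3.14159265358979323846;
            int R = taps / 2;
            for (int p = 0; p <= phases; ++p) {
                double frac = p / (double)phases;
                for (int k = 0; k < taps; ++k) {
                    double d = (k - R + 1) - frac;   // tap offset from point
                    double lp = (d == 0.0) ? 2.0 * cutoff
                              : std::sin(2.0 * PI * cutoff * d) / (PI * d);
                    double win = 0.42 + 0.5 * std::cos(PI * d / R)
                                      + 0.08 * std::cos(2.0 * PI * d / R);
                    w[(size_t)p * taps + k] = (float)(lp * win);
                }
            }
        }

        // Interpolated sample at position i + frac: blend the two bracketing
        // phase rows, then one dot product over x[i-taps/2+1 .. i+taps/2].
        float apply(const float* x, int i, double frac) const {
            double pp = frac * phases;
            int p0 = (int)pp;
            double pf = pp - p0;
            const float* a = &w[(size_t)p0 * taps];
            const float* b = a + taps;
            double acc = 0.0;
            for (int k = 0; k < taps; ++k)
                acc += (a[k] + pf * (b[k] - a[k])) * x[i - taps/2 + 1 + k];
            return (float)acc;
        }
    };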

Post

JCJR wrote: I agree with matt42. Unless you need something quick'n'dirty and don't have a lot of time to experiment with stuff, it may be worth learning "good quality, generic, but possibly slow" methods to have in the toolshed when needed, even if you decide to use something cheaper.
This quote highlights the main misconception about sinc-interpolation when it comes to resampling: the actual processing cost is likely to be less than what you would spend on your quick&dirty method.

In terms of madds, you expect a fixed 2x FIR-oversampling scheme to spend about twice the multiply-adds compared to doing it one-pass with a fractional FIR (assuming linear kernel interpolation, which doubles the number of madds). If you go for 4x, the figure is 4 times as many madds. What this means in practice is that even if your fractional FIR is "possibly slow", it's STILL going to be faster than your "quick&dirty" method.
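Back-of-envelope with a 32-tap kernel, just to make that concrete:

    one-pass fractional FIR (linear blend of two table phases):
        2 x 32 = 64 madds per output sample
    fixed 2x oversample + interpolate + downsample:
        the same class of filtering running at twice the rate,
        roughly 2 x 64 = 128 madds per output sample
    fixed 4x oversampling:
        roughly 4 x 64 = 256 madds per output sample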
