KVR Audio

Fire Sledge - Ohm Force · Sat Sep 18, 2004 11:28 pm

René wrote:Exactly. Oversampling is used in the impulse calculation (about 256x iirc), which is precalculated. There's no oversampling in the convolution process.

But how do you avoid aliasing when you pitch up sound ? Without oversampling and with resampling rate > 1, some samples may be "skipped", resulting in aliasing.

The impulse tables are calculated as standard windowed-sinc, but in lower quality modes (7 and 15 points) I've found that using a Remez exchange generated impulse results in better distribution of the noise across the frequency range.

I guess this is because Remez algorithm gets mad when number of points becomes too high (because of the impulse oversampling), resulting in precision loss in intermediate calculations.

'Point-connecting' interpolation methods (bicubic, lagrange, hermite, etc.) have the unwanted side effect of having exponentially more interpolation noise when frequency increases. Sinc has almost the same noise figures across all the spectra (besides the bandlimiting effect).

What are you calling interpolation noise ? The aliased spectrum images ?

-- Laurent

René · Post by **René** » Sun Sep 19, 2004 12:40 am

Guys, one thing that I don't understand. The first graphics shows a "hole" after 20.000hz. While all others shows a lower value for those. Doesn't that means that Sinc-Interpolation is limiting the HZ ceiling while others just mirror the information? Is that a good or bad thing? Hope that I explained it right.

What that means is, that you've been posting over and over without having the most minimal clue what the whole thing is about.

Ok. As I never explained it here, I'll do my first attempt. I'll try to put it in 'wife-understandable' terms this time.

As we all know, it all starts with a test sine wave of 15kHz. The first note on the graphic shows the -untransposed- sample playing. It must be perfect, as the thing is just playing a wav file. No interpolation going on, as long as we play it at the original samplerate. Naturally, we can screw things up, but that's not the most common case.

To transpose the sample up, we need to drop some samples and recalculate the ones we play based in the original ones. To transpose it down, we need to add 'artificial' samples, also calculated from the existing ones. So far so good.

Now, HOW we do those 'new sample' calculations will define what we get. Let's examine the result after one-semitone transposing.

In all interpolator graphics, we can see apart from the transposed line (the one with the same color as the first), multiple horizontal lines appearing.

Let's call those 'components' for now (as they don't completely match the definition for 'aliases'). The amplitude of the component is given by the color.

Yellow means high amplitude, then it goes orange/red/pink/blue. So blue lines are low amplitude components (the ones you hear less).

Ideally, we would like to get only the transposed line. So all components are unwanted. As many of the components are not in the original wave call them 'interpolation noise', in oposition to 'aliasing', which we reserve for the ones I'll describe now:

If we continue transposing the sample up, it will pass a very important barrier called Nyquist. That guy ruined our lifes enunciating that we cannot represent any frequency greater than Nyquist frequency, which is the half of the samplerate.

If we continue transposing our original sample, there will be a point when the 15kHz frequency passes Nyquist. Here's a table for the 12-tone transposition with all the corresponding frequencies:

st freq
0 15000
1 15892
2 16837
3 17838
4 18899
5 20023
6 21213
7 22475
8* 23811
9* 25227
10* 26727
11* 28316

Note that the last four notes are above Nyquist limit, which for samplerate=44100 is 22050 Hz. If you check this graphic, for insance:

http://www.rgcaudio.com/images/gs.gif

You'll see that the first 7 semitones are played perfectly as expected. However, the next four semitones, intead of playing high and high get 'mirrored' against Nyquist. So for the 8th semitone, supposed to be 23811Hz, we get a component at 20289. This is called an 'alias' of the pretended waveform, and we don't want it. It's our enemy.

Why it's the enemy? Supposedly, if the system can't represent the 23811Hz component, it should 'erase' it. The mirroring operation it actually does makes low tones appear when high tones are expected. The higher you go in frequency, the lower it gets mirrored.

We call the process of avoiding the aliases 'bandlimiting' the interpolator. The idea is to 'filter' the frequencies which would alias -before- we perform the transform, so they don't get aliased.

Both issues, the interpolation noise and aliasing are not exactly the same thing. They are very related, though. All 'point connecting' interpolation modes (linear, lagrange, hermite, bicubic, watte tri-linear, parabolic) are NOT bandlimited, unless we perform oversampling (gol has described his implementation of oversampled hermite early in this thread).

They offer a variable interpolation noise, generally speaking lower as of higher order is the polynomial order we use.

So, in few words, the answer to

"Doesn't that means that Sinc-Interpolation is limiting the HZ ceiling while others just mirror the information?"

Is yes.

And the answer to:

"Is that a good or bad thing?"

Is "It's a bad thing".

But how do you avoid aliasing when you pitch up sound ? Without oversampling and with resampling rate > 1, some samples may be "skipped", resulting in aliasing.

A new impulse is used everytime the pitch changes. In the end, it's a plain and simple sinc-interpolator, just using tables to speed it up. The principle of the interpolator is very commonm and used in several hardware samplers. This paper describe the basics of it very well (I believe):

http://ccrma.stanford.edu/~jos/resample/

I guess this is because Remez algorithm gets mad when number of points becomes too high (because of the impulse oversampling), resulting in precision loss in intermediate calculations.

That's what I think as well. Emu synths (and creative cards) do use Remez in a 7-point fir interpolator. They get a (in my very personal opinion) VERY good quality vs resouces ratio. If it is of your interst, take a look at the usa patent 5,111,727 "Digital sampling instrument for digital audio data".

What are you calling interpolation noise ? The aliased spectrum images ?

There's some concensus on reserving 'aliasing' for frequencies that get mirrored on nyquist (even rolling back multiple times). As some interpolators introduce unrelated components, those others are called 'interpolation noise'. But this nomenclature is still matter of discussion (perhaps you recall we discussed it during a whole week in #musicdsp), as there's no precise indication that an 'alias' is -only- a mirrored component. Semantics stuff I think.

-René

WilliamK · Post by **WilliamK** » Sun Sep 19, 2004 12:44 am

René wrote:What that means is, that you've been posting over and over without having the most minimal clue what the whole thing is about.

Ohh, that's very nice from you Rene. Sorry, but you just didn't need to be rude.

I was asking, at least I'm trying to learn. Anyway, thanks for the explanation, it sure helps.

Wk

René · Post by **René** » Sun Sep 19, 2004 12:49 am

René, it's pity I can't read Intel SIMD asm...

Certainly. Man, when you'll be a real man and get a PC

(j/k)

Ok. Here's the uberbasic code which will perform the sinc interpolation. I consider this unusable (except for SLOOOOOOOOOOOOOOOOW renderings), but it's good for test comparisons (to access it in sfz, it's required to press the shift key while inserting the plugin then invokingn the menu. Or it was control?).

Code: Select all

//-------------------------------------------------------------------------------------------------
// f_interpolate_sinc_perfect ()
//
// Mono, floating-point perfect sinc interpolation (testing purposes)
//-------------------------------------------------------------------------------------------------
__inline void _fastcall sfzLayer::f_interpolate_sinc_perfect (const int iLength)
{
	// uncomment to force sinc size
	// SINC = 4;

	for (int i = 0; i < iLength; ++i)
	{
		const int iPos = int (m_dPosition);
		const double dDif = m_dPosition - double (iPos);
		const double dRatio = min (1., 1. / m_dRatio);

		const int iStart = iPos - (SINC - 1);
		const int iStop = iPos + SINC;
		const int iSinc21 = SINC * 2 - 1;
		float out = 0.;
		
		for (int i = iStart; i < iStop; ++i)
		{
			double d = dDif - double (i - iPos);
			
			if (d == 0)
				d = 0.0000001;

			double wval = i - iStart + 1 - dDif;
			double w = (0.42 - 0.5 * cos (DBL_2PI * wval / iSinc21) + 0.08 * cos (2 * DBL_2PI * wval / iSinc21));
			double Pid = DBL_PI * d;

			out += m_pfWaveL[i] * float (sin (Pid * dRatio) / Pid * w);
		}

		*m_pfGeneratorL++ = float (out);	
		m_dPosition += m_dRatioMod;
	}
}

Then this implementation is moved to the table-based one:

Code: Select all

//-------------------------------------------------------------------------------------------------
// f_interpolate_sinc ()
//
// Mono, floating-point sinc interpolation (no sse)
//-------------------------------------------------------------------------------------------------
__inline void _fastcall sfzLayer::f_interpolate_sinc (const int iLength)
{
	// precalculate all positions
	// for this block
	for (int i = 0; i < iLength; ++i)
	{
		g_dPos[i] = m_dPosition;
		m_dPosition += m_dRatioLast;
		m_dRatioLast *= m_dRatioInc;
	}


	// convert positions to integer
	// by using cvttsf2si if available
	dsp.DoubleToInteger (g_iPos, g_dPos, iLength);


	// calculate table indexes
	for (i = 0; i < iLength; ++i)
	{
		fDif[i] = float (IFACTOR * (g_dPos[i] - g_iPos[i]));

		int iPos = dsp.Trunc (fDif[i]);
		iTPos[i] = SINC2 * iPos;
		fTDif[i] = fDif[i] - iPos;
	}


	// interpolate block
	for (i = 0; i < iLength; ++i)
	{
		float* y1 = &ssrt32[iTPos[i]];
		float* y2 = &sdif32[iTPos[i]];

		int iStart = g_iPos[i] - SINCm1;
		int iStop = g_iPos[i] + SINC; 
		
		float fOut = 0;
		for (int s = iStart; s < iStop; s++, y1++, y2++)
			fOut += m_pfWaveL[s] * (*y1 + *y2 * fTDif[i]);

		*m_pfGeneratorL++ = fOut;
	}
}

This gets used when there's no SSE available. The cvttsf2si is only used when SSE2 is available, so it isn't used in this routine either.

The first routine I posted is just an implementation of this latest using SSE. Hope this makes it clearer.

-René

Urs · Post by **Urs** » Sun Sep 19, 2004 12:58 am

Thanks René!

Coincidently, I'm, just back from the pub, so I better don't try to go through this.

Anyway, I was just kidding

- will be interesting to find out how you optimized it, tho... tomorrow...

Cheers,

Urs

Scr1pt3r · Post by **Scr1pt3r** » Sun Sep 19, 2004 1:18 am

Urs wrote:Thanks René!

Coincidently, I'm, just back from the pub, so I better don't try to go through this.

Anyway, I was just kidding - will be interesting to find out how you optimized it, tho... tomorrow...

Cheers,

Urs

hmm... in a year from now it'll be :

Urs wrote:Thanks René!

Coincidently, I'm, just back from the AA meeting ( Alcholist Anonymous ) ...

tony tony chopper · Post by **tony tony chopper** » Sun Sep 19, 2004 4:45 am

note that if you don't want to bother with tables for the sinc, but still want to keep the speed realistic, you can get rid of the slow sin & cos in the loop by using the goertzel algorithm, precomputing a lot outside the loop and generating the sines out of simple mul & adds (it worked for me).

Urs · Post by **Urs** » Sun Sep 19, 2004 9:11 am

Hiya René,

hehehe, it looks much more readable now!

So, basically you have precomputed tables of windowed sinc convolution kernels. You use a floating point sample position accumulator (64 bit I guess). For each sample you pick a sinc table (2, actually) for the appropriate fractional part of the sample position and do the convolution. That's straight forward...

Hehehe, linearly interpolating between two tables is an insanely great idea!

Hmmm... some thoughts:

You could cut down the memory usage for N tables if you take into account that tables 0 - (N/2 - 1) *could be made* identical with tables N/2 - N, just reverse order.

I don't know what float <-> int conversions cost in SSE. Maybe you could use fixed point instead?

Cheers,

Urs

Urs · Post by **Urs** » Sun Sep 19, 2004 10:00 am

gol wrote:note that if you don't want to bother with tables for the sinc, but still want to keep the speed realistic, you can get rid of the slow sin & cos in the loop by using the goertzel algorithm, precomputing a lot outside the loop and generating the sines out of simple mul & adds (it worked for me).

Hehe, if tuning the sinc exactly at Nyquist is still good enough, you don't even have to compute much for the sine part. It just alters between two numbers for odd and even samples

Hmmm, IIRC the Goertzel algorithm does do much the same thing as the Chamberlin filter. But the tuning of a Chamberlin gets weird when approaching Nyquist. Does this also affect Goertzel in any way (i.e. rounding errors or something)?

Cheers,

Urs

René · Post by **René** » Sun Sep 19, 2004 12:42 pm

note that if you don't want to bother with tables for the sinc, but still want to keep the speed realistic, you can get rid of the slow sin & cos in the loop by using the goertzel algorithm, precomputing a lot outside the loop and generating the sines out of simple mul & adds (it worked for me).

I've done several versions replacing sin() and cos () (for the non-nerds: sin() and cos() are the standard c library sine and cosine functions respectively, and they're slow as hell. So if we use them, CPU goes waaaaaay up).

However, I couldn't find a non-lookup version which is still useable realtime. I get a 'faster rendering version' at max. I'd really like to see that non lookup implementation you mentioned.

What I consider 'useable realtime' is something that allows me do the same as an average hardware sampler: playing 48~64 voices realtime before the CPU explodes. If I can't play a decent piano realtime, it's not for me. I feel it like what CSound is to a VSTi.

You could cut down the memory usage for N tables if you take into account that tables 0 - (N/2 - 1) *could be made* identical with tables N/2 - N, just reverse order.

Indeed. If memory were a problem, the linear interpolation could be done on-the-fly, reducing the size to a half as well. Note that I'm using two arrays, to simplify this:

y1 + (y2 - y1) * x

y1 is in one array (ssrt32) and y2-y1 is in another (sdif32).

However, the whole set of tables requires about 2.5Mb for each sinc-size, and as tables are const static the memory is taken for the first instance only.

I don't know what float <-> int conversions cost in SSE. Maybe you could use fixed point instead?

SSE has a slight advantage over fixed point when block size is 64 or higher, and depending what operation to consider.

When I started this project, Intel was releasing SSE2, then SSE3, and I was hoping they to be like in SSE143 by now, and on processors of 5GHz. SIMD applied to block processing was what I thought (and they advertised) as the most promising way to go back then.

However, Moore's law has been violated this time, and the processors development has been severely affected by... something, perhaps world economics, I don't know. I got a 3GHz processor two years ago, and the fastest today is what? 3.6GHz? pfff "powerful gamer, meet powerful processor" blah

Let's see how this continues.

-René

WilliamK · Post by **WilliamK** » Sun Sep 19, 2004 12:46 pm

René wrote:for the non-nerds: sin() and cos() are the standard c library sine and cosine functions respectively, and they're slow as hell. So if we use them, CPU goes waaaaaay up).

Sorry if this is a stupid question. But couldn't be used "aproximation sin" instead?

http://musicdsp.org/archive.php?classid=5#175

I found that for my filters this works perfectly. But I don't know for other cases.

Also, about SSE. It will only give you a lot of speed advantages when using in paralel. Using SCALAR is a waste of resources, as you could be processing 4 numbers at the same time instead. Wusikstation, for example, process voices in blocks of 4. So you get 4 for the price of 1.

Wk

WilliamK · Post by **WilliamK** » Sun Sep 19, 2004 12:58 pm

I did a very simple test. Rendered the test with Hermite interpolation but with a sample rate of 96khz. And I used Sony SoundForge 7.0 to downsample to 44khz. The result was much cleaner than just rendering at 44khz directly. Its a nice solution for synths that doesn't have high-quality options. It even helps with linear-interpolation. Just a quick handy fix.

http://www.wusik.com/images/HermiteAt96khz.jpg

It's still noisy, but yet, much better compared to 44khz rendering.

Wk

Urs · Post by **Urs** » Sun Sep 19, 2004 1:05 pm

WilliamK wrote:Wusikstation, for example, process voices in blocks of 4. So you get 4 for the price of 1.

In theory, yes. But practically, the performance bottlenecks are the slow busses compared to the processor clock and the long pipelines of the cpus. Hence, pure SIMD code is typically slowed down by both, and there are applications (i.e. filters with a lot of dependencies between instructions) where it doesn't bring much more than unreadable code.

On the other hand - and I can only speak for AltiVec - there are SIMD instructions available that can make code fly even if used in a skalar manner. In AltiVec for example, there are tricks to load registers with many, many values without accessing ram, and a float <-> fixed conversion takes a latency of 1 cycle.

Also - PowerPC specific, again - instructions in the several units (integer, floating point, SIMD) can be exectuted in parallel, so that while you get pipeline stalls (uhm, divisions come to my mind), you can still feed another unit.

So, in my observation, it's always good to follow mixed approaches (but don't pass registers from one unit to the other directly, always pass them over the data cache...). This of course needs a good organization of your code, but this is what it's all about anyway, huh?

Cheers,

Urs

René · Post by **René** » Sun Sep 19, 2004 1:24 pm

The success of SIMD (at least in Intels, but I'm pretty sure it's the case for all processors) does not depend on if you process 'vectors' or 'scalars'. That's only a partial way to see it. It all depends on how you code the thing.

You can think on single-channel convolution as a scalar task. However, you can get a 50~100% CPU speed up by using SIMD to do the convolution in 'chunks' (like in the code above). Actually, one of the sample snippets in the intel site for SSE/SSE2 is a single-channel convolution routine.

-René

arguru · Post by **arguru** » Sun Sep 19, 2004 2:09 pm

Here's another implementation of sinc resampler. Is basically equivalent to Rene's code but no optimizations here, simple as it looks, uses real trigs, plain x87 (no simd), double pres, sinc is shaped thru a blackman window and is slow-as-a-drunk-turtle-taking-a-curve obviously.

At 512 taps points gives pleasure for the audiophile quality, it does pee and poo over nyquist statement and a gang-rape for the cpu.

Code: Select all

/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
float sinc_interpol_mono(float* psamples,int const int_pos,double d_fract_pos,double const d_resample_step,int const num_taps)
{
	// pi constant
	double const k_pi=atan(1.0)*4.0;

	// mix accumulator
	double d_mix=0.0;

	// avoid sinc fractional part division by 0
	if(d_fract_pos==0.0)
		d_fract_pos=1.0e-25;

	// compute sinc cutoff (tuned to nyquist)
	double c_cutoff=1.0/d_resample_step;

	// clamp sinc cutoff
	if(c_cutoff>1.0)
		c_cutoff=1.0;

	// get half taps
	int const half_taps=num_taps/2;

	// FIR convolution
	for(int s=0;s<num_taps;s++)
	{
		// get wave index centered around half
		int const i_tap_centered=s-half_taps;

		// sinc get pos
		double const d_sinc_x=-double(i_tap_centered)+d_fract_pos;
		
		// sinc get value
		double const k_sinc=sin(c_cutoff*k_pi*d_sinc_x)/(k_pi*d_sinc_x);

		// blackman window value
		double const k_w=2.0*k_pi*d_sinc_x/double(num_taps);
		double const k_window_b=0.42+0.5*cos(k_w)+0.08*cos(2.0*k_w);

		// add mix convolution
		d_mix+=double(psamples[int_pos+i_tap_centered])*k_sinc*k_window_b;
	}

	// return
	return float(d_mix);
}

[/code]

The aliasing thread