Wavetable oscillator implementation

DSP, Plugin and Host development discussion.

Post

earlevel wrote:
Andrew Souter wrote:
earlevel wrote:
2048 or 4096 are good choices for audio rate, the main reason to go lower is so you can sweep a bright waveform (sawtooth) to sub audio without the bandlimited nature becoming apparent.
...however a "bright" waveform such as a raw saw or sqaure, has a 1/f amplitude ratio, so the 1024th partial is down ~60dB anyway... and one could project that the average waveform has even less energy at the 1024th partial.

2048 samples, 1024 partials, is enough 99% of the time...
I'm not sure whether you got my point—you seem to have misinterpreted it, but I might be wrong.

What I meant was... play this on your good monitors or headphones (unfortunately, it's mp3 for better browser compatibility, but good enough to get the point across):

http://www.earlevel.com/main/wp-content ... 48-20s.mp3

By 5 seconds you're already losing apparent harmonics; by 8, 9, 10 it sounds like you're sweeping a lowpass filter down with it... A real sawtooth would not get dull as you dropped the pitch.

But here's with 32k tables:

http://www.earlevel.com/main/wp-content ... mp-20s.mp3

As I said, 2048 is a good minimum number. I like going another octave with 4096, gives full bandwidth to 20 Hz, and allows a bit lower before crapping out.

Ya, I got the point. I know a little about this topic:

https://www.galbanum.com/products/archi ... eforms2010

There was an earlier thread here about "inharmonic waveforms" where I mentioned there really is no such thing, but that the best compromise might be to create waveforms with lots of high-frequency partials and then transpose playback down several octaves. In that kind of very extreme application, where the fundamental frequency might be 1 Hz or 0.1 Hz or even much lower AND there is a lot of energy in the very high partials, it might be desirable to have more than 2048 samples. But this is a rather extreme use case, and is probably better accomplished with other synthesis/DSP methods if that kind of chaotic texture is the goal. (It seemed really cool to me 18 years ago when I first tried it with waveforms made in MetaSynth -- which is what started me on the waveform journey -- but the various interpolation artifacts and aliasing there were partially responsible for its coolness in this case.)

I think for pitch-bending saw waves (even by extreme amounts) 2048 points is plenty... you could always add some waveshaping to regenerate higher harmonics if it was a concern...

Note that for LFO use, very long waveforms can be cool. I used 65536-point waveforms in this:

https://www.galbanum.com/products/piscis/

(Sadly, Rapture is no more, I guess. Too bad -- it had a very nice waveform/wavetable playback engine, the cleanest of its day AFAIK.)

But if you are using something that long, you had better have some energy in the extreme partials; otherwise, with proper interpolation, there is no point and it is just a waste of disk space...

Post

...now if you were talking about a synth engine that used some kind of MIP MAP technique and you played your saw at A = 3520 Hz with a sample rate of 44.1k (allowing only 6 partials at that fundamental), and THEN you pitch-bend that down several octaves WITHOUT handling it intelligently, THEN that would be a problem, yes...

But this is an implementation detail of the synth engine more than a question of whether 2048 samples is really enough. Your synth engine should basically cross-fade to the appropriate waveform in the MIP MAP when undergoing pitch bends...
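
A minimal sketch of that idea (the table layout, reference pitch, and all names here are my own assumptions, not from any particular engine): pick the two mip levels that bracket the current fundamental and crossfade between them as the pitch moves, so a bend never exposes a hard switch.

Code: Select all

// Sketch: crossfade between bandlimited mipmap levels during pitch bends.
// Assumes one table per octave, each TABLE_SIZE samples plus a guard point,
// tables[0] holding the most partials (for the lowest pitches).
#include <algorithm>
#include <cmath>

struct WavetableMipMap {
    static const int NUM_LEVELS = 10;
    static const int TABLE_SIZE = 2048;
    float tables[NUM_LEVELS][TABLE_SIZE + 1];

    // phase in [0,1), freq in Hz
    float render(double phase, double freq) const
    {
        // Fractional mip index: level 0 at/below refFreq, one level per octave above.
        const double refFreq = 20.0;
        double level = std::log2(std::max(freq, refFreq) / refFreq);
        level = std::min(level, double(NUM_LEVELS - 1));

        int lo = int(level);
        int hi = std::min(lo + 1, NUM_LEVELS - 1);
        double mix = level - lo;              // crossfade amount between the two levels

        double idx = phase * TABLE_SIZE;
        int i = int(idx);
        double frac = idx - i;

        // Linear interpolation within each table, then crossfade between levels.
        double a = tables[lo][i] + frac * (tables[lo][i + 1] - tables[lo][i]);
        double b = tables[hi][i] + frac * (tables[hi][i + 1] - tables[hi][i]);
        return float(a + mix * (b - a));
    }
};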

Post

edit: something like this

Code: Select all

  double m = v->mixer1.getNextValue();
                    double output = 0.0;
                   if(m <= 1.0)
                   {
                       output = (1-m)*ou + m*tr;
                   }else if (m > 1.0f &&m <= 2.0)
                   {
                       output = (2-m)*tr+ (m-1)*va;
                   }else if(m > 2.0f &&m <= 3.0)
                   {
                       output = (3-m)*va + (m-2)*sq;
                   }
edit 2: VCV Rack version

Code: Select all

inline float crossfade(float a, float b, float frac) {
	return a + frac * (b - a);
}

/** Linearly interpolate an array `p` with index `x`
Assumes that the array at `p` is of length at least floor(x)+1.
*/
inline float interpolateLinear(const float *p, float x) {
	int xi = x;
	float xf = x - xi;
	return crossfade(p[xi], p[xi+1], xf);
}
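
For context, a minimal usage sketch of that helper with a phase accumulator (the table size, guard point, and wrap handling here are my assumptions, not from Rack itself):

Code: Select all

// Hypothetical use of interpolateLinear() above in a simple table oscillator.
static const int TABLE_SIZE = 2048;
float table[TABLE_SIZE + 1]; // one guard point so p[xi + 1] never reads past the end
float phase = 0.f;           // normalized phase in [0, 1)

float process(float freq, float sampleRate) {
    float out = interpolateLinear(table, phase * TABLE_SIZE);
    phase += freq / sampleRate;
    if (phase >= 1.f)
        phase -= 1.f;
    return out;
}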
Does anyone have any idea how you morph or crossfade between tables, like from table 1 to table 2 to table 3? Has this been discussed yet?

Post

The dumb question I think I know the answer to, just want to verify--

For a sine wavetable oscillator, wouldn't just one big sin table of size 32k or 64k samples, rendered with appropriate interpolation, be "good to go" for everything from LFO frequencies up to nyquist?

Post

JCJR wrote:The dumb question I think I know the answer to, just want to verify--

For a sine wavetable oscillator, wouldn't just one big sin table of size 32k or 64k samples, rendered with appropriate interpolation, be "good to go" for everything from LFO frequencies up to nyquist?
Yeah, except you probably don't need anywhere near that big of a table, since we're talking about a relatively smooth shape. Something like 1k is probably quite fine for this.

Post

JCJR wrote:The dumb question I think I know the answer to, just want to verify--

For a sine wavetable oscillator, wouldn't just one big sin table of size 32k or 64k samples, rendered with appropriate interpolation, be "good to go" for everything from LFO frequencies up to nyquist?
What mystran said, but to add that I think a table as small as 512 will give you a SNR (97 dB) equivalent to the dynamic range of 16-bit audio, and 12 dB more for each doubling of the table size, if I figured right.
My audio DSP blog: earlevel.com

Post

earlevel wrote: What mystran said, but to add that I think a table as small as 512 will give you a SNR (97 dB) equivalent to the dynamic range of 16-bit audio, and 12 dB more for each doubling of the table size, if I figured right.
Dithered 16-bit can do a bit better than 97dB, but ... yeah.. that's the general idea.

Post

mystran wrote: Dithered 16-bit can do a bit better than 97dB, but ... yeah.. that's the general idea.
Actually, I was comparing the SNR of a 512 sample sine lookup table (~97 dB) to the dynamic range of 16-bit audio (~96 dB). Not the perceived dynamic range of 24-bit audio dithered to 16-bit, for instance. I understand the confusion, perhaps I should have left off "audio" and just said 16-bit samples, but it's correct either way—dither doesn't change actual dynamic range of samples, just perceived dynamic range of musical signals.

What I was getting at is that if you calculated the purest 16-bit sine wave signal you could, at full scale, every sample as close as 16-bit will allow to perfect, it would be about the same as doing the same thing via a 512-sample lookup table.

But there's another reason to not compare with a dithered signal. Dither level is referenced to full scale, full potential signal (it's just as loud when the music is -50 dB as it is at -3 dB). The error I'm talking about is referenced to the actual signal level, always 97 dB down from it. We typically don't play a sine at full scale. So if you're doing 24-bit or 32-bit-float audio in the computer, and say a sine at -20 dB (still real loud), the error is now -117 dB relative to full scale (and being masked by the signal to boot). Which is to say, even as part of a dithered audio result and the cleanest recording you could do, you're just not going to hear an improvement on a 512-sample sine table by going to a large table size.
My audio DSP blog: earlevel.com

Post

earlevel wrote: What I was getting at is that if you calculated the purest 16-bit sine wave signal you could, at full scale, every sample as close as 16-bit will allow to perfect, it would be about the same as doing the same thing via a 512-sample lookup table.
Yes, but if you compute the purest sine wave possible in floating point (or wider integers, whatever), then dither it down to 16-bit, you can have a better SNR, even though full-scale sine waves are obviously rare in practical music and one can argue that for purposes of making music even 512 samples might be excessive. In practice, where you draw the line is somewhat of a matter of taste, but that doesn't change the fact that you almost certainly don't need 32k samples in your table if it's just for the purpose of sound generation. :)

As far as masking goes... it depends where the error energy is in terms of spectra. The main harmonic you probably should worry about in this case is the first (and second) image of the fundamental. If the table is too short, then linear interpolation might not adequately attenuate this image which (for practical table sizes) can be quite far from the fundamental and not necessarily masked by the sine itself, especially if it's also aliased to a non-harmonic frequency. This problem can actually be found in table-driven sine-wave oscillators found in the wild (and it's somewhat annoying when you hit this when you're trying to produce test signals), but one can argue whether it's actually a problem in typical music production (probably not).

Still, I'd recommend just writing a wavetable implementation that lets you easily experiment with the actual table size, then try something, put it through the speakers/headphones and spectrum analyser, then tweak it up or down until you find the point that is "good enough" for your purposes. It's always better to just test and measure than to believe what some random person on the internet claims. ;)
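
In that spirit, here's a minimal measurement sketch (the table sizes, test frequency, and error metric are my assumptions): build an N-point sine table, render it with linear interpolation, and compare against a directly computed sine. Sweeping N should show roughly the 97 dB at 512 samples and +12 dB per doubling mentioned above; an FFT of the error would likewise show where the interpolation images land.

Code: Select all

// Sketch: measure the SNR of a linearly interpolated N-point sine table.
#include <cmath>
#include <cstdio>
#include <vector>

double sineTableSNR(int N, double freq = 997.0, double sampleRate = 44100.0,
                    int numSamples = 1 << 16)
{
    const double PI = 3.14159265358979323846;
    std::vector<double> table(N + 1);
    for (int i = 0; i <= N; ++i)
        table[i] = std::sin(2.0 * PI * i / N);   // index N is the guard point

    double phase = 0.0, sigPow = 0.0, errPow = 0.0;
    for (int n = 0; n < numSamples; ++n) {
        double idx = phase * N;
        int i = int(idx);
        double frac = idx - i;
        double approx = table[i] + frac * (table[i + 1] - table[i]);
        double exact = std::sin(2.0 * PI * phase);

        sigPow += exact * exact;
        errPow += (approx - exact) * (approx - exact);

        phase += freq / sampleRate;
        if (phase >= 1.0) phase -= 1.0;
    }
    return 10.0 * std::log10(sigPow / errPow);
}

int main() {
    for (int N = 256; N <= 4096; N *= 2)
        std::printf("N = %5d  SNR ~ %.1f dB\n", N, sineTableSNR(N));
}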

Post

mystran wrote:Yes, but if you compute the purest sine wave possible in floating point (or wider integers, whatever), then dither it down to 16-bit, you can have a better SNR...

As far as masking goes... it depends where the error energy is in terms of spectra...
Again, with dither it's perceived SNR, and I've been referring to actual SNR. Even so, the perceived improvement occurs only in situations where the signal is extremely quiet—the only reason we dither is for extremely low signal levels. And the scenario you're discussing as having a theoretical improvement is when the signal is very loud. So, even theoretically, it's still not an improvement.

The brief "masking" comment was simply to point out the noise 97 dB down was further only available while the signal was present. It's not possible for someone to hear it even if you added gain to boost the noise floor (because the clipped signal would still mask it). No sense talking about the spectrum of the error, it's just not loud enough under any circumstances to hear.

Anyway, not intending to argue, just thinking the logic through "out loud". I know we agree on the practicality.

So, the answer for table size is roughly 2^(n/2 + 1) samples, where n is the bit depth you want to preserve (for 16-bit audio, 2^9 = 512 samples). I actually didn't know this before so it was fun thinking about it, thanks to Jim's question.
My audio DSP blog: earlevel.com

Post

Thanks Nigel and mystran for the good insights.

Post

earlevel wrote: Again, with dither it's perceived SNR, and I've been referring to actual SNR.
It depends on how you measure SNR. In my humble opinion the only meaningful metric with regards to audio is to look at the spectrum and measure the peak magnitude vs. noise floor. In this test even pure white dither objectively improves your SNR.

Post

mystran wrote: It depends on how you measure SNR. In my humble opinion the only meaningful metric with regards to audio is to look at the spectrum and measure the peak magnitude vs. noise floor. In this test even pure white dither objectively improves your SNR.
That's an appropriate definition, but it's even worse for your argument. Think about it. You want to dither higher resolution audio to 16-bit. You add noise. What level? Louder than -96 dB by several dB, obviously, or it doesn't dither. It depends on the dither type, but plus/minus a half lsb is the very least you can do, and that's for yucky rectangular dither. That's your minimum noise floor for your audio from there on out, about -91.5 dB for TPDF (much worse for shaped, but then we talk about the noise spectrum).

Again, dither helps the perceived dynamic range, not actual SNR—it lowers SNR. But it only does that perceived improvement when there is almost no signal at all. That's why it's basically unrelated to the topic of sine table SNR, where the noise is always that far below whatever level the signal is, unlike dither which is always constant relative to full scale. Plus, any benefit dither gives you, it also gives you in playing back the sine table (want to play a sine at -100 dB from 16-bit?).
My audio DSP blog: earlevel.com

Post

I have done a few wavetable implementations and here are some facts I can recall off the top of my head:
  • The quality of the generated tables matters a lot more than their size. My best results always came from using direct additive synthesis for all mip levels, since it let me push everything right up against the Nyquist limit without difficulty. The timbre of e.g. a low-frequency saw does bulk up quite a bit if it's given more size, but detectable aliasing does a lot more to hurt musicality.
  • Transitioning only at zero crossings gave very good results, so I never explored alternatives. At the time I first did this I had a very complicated, error-prone algorithm to efficiently loop between tables of different sizes during rendering, but I later realized that this can be done simply by selecting two mipmaps at a time - "current" and "next" - and pasting them into a buffer of larger size (see the sketch after this list). During rendering, it scans over part of that buffer without using all of it - i.e. it just overruns one mipmap and bleeds into the next without doing any kind of special zero-crossing detection. Now you only have to know "approximately" when the next frame starts instead of getting a precise solution - and when it does, the offset is moved down and "next" becomes "current". This strategy affords some flexibility for pitch modulation and makes it straightforward to implement wavescanning, double-waveform pairs (a la Casio CZ), or even a switch over to PCM data, since you can paste in any combination you want - it's just a matter of matching sample rates within the pair buffer, which in turn favors mips of fixed size.
  • It mattered more how I went about resampling my mipmaps than how many of them there were. I experimented with various strategies of size/range/density, but a lot of them did not sound different when tested with a sine LFO sweeping the frequency up and down. A mip every semitone is more than enough; one every octave isn't quite enough.
All of these ideas have the caveat of audio-rate modulations probably failing - that's something I've never designed for.
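
A rough sketch of the pasted-buffer idea from the second bullet, as I read it (fixed frame size, immediate promotion at the seam, and all names are my own assumptions):

Code: Select all

// Sketch: "current" and "next" frames pasted into one buffer; the read head
// simply bleeds across the seam, and the frames are swapped once it has crossed.
#include <cstring>

struct PastedBufferOsc {
    static const int FRAME = 2048;      // fixed mip/frame size
    float buffer[2 * FRAME + 1];        // "current" frame, then "next", plus a guard point
    float readPos = 0.f;

    // Queue whatever should play after the current frame: another mip level,
    // the next wavescan step, a partner waveform, a PCM chunk, etc.
    void queueNext(const float* nextFrame) {
        std::memcpy(buffer + FRAME, nextFrame, FRAME * sizeof(float));
        buffer[2 * FRAME] = buffer[FRAME];
    }

    float process(float increment) {    // increment = freq / sampleRate * FRAME
        int i = int(readPos);
        float frac = readPos - i;
        float out = buffer[i] + frac * (buffer[i + 1] - buffer[i]);

        readPos += increment;
        if (readPos >= FRAME) {
            // The read head has overrun "current": promote "next" and carry on.
            readPos -= FRAME;
            std::memcpy(buffer, buffer + FRAME, FRAME * sizeof(float));
            // ...the caller would now queueNext() the frame after this one.
        }
        return out;
    }
};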

Post

Triplefox wrote:Transitioning only at zero crossings gave very good results
What if the two wavetables don't share a zero crossing? Did you try switching them when the two wavetables cross or some such thing?

I guess, ideally, one would diff the two tables, find the zero crossings and pick the one with the lowest absolute difference in the surrounding values...?
