KVR Audio

Fire Sledge - Ohm Force · Thu Sep 23, 2004 2:21 pm

René wrote:sfz does only precalculate impulses for two octaves, which is what the SoundFont spec asks for. Beyond that, it keeps using the last impulse.

OK, I wasn't aware of that.
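For anyone wondering what a clamped impulse table could look like, here is a minimal sketch. It is not sfz's actual code. The Hann window, the tap count, the names `sinc_kernel` and `MAX_RATIO`, and the reading of "two octaves" as upward ratios from 1x to 4x are all assumptions made for illustration.

```python
import math

MAX_RATIO = 4.0  # assumption: "two octaves" taken as transposition up to 4x


def sinc_kernel(ratio, half_width=8, phase=0.0):
    """Hann-windowed sinc taps for one fractional phase (hypothetical helper).

    The cutoff drops as the ratio rises, so content that would fold over
    Nyquist is filtered out. Past MAX_RATIO the ratio is clamped, which is
    the "keeps using the last impulse" behaviour described above.
    """
    clamped = min(max(ratio, 1.0), MAX_RATIO)
    cutoff = 1.0 / clamped
    taps = []
    for n in range(-half_width + 1, half_width + 1):
        x = n - phase
        window = 0.5 * (1.0 + math.cos(math.pi * x / half_width))
        arg = math.pi * cutoff * x
        sinc = 1.0 if x == 0 else math.sin(arg) / arg
        taps.append(cutoff * sinc * window)
    return taps


# Three octaves up reuses the two-octave impulse.
print(sinc_kernel(8.0) == sinc_kernel(4.0))
```

A real table would hold one such kernel per fractional phase and per ratio step. This only shows the clamping.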

René · Thu Sep 23, 2004 2:32 pm

But now I'm also curious about how well the sinc would perform at decontaminating naive synth waveforms.

Damn, if only I had a few zillion fewer things to do...

-René

tony tony chopper · Thu Sep 23, 2004 3:58 pm

I'm afraid that your test is showing the aliasing existing in your sample

how could a sample have.. any kind of aliasing in itself?

sfz does only precalculate impulses for two octaves, which is what the SoundFont spec asks for.

so you do agree that the oversampling method would imply at most a 4x oversampling, which is not *that much* a waste of CPU. A well optimized usual (4 or 6 point) interpolation could be 4x faster than a sinc, why not. In which case the 4x oversampling+usual interpolation shouldn't be neglected.

René · Thu Sep 23, 2004 4:52 pm

how could a sample have.. any kind of aliasing in itself?

Just by not having an integer period (which turned out not to be the case for Fire's waveform after all). Check my GIF above to see what the spectrogram of a 15 kHz naive waveform looks like. The point is that the sample was -generated-, not -sampled-, so the result has tones below the fundamental.
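To put numbers on "tones below the fundamental", here is a small sketch that folds the harmonics of a naive 15 kHz waveform back into the audible band. The 44.1 kHz sample rate is an assumption (the thread does not state one), and the function names are made up for this example.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a partial at freq_hz shows up after sampling."""
    folded = freq_hz % sample_rate_hz
    # Anything above Nyquist mirrors back into the 0..Nyquist band.
    if folded > sample_rate_hz / 2:
        folded = sample_rate_hz - folded
    return folded


def naive_partials(fundamental_hz, sample_rate_hz, count):
    """(harmonic number, audible frequency) for the first `count` harmonics."""
    return [(k, alias_frequency(k * fundamental_hz, sample_rate_hz))
            for k in range(1, count + 1)]


for harmonic, heard_at in naive_partials(15000, 44100, 6):
    print(harmonic, heard_at)
```

Under that assumption the third harmonic (45 kHz) lands at 900 Hz and the sixth (90 kHz) at 1800 Hz, both far below the 15 kHz fundamental.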

so you do agree that the oversampling method would imply at most a 4x oversampling, which is not *that much* a waste of CPU. A well optimized usual (4 or 6 point) interpolation could be 4x faster than a sinc, why not. In which case the 4x oversampling+usual interpolation shouldn't be neglected.

I'm not 'neglecting' oversampling.

Perhaps you missed the part where I mentioned that I -do use- it: I have two instruments using oversampling (P1 and z3ta+), and I've done other tube-amp simulator projects using oversampling.
Actually, a version of the sfz engine with many extra resampling methods, including oversampling, is in the works.

I'll say this one more time: there's no 'perfect solution'. There's just 'perfect solution for me', so -nothing- is neglected here.

The -only- thing I'm trying to diss is -non oversampled linear interpolation- sample playback devices, which is what most of the players in the market (and in the graphics which started this discussion) are using.

Starting with non-oversampled Hermite and above, I'm happy.
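For reference, the two interpolators being contrasted can be sketched as below. The Hermite shown is the common 4-point, 3rd-order Catmull-Rom form. Whether that is the exact variant used in sfz is an assumption, and the function names are illustrative only.

```python
def linear(samples, pos):
    """2-point linear interpolation at fractional index pos."""
    i = int(pos)
    frac = pos - i
    return samples[i] + frac * (samples[i + 1] - samples[i])


def hermite4(samples, pos):
    """4-point, 3rd-order Hermite (Catmull-Rom) interpolation.

    The caller must keep 1 <= int(pos) <= len(samples) - 3 so that all
    four neighbours exist (no bounds checking in this sketch).
    """
    i = int(pos)
    t = pos - i
    y0, y1, y2, y3 = samples[i - 1], samples[i], samples[i + 1], samples[i + 2]
    c0 = y1
    c1 = 0.5 * (y2 - y0)
    c2 = y0 - 2.5 * y1 + 2.0 * y2 - 0.5 * y3
    c3 = 0.5 * (y3 - y0) + 1.5 * (y1 - y2)
    # Horner evaluation of c3*t^3 + c2*t^2 + c1*t + c0.
    return ((c3 * t + c2) * t + c1) * t + c0


curve = [float(x * x) for x in range(6)]   # samples of y = x^2
print(linear(curve, 2.5), hermite4(curve, 2.5))
```

On this parabola the true value at 2.5 is 6.25. Linear interpolation gives 6.5, while the Hermite form recovers 6.25 exactly, because it reproduces polynomials up to second order.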

-René

Fire Sledge - Ohm Force · Thu Sep 23, 2004 7:50 pm

gol wrote:so you do agree that the oversampling method would imply at most a 4x oversampling, which is not *that much* a waste of CPU. A well optimized usual (4 or 6 point) interpolation could be 4x faster than a sinc, why not. In which case the 4x oversampling+usual interpolation shouldn't be neglected.

The question is: what SNR level do you expect, over what range of resampling ratios, with a given oversampling factor and interpolator? Then we can count the cycles (or just the muls and adds) and compare with other methods targeted at the same SNR.
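One rough way to get such a figure is to resample a known sine and compare against the ideal value at each read position. The sketch below does this for linear interpolation only, with the oversampled source generated directly at the higher rate (standing in for an ideal oversampler, which is an assumption). The test frequency, the 5-semitone-down ratio and the name `snr_db` are arbitrary choices for the example.

```python
import math


def linear(samples, pos):
    """2-point linear interpolation at fractional index pos."""
    i = int(pos)
    frac = pos - i
    return samples[i] + frac * (samples[i + 1] - samples[i])


def snr_db(oversample, ratio=2.0 ** (-5.0 / 12.0), freq=0.05, n_out=2000):
    """SNR of linearly interpolated playback of a sine, in dB.

    freq is in cycles per base-rate sample; ratio is the number of
    base-rate input samples advanced per output sample.
    """
    step = 2.0 * math.pi * freq
    length = int(n_out * ratio * oversample) + 2
    # Sine stored at `oversample` times the base rate.
    source = [math.sin(step * n / oversample) for n in range(length)]
    signal = noise = 0.0
    for k in range(n_out):
        pos = k * ratio                      # position in base-rate samples
        ideal = math.sin(step * pos)
        got = linear(source, pos * oversample)
        signal += ideal * ideal
        noise += (got - ideal) ** 2
    return 10.0 * math.log10(signal / noise)


for factor in (1, 2, 4):
    print(factor, round(snr_db(factor), 1))
```

This only captures interpolation error on a single sine, not aliasing from upward transposition of broadband material, so it is a starting point for the comparison rather than the whole answer.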

-- Laurent

René · Thu Sep 23, 2004 9:23 pm

The question is: what SNR level do you expect, over what range of resampling ratios, with a given oversampling factor and interpolator? Then we can count the cycles (or just the muls and adds) and compare with other methods targeted at the same SNR.

...and the frequency response, latency/delay and memory usage, and I think we're done.

-René

René · Fri Sep 24, 2004 2:08 am

One extra thing worth mentioning is that implementation-specific limitations can rule out an otherwise viable resampling method.

sfz/sfz+ were designed with the ability to play huge sample collections in mind. As sample collections are tens or hundreds of times bigger than the memory available in today's computers, disk streaming seems mandatory for many sampling applications.

In that scenario, the resampler had to be designed to process streaming data, as both sfz/sfz+ do.

As streaming can be thought of as 'constant sample loading', there's no chance of using any relatively CPU-expensive sample pre-processing (like pre-oversampling or mip-mapping).

Every clock cycle used in the whole resampling process must be taken into account, as there's no way to 'precalculate' stuff, apart from impulse tables or similar data. Fortunately, the resampling range for a sample-playback device aimed at sample collections is relatively small compared with that of wavetable or table-playback synthesizers.
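To illustrate what "designed to process streaming data" implies, here is a minimal sketch of a resampler that is fed one disk block at a time and carries its state across block boundaries. It is not the sfz design. The class name is hypothetical, and linear interpolation is used only to keep the example short.

```python
class StreamingResampler:
    """Block-by-block resampler that never needs the whole sample in memory."""

    def __init__(self, ratio):
        self.ratio = ratio   # input samples advanced per output sample
        self.pos = 0.0       # read position relative to the start of the buffer
        self.tail = []       # input samples still needed by the next block

    def process(self, block):
        buf = self.tail + list(block)
        out = []
        # Linear interpolation needs buf[i] and buf[i + 1].
        while self.pos < len(buf) - 1:
            i = int(self.pos)
            frac = self.pos - i
            out.append(buf[i] + frac * (buf[i + 1] - buf[i]))
            self.pos += self.ratio
        # Drop the consumed input and keep only what the next call needs.
        keep_from = min(int(self.pos), len(buf))
        self.tail = buf[keep_from:]
        self.pos -= keep_from
        return out


data = [float(n) for n in range(32)]
streamed = []
resampler = StreamingResampler(0.75)
for start in range(0, len(data), 8):        # pretend each slice is a disk block
    streamed += resampler.process(data[start:start + 8])
print(streamed == StreamingResampler(0.75).process(data))
```

The only state kept between calls is the fractional position and a sample or two of history. A longer kernel would need a longer tail, but nothing has to be precomputed from the sample itself.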

At the same time, many DSP processes take place after resampling. Consequently, it's not practical to upsample and then use a single downsample operation after voice mixing. This would mean running all the DSP at a higher sample rate, while many processes wouldn't benefit from it.

The alternative would be oversampling, resampling and then downsampling on a per-voice basis. I honestly didn't try it, as I imagine it would be much more expensive in CPU clocks.

So, requirements can help in choosing the resampler by considerably limiting the available options.

-René

Muon Software Ltd · Fri Sep 24, 2004 11:13 am

The -only- thing I'm trying to diss is -non oversampled linear interpolation- sample playback devices, which is what most of the players in the market (and in the graphics which started this discussion) are using.

Interesting. Now I wouldn't want to defend linear interpolation purely on sound quality grounds, that would be silly. There are many measurably better ways to resample, after all.

However, to my own ears, if I take a converted Akai patch (such as the 72 MB version of the Splendid Piano) and play the same MIDI file in SFZ and in Tachyon, I really can't say if one sounds better than the other. In SFZ's draft mode, it is also difficult to say which one is using more CPU. I'd probably say SFZ is fractionally quicker but it's open for discussion.

Now if I bump SFZ up to the middle of the quality range (24), CPU consumption doubles. But does it sound twice as good? Blindfold, I'd be unable to tell the difference. By the time I get to max quality I'm soaking my CPU completely (Athlon 1700+, 512 MB RAM, nForce) and I couldn't say what I've gained sound quality-wise.

I've got no wish to diss René's technical achievement in his resampling algorithm; on the contrary, I think it is wonderful. But the audible benefits of the extra CPU consumption are very difficult to put a finger on. Maybe I should implement a higher-quality interpolation mode in Tachyon just to get bragging rights.

Maybe I will render out some samples for others to comment on.

Cheers
Dave

mauseoleum · Fri Sep 24, 2004 11:23 am

What about the DS404?

Muon Software Ltd · Fri Sep 24, 2004 11:27 am

DS404 uses linear interpolation, and so will perform "poorly" in the comparison

stefancrs · Fri Sep 24, 2004 11:32 am

What about "converting" the sample to contain the integrated value (per sample) instead of the current value (per sample)? Then one could calculate those integrated values with high precision and then downsample from "infinite" oversampling using a box filter (since you'd just take the average of the output since the last output sample). Of course, reading from the sample would use interpolation of some sort, and with some oversampling as well one could probably reduce the aliasing that occurs due to the box filtering...
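A minimal sketch of that idea, under stated assumptions: the sample is treated as piecewise constant, so its running integral is piecewise linear and can be read exactly with linear interpolation. The function names are invented for the example, and Python floats (doubles) stand in for the "high precision" accumulator.

```python
def integrate(samples):
    """Running sum: table[n] is the integral of the sample from 0 to n."""
    total = 0.0
    table = [0.0]
    for value in samples:
        total += value
        table.append(total)
    return table


def read_integral(table, pos):
    """Integral up to fractional position pos (linear between table entries)."""
    i = int(pos)
    if i >= len(table) - 1:
        return table[-1]
    frac = pos - i
    return table[i] + frac * (table[i + 1] - table[i])


def box_resample(samples, ratio):
    """Each output is the average of the input over the last `ratio` samples."""
    table = integrate(samples)
    out = []
    pos = 0.0
    while pos + ratio <= len(samples):
        area = read_integral(table, pos + ratio) - read_integral(table, pos)
        out.append(area / ratio)
        pos += ratio
    return out


print(box_resample([1.0, 3.0, 5.0, 7.0], 2.0))
```

The difference of two integral reads divided by the step is exactly a box filter of width `ratio`. Its frequency response is a sinc with slow roll-off, which is where the remaining aliasing mentioned above comes from. For long samples the running sum also grows large, so accumulator precision matters.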

Muon Software Ltd · Fri Sep 24, 2004 12:12 pm

OK, here are the examples. It's a whopping 10.5 MB, so please don't all download at once. The file will probably only be available today, to conserve bandwidth.

Notice I've left it open as to which file is which

There are some slight differences in the dynamics (probably just envelope/velocity settings in the converted patches) but on the whole, they sound as close to identical as makes no difference to me.

One file is Tachyon using linear interpolation with a peak CPU consumption of >30%, the other is SFZ in maximum quality mode (72) with a peak CPU consumption of >80%. The actual MIDI file contains a peak polyphony of 64 layers thanks to heavy sustain pedal use.

http://www.muon-software.com/temp/examples.rar

Regards
Dave

stefancrs · Fri Sep 24, 2004 12:19 pm

I think file1 sounds slightly better, but that might just be imaginary. One question though: is there any resampling going on here, or does the sample library used have a sample per semitone?

René · Fri Sep 24, 2004 12:21 pm

It all depends on the contents, how much you transpose them and how well you hear. In a multisample, we don't know how much we're transposing. There's a chance that some notes don't require any interpolation at all, as they play at the root note.

Previously in this thread, I posted three examples of Linear Interpolation versus Hermite (a cheap, non-bandlimited polynomial interpolator) for one, five and twelve semitones. The one-semitone interval might be a little hard to spot; the other two are awfully evident.

-René

René · Fri Sep 24, 2004 12:28 pm

This is the post with the listening test I mentioned, in case anyone would like to take the challenge.

------------------
All the audio snippets have a violin major third, transposed by different intervals. They all compare Linear interpolation with Bicubic interpolation. All tests were rendered in sfz.

The first test transposes the sample down two octaves. I agree that this is well outside the 'standard' or 'used' transposition ranges, but it helps a lot to first hear very clearly what we're looking for, and then move on to more subtle examples.

http://www.rgcaudio.com/downloads/other ... oct-dn.rar

Ok. I believe everyone here will be able to perceive the noise we've been discussing. If you can't, please ignore the rest of this post. Perhaps it's time to get a new soundcard or speakers.

If you can perceive it but you think the linear wave sounds better, that's cool as well. The important thing is that you hear the difference.

The second snippet shows the same scenario, but now the transposition level is 5 semitones down. This is a very commonly used interval. For SoundFonts and Akais, for instance, one-octave transposition values are pretty common. This is about half of that.

http://www.rgcaudio.com/downloads/others/vln-5st-dn.rar

Ok. If the difference between the two samples is not really noticeable to you, there's nothing wrong. I ran the test on five people with no musical or technical training here, and only two were able to hear it. But if it -is- noticeable, let's take the next challenge.

http://www.rgcaudio.com/downloads/others/vln-2st-up.rar

Now the transposition is one whole tone up. I won't comment on this one, I'd like to know what you hear.
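For reference, the three intervals above translate into resampling ratios as follows, assuming equal temperament (the helper name is just for this example). The ratio is the number of source samples read per output sample, so values below 1 stretch the sample and values above 1 compress it.

```python
def transpose_ratio(semitones):
    """Playback-rate ratio for a transposition in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


for label, semitones in (("two octaves down", -24),
                         ("five semitones down", -5),
                         ("one whole tone up", 2)):
    print(label, round(transpose_ratio(semitones), 4))
```

Two octaves down is a ratio of 0.25, five semitones down is roughly 0.749, and a whole tone up is roughly 1.122. Only the last one pushes source content above the output Nyquist frequency.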

No tricks, no hidden stuff, nothing 'fancy'. Just trying to translate from graphics and numbers to sounds and perceptions.

-----------------

-René

The aliasing thread