KVR Audio

Compyfox · Post by **Compyfox** » Wed May 16, 2012 4:29 pm

Here's a theoretical question for you people.

The idea derived from these three threads:
The Upsampling Your Mix Thread (2007, revived 2009)
Is there a point to export to 24bit 96khz before mastering? (2010)
Steinberg's NotSoHot SRC... (2012)

In these threads, the recurring question is:
Does Upsampling for further editing make sense?

A little bit of background before I go fully theoretical:
In my case, I'm mostly working at 48/24 in Cubase 6. Before I master in Wavelab 7, I convert the files with an external SRC (sample rate converter) and bit-depth converter, up to 48/32 or even "downsampled" to 44/32.

Before you scream "do not do that": I had longer discussions with several engineers on that subject. Dithering is (to my understanding) in basic form just the bit-depth conversion. The "noise shaper" modes pretty much move the "dither noise" to different frequency sections. The samplerate conversion is still a different step, which is the reason why "comparison plots" for SRCs exist.

But back on topic - I'm using RME ADCs/DACs, at the moment also the Behringer UltraMatch Pro SRC2496. Pretty much all of them work at 24bit input/output maximum. If we talk about higher bit depths, it's purely ITB (in the box), since the ADC/DAC can only interpret up to 24bit - the output will get truncated either way.

My theoretical thinking:
Let's say I produce a track in 2.0 (Stereo), 48/24 in Cubase 6. Since I'm happy with it, I give it away for mastering. Or I get a client project at 44/24 and I apply the mastering.

Now, chances are that I also want to offer it in a higher format for possible (and recently upcoming) HD Audio releases. The thing I'd do is to render it into 32bit first (from Cubase), or leave it as is and let the "upsampling" be handled by an external SRC (with client projects).

Here are the facts pre-mastering:
1) Pure SRC and bit-depth conversion do not add any additional content to the source material

But what if I start to master in, let's say 96/32 now. Doesn't matter which host since it's irrelevant at this point. From the last paragraph, we know (and I'm sure we can all agree on that) that there is no content added in this particular process alone.

Since I will probably apply EQ, slight compression and maybe even some saturation as well, wouldn't I add "content"?

Theoretical thinking (again):
1) blank upsampled material has no additional content ranging from the source sampling rate and up - this should show up on FFTs
2) applying plain EQ doesn't add additional (harmonic) content, since it's not a form of distortion (filters only)
3) compressors or saturators (e.g. Tape Machine, Summing Modules) would add harmonic content, since they are a form of distortion (insert effect)
4) after mastering, downsampling and dithering should therefore have more "information" crammed into one file than it had on source.

The reason why I'm thinking this:
I've seen many videos, pictures and even such rigs myself in mastering environments, especially those with outboard gear. These studios usually have an external upsampling matrix. Some even go more nuts than 8x (think more like 16x and higher). So a 48kHz file that is 32x upsampled would mean that you work with a 1536kHz (1.536MHz) source file, which is then run through outboard equipment, and then downsampled again to 96kHz for further editing (format finalisation for example: BD audio for movies/gaming, 48kHz for DVD, 44/24 for FLAC).

As proof, see the picture I posted here, which is from a German mastering studio. They presented their environment in an SAE video tutorial that came with an audio engineering magazine and mentioned that upsampling matrix in the process (you don't see it in that picture). Though I might be exaggerating the upsampling factor.

Anyway... if my theoretical thinking is right, I understand why there are so many "HD audio" releases as of late. Consumers usually do not work at such high sampling rates, 96kHz+ reduces the available input/output channels of an ADC/DAC (if I'd do that with my RME, I'd lose out on channels per ADAT pipe) and only large scale studios might pull something like that off (even here, I do not think that a lot of people do this).

What's your opinion on this?
Is my theoretical thinking correct or am I on the wrong track?

Please discuss.

timobrien · Post by **timobrien** » Wed May 16, 2012 7:34 pm

Adding zeros onto the front of a number doesn't make it better, just larger files.
Most programs these days work at 64bit internally anyway.

Don't bother.

jupiter8 · Post by **jupiter8** » Wed May 16, 2012 8:37 pm

Some comments:

Compyfox wrote:Dithering is (to my understanding) in basic form just the bit-depth conversion.

My pet peeve. Dither is not bit depth reduction. On an integer sound file you simply remove the lowest bits and that's it. Mission complete.
However, there will be correlated distortion, which the ear perceives as unpleasant. So if you add a bit of noise, the distortion becomes uncorrelated and is perceived as noise, and thus more agreeable to the ear. So you can choose to dither when you do bit depth reduction, but it's not the same thing. Now how much of a difference dither makes is another discussion, but it doesn't hurt. Floating point files work on a similar principle (i.e. remove the least significant bits) but it isn't as easy as with integer files. As I understand it, there is no meaningful way of dithering floating point files, since the noise will be correlated to the signal due to the way floating point works.
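To make the difference between plain requantization and dithering concrete, here is a minimal Python sketch. Everything in it is an assumption chosen for illustration: a 16-bit target, rounding instead of truncation, a 1 kHz tone smaller than half an LSB, and TPDF (triangular) dither of +/- 1 LSB. It is not taken from any particular converter.

```python
import math
import random

def quantize(x, step):
    """Round x to the nearest multiple of step (plain requantization)."""
    return step * math.floor(x / step + 0.5)

def tpdf(step, rng):
    """Triangular-PDF dither spanning +/- 1 LSB (difference of two uniforms)."""
    return (rng.random() - rng.random()) * step

def projected_amplitude(samples, reference):
    """Amplitude of the unit sine `reference` that is present in `samples`."""
    n = len(samples)
    return 2.0 / n * sum(s * r for s, r in zip(samples, reference))

rng = random.Random(0)            # fixed seed, only to make runs repeatable
step = 2.0 / 2**16                # one LSB of an assumed 16-bit target
n, cycles = 48000, 1000           # one second at 48 kHz, 1 kHz tone
ref = [math.sin(2 * math.pi * cycles * i / n) for i in range(n)]
quiet = [0.4 * step * r for r in ref]      # tone below half an LSB

plain = [quantize(x, step) for x in quiet]
dithered = [quantize(x + tpdf(step, rng), step) for x in quiet]

# Tone amplitude (in LSBs) that survives each conversion.
print("no dither:", projected_amplitude(plain, ref) / step)
print("dithered :", projected_amplitude(dithered, ref) / step)
```

Without dither the quiet tone is rounded away completely. With dither it survives at roughly its original level underneath a noise floor, which is the "distortion turned into noise" trade described above.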

Converting to 32 bit float is pretty much pointless but doesn't hurt either. It's going to be converted to floating point anyways in the mastering program (DAW or whatever). Do whatever fits your workflow.

Compyfox wrote:The "noise shaper" modes pretty much moves the "dither noise" to different frequency sections.

That is correct as i understand it.

Compyfox wrote:The samplerate conversion is still a different step, which is the reason why "comparison plots" for SRCs exist.

Sample rate conversion is a completely different process compared to bit depth conversion. One has nothing to do with the other. Sample rate conversion is pretty much calculating the values between the sample points. A common misconception is that we know nothing about what happens between them; this is not true. Just "drawing" a straight line between the sample points is a somewhat accurate picture of what's going on in there. This is called linear interpolation. But that's not good enough. You can use higher order interpolation (kind of like splines in a graphics program). However, we know the signal is bandlimited (the sampling theorem states that it must be), so we can actually tell exactly the amplitude at any given point in time. We can accomplish this by bandlimited interpolation; this is pretty hairy stuff so I won't go into that.
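As a rough illustration of the two approaches, the sketch below estimates the value halfway between samples of a 15 kHz sine at 48 kHz, once with linear interpolation and once with a Hann-windowed sinc kernel (a common practical stand-in for ideal bandlimited interpolation). The tone, the kernel half-width of 32 samples and the helper names are assumptions made for this example only.

```python
import math

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def linear_interp(samples, t):
    """Straight line between the two neighbouring samples."""
    i = math.floor(t)
    frac = t - i
    return samples[i] * (1 - frac) + samples[i + 1] * frac

def sinc_interp(samples, t, half_width=32):
    """Hann-windowed sinc interpolation at fractional index t."""
    centre = math.floor(t)
    total = 0.0
    for k in range(centre - half_width + 1, centre + half_width + 1):
        d = t - k
        window = 0.5 * (1 + math.cos(math.pi * d / half_width))
        total += samples[k] * sinc(d) * window
    return total

rate_hz, tone_hz = 48_000, 15_000
samples = [math.sin(2 * math.pi * tone_hz * i / rate_hz) for i in range(256)]

def true_value(t):
    return math.sin(2 * math.pi * tone_hz * t / rate_hz)

points = [100.5 + k for k in range(16)]     # midpoints between samples
linear_err = max(abs(linear_interp(samples, t) - true_value(t)) for t in points)
sinc_err = max(abs(sinc_interp(samples, t) - true_value(t)) for t in points)

print("worst linear error:", linear_err)
print("worst sinc error  :", sinc_err)
```

The linear estimate is far off for a tone this close to Nyquist, while the windowed sinc stays close to the true waveform. A longer kernel or a better window reduces the remaining error further, at the cost of more computation.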

Compyfox wrote:1) blank upsampled material has no additional content ranging from the source sampling rate and up - this should show up on FFTs

True. To nitpick, it doesn't have any content above the Nyquist frequency of the source material. Or at least it shouldn't have. A "bad" (fast) interpolator (like linear) will distort and generate frequencies, but with a "good" (slow) one it shouldn't.

Compyfox wrote:2) applying plain EQ doesn't add additional (harmonic) content, since it's not a form of distortion (filters only)

It could have. I would expect analog modeled ones to generate additional frequencies. Another thing to have in mind is that some EQs behave "badly" close to Nyquist and running them at a higher sample rate will make them behave more like one would expect.

Compyfox wrote:3) compressors or saturators (e.g. Tape Machine, Summing Modules) would add harmonic content, since they are a form of distortion (insert effect)

True.

Compyfox wrote:4) after mastering, downsampling and dithering should therefore have more "information" crammed into one file than it had on source.

Don't know about more but different "information".

What you gain by working at higher sample rates is less aliasing in the plugins (and possibly EQs working better). To do this you must change the sample rate of the file no matter what. You can tell Cubase to render a 96 kHz file or you can convert it externally later.
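A small sketch of the aliasing point, under assumed conditions: a 15 kHz sine is pushed through `tanh` (standing in for any saturator, with no oversampling of its own) at 48 kHz and at 96 kHz. The third harmonic belongs at 45 kHz. At 48 kHz that is above Nyquist, so it folds back to 3 kHz. At 96 kHz it stays where it belongs.

```python
import cmath
import math

def tone_level(samples, freq_hz, rate_hz):
    """Amplitude of the component at freq_hz (single-bin DFT)."""
    n = len(samples)
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * i / rate_hz)
              for i, s in enumerate(samples))
    return 2.0 * abs(acc) / n

def saturated_tone(rate_hz, freq_hz=15_000, drive=2.0, seconds=0.01):
    """A sine through tanh; drive and duration are arbitrary example values."""
    n = round(rate_hz * seconds)
    return [math.tanh(drive * math.sin(2 * math.pi * freq_hz * i / rate_hz))
            for i in range(n)]

for rate in (48_000, 96_000):
    y = saturated_tone(rate)
    print(rate, "Hz: level at 3 kHz =", round(tone_level(y, 3_000, rate), 5))

# At 96 kHz the third harmonic appears as real content above 24 kHz.
print("96 kHz: level at 45 kHz =",
      round(tone_level(saturated_tone(96_000), 45_000, 96_000), 5))
```

The 3 kHz component at 48 kHz is an inharmonic artifact that was never in the source. At 96 kHz it is far weaker, and the energy shows up as genuine new content above the original 24 kHz Nyquist instead.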

Blah blah blah, something just struck me: if you export from Cubase at 96 kHz from the beginning, Cubase will work at 96 kHz, which would be a better idea than converting it later. You need to double check that though, because I believe in the past it didn't do as one would expect. It would render at 48 and then upsample to 96; this is not what we want. We want Cubase to render the whole shebang at 96 kHz. Don't know if that's still the case or not.

Compyfox wrote: Does Upsampling for further editing make sense?

Yes absolutely.

Damn was i in a writing mode this evening ? Probably forgot half the stuff i was going to mention but there you have it.

Compyfox · Post by **Compyfox** » Thu May 17, 2012 10:05 am

timobrien wrote:Adding zeros onto the front of a number doesn't make it better, just larger files.
Most programs these days work at 64bit internally anyway.

Don't bother.

Actually I do bother, since I'm not talking about "plain" upsampling and doing nothing in the process. I'm talking about upsampling -> mastering -> downsampling to either format (CD/FLAC/MP3, DVD-A, etc).

Thanks for contributing so far, jupiter8:

jupiter8 wrote: Converting to 32 bit float is pretty much pointless but doesn't hurt either. It's going to be converted to floating point anyways in the mastering program (DAW or whatever). Do whatever fits your workflow.

Thanks for the info regarding dithering, so I wasn't completely off and understood the concept.

Regarding 32bit float conversion in a DAW:
I found out that while working in 32bit float, I can (in the worst case) overdo things without having any problems with clipping, practically without limit. Not so with plain 24 bit files - everything above 0 dBFS is simply cut away.

If I look at it from the point of view of the ADC/DAC, it wouldn't make sense. But editing-wise it's a safety rope. At least IMHO.
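The "safety rope" can be sketched in a few lines. This is only a model under stated assumptions: `to_fixed24` imitates a 24-bit integer file by clamping to full scale, `to_float32` imitates a 32-bit float file, and the gain of 4 (about +12 dB) is an arbitrary example of overdoing a stage and pulling it back afterwards.

```python
import struct

def to_float32(x):
    """Round a Python float to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_fixed24(x):
    """Quantize to 24-bit fixed point; anything beyond full scale clips."""
    full_scale = 2**23
    code = round(x * full_scale)
    code = max(-full_scale, min(full_scale - 1, code))
    return code / full_scale

gain = 4.0      # roughly +12 dB of accidental boost
peak = 0.9      # a sample close to full scale

hot_float = to_float32(peak * gain)     # stored above 0 dBFS, still intact
hot_fixed = to_fixed24(peak * gain)     # clipped at full scale on storage

restored_float = to_float32(hot_float / gain)
restored_fixed = to_fixed24(hot_fixed / gain)

print("float after pulling the gain back:", restored_float)
print("fixed after pulling the gain back:", restored_fixed)
```

The float path returns to about 0.9, while the fixed-point path comes back at about 0.25 because the peak was already cut off when it was stored. Float headroom is very large but not infinite, and it disappears again as soon as the signal reaches a 24-bit converter.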

jupiter8 wrote:Sample Rate Conversion ... this is pretty hairy stuff so i won't go into that.

I think it wouldn't contribute to this discussion, but it might be an interesting thing for a separate thread. Thanks anyway for contributing.

jupiter8 wrote:
Compyfox wrote:2) applying plain EQ doesn't add additional (harmonic) content, since it's not a form of distortion (filters only)

It could have. I would expect analog modeled ones to generate additional frequencies. Another thing to have in mind is that some EQs behave "badly" close to Nyquist and running them at a higher sample rate will make them behave more like one would expect.

I meant standard (minimum phase/linear phase) and non-modeling EQs in this case.

jupiter8 wrote: What you gain by working at higher sample rates is less aliasing in the plugins (and possibly EQs working better). To do this you must change the sample rate of the file no matter what. You can tell Cubase to render a 96 kHz file or you can convert it externally later.

Positive side effects aside, would I "add" additional content and fill everything above the Nyquist frequency of the 48kHz source while mastering, which would in turn justify an "HD release"?

jupiter8 wrote: Blah blah blah, something just struck me: if you export from Cubase at 96 kHz from the beginning, Cubase will work at 96 kHz, which would be a better idea than converting it later. You need to double check that though, because I believe in the past it didn't do as one would expect. It would render at 48 and then upsample to 96; this is not what we want. We want Cubase to render the whole shebang at 96 kHz. Don't know if that's still the case or not.

It's worth a test though.

Though here the question is: if I render in 96kHz right out of Cubase (or whatever host I have at my disposal), will I have additional content above the nyquist frequency of the initial 48kHz or not?

jupiter8 wrote:
Compyfox wrote: Does Upsampling for further editing make sense?
Yes absolutely.

Still leaves the question open with the content. But it makes sense with the oversampling matrix(es) in studios with external equipment, so why should it not make any sense in software form?

jupiter8 wrote: Damn was i in a writing mode this evening ? Probably forgot half the stuff i was going to mention but there you have it.

It's a good start and I'm sure this will turn into something more.
Thanks for contributing.

camsr · Post by **camsr** » Fri May 18, 2012 5:31 am

Upsampling is beneficial for many processes because the resolution while creating the information is greater. As for listening to the same creation at its native 96kHz and downsampled to 48kHz, I don't think anyone has the ears to make the distinction between the two, given a properly controlled test with no THD on the listening device.

This is pretty obvious if you consider Frequency Modulation of a synth. Oversampling gets rid of more aliasing, pure and simple. But it is listened to at the original sample rate, which would have aliased audibly.

Also, if you have ever delayed a track by one sample and then mixed it equal into the same track undelayed, you will notice the comb filtering has formed a kind of lowpass filter perceptually. If the sample rate were higher, there would be less of a filtering effect for the one sample delay, and therefore better resolution in the time domain. This also means that for every sample it is shifted in time, the frequency accuracy of the comb filter is increased, as a constructive or destructive interference can be placed at a frequency that could not be addressed with a lower sample rate. I suggest everyone try this experiment once to see just how one sample of delay can affect the treble frequencies. At 48khz, the resolution is pitiful. Some things need better time resolution, some don't, and some just use phase shifting techniques. In any case, the process will necessitate the required performance and sample rate is one way of making things more accurate.
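The experiment can also be put into numbers. Mixing a signal equally with a copy delayed by one sample gives y[n] = 0.5 * (x[n] + x[n-1]), whose gain at frequency f is |cos(pi * f / fs)|. The sketch below just evaluates that formula. The function name and the chosen frequencies are picked for this example.

```python
import math

def one_sample_comb_db(freq_hz, rate_hz):
    """Gain in dB of 0.5 * (x[n] + x[n-1]) at freq_hz."""
    gain = abs(math.cos(math.pi * freq_hz / rate_hz))
    return 20 * math.log10(gain)

for rate in (48_000, 96_000):
    for freq in (5_000, 10_000, 20_000):
        print(f"{rate} Hz rate, {freq} Hz tone: "
              f"{one_sample_comb_db(freq, rate):6.2f} dB")
```

At 20 kHz the loss is roughly 11.7 dB at a 48 kHz rate and roughly 2 dB at 96 kHz, simply because one sample is half as long at the higher rate.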

Compyfox · Post by **Compyfox** » Fri May 18, 2012 8:44 am

The question however still is what happens with a full bandwidth audio file that is upsampled, then edited and (if needed) downsampled again. Not the usage of synths at such a high sampling rate.

Most important is still: Will there be content added or not?
How can someone test that with simple means (audio file)?

I don't think that tests with sinewaves (just the fundamental unless you run it through a saturator) or noise would make any sense. IMO you'd also need to see what's happening above the 20kHz frequency ranges and not many FFTs do that.

camsr · Post by **camsr** » Fri May 18, 2012 8:59 am

Compyfox wrote:The question however still is what happens with a full bandwidth audio file that is upsampled, then edited and (if needed) downsampled again. Not the usage of synths at such a high sampling rate.

Most important is still: Will there be content added or not?
How can someone test that with simple means (audio file)?

I don't think that tests with sinewaves (just the fundamental unless you run it through a saturator) or noise would make any sense. IMO you'd also need to see what's happening above the 20kHz frequency ranges and not many FFTs do that.

The interpolation of the resampling will add noise. I can't find the paper I read it in, but I remember a value of something like -120dBFS or lower for sinc interpolation. I believe Hermite was about -103, but don't quote me on that. These levels are very low, and the process could be repeated many times without a discernible influence on the sound. I think the noise is like THD in an amp, because the interpolation slightly shapes the waveform based on what method is used.

The antialias filter is the most critical part of resampling. The good stuff uses FIR and linear phase, but I think there are other options too. If linear phase is not used, then phase shift will occur at the frequencies around Nyquist, usually resulting in something different.

Compyfox · Post by **Compyfox** » Fri May 18, 2012 1:54 pm

Well, this clears up the thing about the upsampling "itself", depending on the tool used (hence the SRC comparisons as linked in the first post). But it still leaves the question of the "added content on further editing".

I think the only test I can do at this point is taking a 48/24 recording, upsample to 96/24, apply either EQ (modeling), a compressor or plain saturator (BootEQmkII comes to mind with the gain compensated input saturation stage), then compare the 96/24 files (both raw and edited) with a null test.

To my knowledge this would show me, in form of artifacts, what content is added and in which frequency ranges, no?
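A null test of this kind can be prototyped on synthetic material before trying real recordings. In the sketch below everything is an assumption for illustration: the "upsampled source" is two tones (1 kHz and 10 kHz) at 96 kHz with nothing above 24 kHz, the processor is a bare `tanh` saturator, and `peak_above` is a hypothetical helper that reports the strongest component above a cutoff.

```python
import cmath
import math

RATE_HZ = 96_000
N = 480                      # 5 ms of audio, so DFT bins sit 200 Hz apart

def peak_above(x, cutoff_hz):
    """Largest component amplitude strictly above cutoff_hz (plain DFT)."""
    n = len(x)
    bin_hz = RATE_HZ / n
    first = int(cutoff_hz / bin_hz) + 1
    peak = 0.0
    for k in range(first, n // 2):
        acc = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                  for i, v in enumerate(x))
        peak = max(peak, 2.0 * abs(acc) / n)
    return peak

def tone(freq_hz, amp):
    return [amp * math.sin(2 * math.pi * freq_hz * i / RATE_HZ)
            for i in range(N)]

raw = [a + b for a, b in zip(tone(1_000, 0.3), tone(10_000, 0.3))]
processed = [math.tanh(v) for v in raw]               # stand-in saturator
residual = [p - r for p, r in zip(processed, raw)]    # the null test

print("raw, above 24 kHz     :", peak_above(raw, 24_000))
print("residual, above 24 kHz:", peak_above(residual, 24_000))
```

The raw file shows nothing above 24 kHz beyond rounding error, while the residual contains components there, for instance the third harmonic of the 10 kHz tone at 30 kHz. With real plugins the two files have to be sample-aligned first, since any processing latency would otherwise dominate the residual. The residual also contains level changes below 24 kHz, so it is the part above the source Nyquist that counts as added content.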

Remember, I'm still looking for the "HD Audio" meme in the box. I understand what's happening in terms of external upsampling matrices at a mastering studio - here it makes sense to give the outboard gear a feed that has the highest sampling rate possible (analog signals are somewhat infinite in terms of "sampling rate", it's volts and not bits/bytes).

From the last posts, we know it makes sense in the pure digital realm as well. But what I'm trying to aim at with the thread is: "what is happening?", "what 'mojo' can we expect?".

HD audio formats are (in my opinion) the future, even though the playback engines are limited for the time being. Why not dive deeper into that area so we can all benefit from it?

Another question:
A lot of tools offer built in oversampling. Sometimes 4x, nowadays even 8x, rarely even 16x. Now, if we're at 96kHz already, would the oversampling matrix still be applied if activated or does it ignore the OS process since we already are at 96kHz?

Please keep on discussing.

jupiter8 · Post by **jupiter8** » Fri May 18, 2012 8:49 pm

camsr wrote: Also, if you have ever delayed a track by one sample and then mixed it equal into the same track undelayed, you will notice the comb filtering has formed a kind of lowpass filter perceptually. If the sample rate were higher, there would be less of a filtering effect for the one sample delay, and therefore better resolution in the time domain. This also means that for every sample it is shifted in time, the frequency accuracy of the comb filter is increased, as a constructive or destructive interference can be placed at a frequency that could not be addressed with a lower sample rate. I suggest everyone try this experiment once to see just how one sample of delay can affect the treble frequencies. At 48khz, the resolution is pitiful. Some things need better time resolution, some don't, and some just use phase shifting techniques. In any case, the process will necessitate the required performance and sample rate is one way of making things more accurate.

One should not draw conclusions based on an incomplete set of data. There are such things as subsample delays.

camsr wrote: The interpolation of the resampling will add noise. I can't find the paper I read it in, but I remember a value of something like -120dBFS or more for Sinc interpolation.

It can be much lower than that.

Compyfox · Post by **Compyfox** » Sat May 19, 2012 3:32 pm

I know it's a weekend, the last days of NAMM Russia and almost "holiday weeks" in Germany again, but...

Is the discussion over already?

Doug1978 · Post by **Doug1978** » Sat May 19, 2012 3:41 pm

^^ Compyfox, you know more than me on this subject, so I can't really give you too much help.
However, a man who can is Bob Katz, particularly in his book 'Mastering Audio: The Art and the Science'.
http://www.amazon.co.uk/Mastering-Audio ... 984&sr=8-1

It's been about 18 months since I bought and read that book - I think the answers to some / most of your queries are in there. I would check myself, but the book is in England and I'm currently in Japan.

If you haven't read it already, I think that it's one of the best books in this sort of area.

Cheers.
Doug

Compyfox · Post by **Compyfox** » Sat May 19, 2012 5:12 pm

I might dive into that one again, but from browsing through (the last time) I didn't see anything about the method I'm trying to discuss here.

Maybe KVR is the wrong board for this as well? Who knows.
Thanks anyway.

Upsampling from 48/24 to 96/32 -> Mastering -> Downsample to either desired format?