Convolution: Best practice to compensate IR volume

DSP, Plugin and Host development discussion.

Post

With convolution, the wet/output audio can be louder than the dry input, for example when the IR is a "dense" reverb with lots of early reflections.

What would you guys use to compensate for this? I want to find a gain factor for a given IR, rather than determining it manually by ear for each IR. It doesn't have to be 100% accurate, just a good starting point, since the output volume depends on both the IR and the input signal anyway. Logic's Space Designer has a switch that does this: "rev vol compensation".

Any ideas are much appreciated :)

Post

declassified wrote: With convolution, the wet/output audio can be louder than the dry input. For example, if the IR is a "dense" reverb with lots of early reflections. [...]
It's a bit of a complex affair. I do it this way: divide the IR into blocks (e.g. a standard FFT size of 4096), create a noise buffer of this size, low-pass filtered at 1000 Hz, then convolve the noise buffer with each block of the IR, calculate the RMS of each resulting convolved block, take the difference between the RMS of the convolved block and that of the noise block, and sum the squares of the differences. Pretty CPU-consuming, but the results are great and account for variations in IR length, density, etc.

Maybe it's possible to do without convolution, simply by low-passing the IR blocks; I'll try it some day.
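A rough Python sketch of the block-wise estimate described above (the one-pole low-pass, the noise type, and all names are my own illustrative choices, not actual product code):

```python
import numpy as np

def noise_block_gain(ir, block_size=4096, fs=44100.0, cutoff=1000.0, seed=0):
    """Estimate a compensation gain: convolve low-passed noise with each
    IR block, compare each block's RMS to the noise RMS, sum squared diffs."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-1.0, 1.0, block_size)

    # crude one-pole low-pass around 1 kHz (illustrative choice of filter)
    a = np.exp(-2.0 * np.pi * cutoff / fs)
    lp = np.empty_like(noise)
    z = 0.0
    for i, x in enumerate(noise):
        z = (1.0 - a) * x + a * z
        lp[i] = z
    rms_noise = np.sqrt(np.mean(lp ** 2))

    sum_sq_diff = 0.0
    for start in range(0, len(ir), block_size):
        block = np.asarray(ir[start:start + block_size], dtype=float)
        conv = np.convolve(lp, block)           # noise convolved with this IR block
        rms_conv = np.sqrt(np.mean(conv ** 2))
        sum_sq_diff += (rms_conv - rms_noise) ** 2

    est = np.sqrt(sum_sq_diff)
    return 1.0 / est if est > 0.0 else 1.0      # gain factor for the wet signal
```

With a unit-impulse IR the convolved noise equals the noise itself, so the estimate is zero and the gain stays at 1.0; denser or louder IRs yield proportionally smaller gains.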

Post

It depends, as you mention, on the spectral content of the IR and of the source sound it is being convolved with. If your impulse is a 1-second sine wave at 15000 Hz and your input signal is a sine wave at 20 Hz, the result will be quite low in volume. If, however, the input signal is a sine wave at 15000 Hz, the result will be quite loud. It depends on the spectral overlap of the two signals.

Since you don't know what the spectrum of the source/input signal is, you can't get a perfect answer for all cases. Therefore, you must make some "reasonable assumptions". The simplest case is to assume both spectra are white/flat. In this case the answer is actually simple:

rmsComp = 1.0 / sumOfSquares(all sample values of your IR)

sumOfSquares = 0.0;

for (n = 0; n < N; n++)
{
    sumOfSquares += IR[n] * IR[n];
}

rmsComp = 1.0 / sqrt(sumOfSquares);

i.e. the sum of squares of the corrected impulse should be 1.0. in some cases, this is "good enough".
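As a sketch, the same normalisation in Python (using numpy; purely illustrative):

```python
import numpy as np

def rms_comp(ir):
    """Gain that normalises the IR to unit energy (white-input assumption)."""
    sum_of_squares = np.sum(np.asarray(ir, dtype=float) ** 2)
    return 1.0 / np.sqrt(sum_of_squares)

def normalise(ir):
    """Scale the IR so its sum of squares is 1.0."""
    return np.asarray(ir, dtype=float) * rms_comp(ir)
```

After scaling, the sum of squares of the corrected IR is 1.0, regardless of length or density.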

an even more reasonable assumption is a "pink" (1/f) spectrum (or some other fancy spectral weighting), but that gets a lot more complicated.

Post

Andrew Souter wrote: rmsComp = 1.0 / sumOfSquares(all sample values of your IR)
That will unfortunately only work with dense IRs; if you apply it to delay-like sparse IRs, the estimate will be way off. The approach I've outlined above works for almost all IRs.

Post

well, actually I made a typo/omission in the "english" wording explanation:
rmsComp = 1.0 / sumOfSquares(all sample values of your IR)
I meant:
rmsComp = 1.0 / sqrt(sumOfSquares(all sample values of your IR))
but the code example was correct...

are you sure it fails in the sparse case, even when applied to white noise? I don't think so.

it may fail in the sparse case when applied to real music signals simply because the sparse case will have a very spiky, comb-filter-like magnitude response, and the music signal may also be spiky, so the "spectral overlap" interactions (constructive and/or destructive) may be extreme.

So applied to specific sustained musical tones, ya, the error can be large. Applied to something more like noise it should be less of an issue. And (some flavor of) noise is likely a reasonable assumption since you don't know what input signal the user is using.

more correct is to do it in the freq domain with a "better" spectrum assumption than white, but with spiky mag responses (such as sparse time-domain IRs or highly resonant filters) it gets very complicated to get a good result. We struggled with such topics in Kaleidoscope for example... (not convolution, but the same basic family of challenges)

Post

Thank you guys! I appreciate it.

In my case, the input could be notes played by a synth, so usually it will be musical tones at some pitch I don't know until they are played.

When I implement it, I'll probably start with the sum-of-squares approach and test it.

Post

Andrew Souter wrote: The most simple case, is that both spectrums are white/flat. In this case the answer is actually simple: the sum of squares for the corrected impulse should be 1.0 . in some cases, this is "good enough". an even more reasonable assumption is to assume a "pink" (1/f) spectrum, (or other fancy spectral weighting) but that gets a lot more complicated.
i guess, in this case, i could just take one big FFT of the whole impulse-response, apply my spectral weighting function and take the sum-of-squares of the weighted spectrum instead, right? it should reduce to the sum-of-squares of the impulse response when the spectral weighting is uniform by parseval's theorem, if i'm not mistaken.

edit: hmmm - depending on the normalization convention used, i may have to divide by the impulse response length N:
https://en.wikipedia.org/wiki/Parseval% ... in_physics
but should this be the actual impulse response length or the FFT buffer size (assuming to use zero-padding)? ...i guess, the latter - yeah, that seems to make more sense
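A quick numerical check of that Parseval question (numpy's FFT is unnormalised, so one divides by the FFT size, including the zero-padded size when padding):

```python
import numpy as np

def energy_time(x):
    """Sum of squares in the time domain."""
    return float(np.sum(np.abs(x) ** 2))

def energy_freq(x, fft_size=None):
    """Same energy from the spectrum. numpy's unnormalised FFT gives
    sum |X[k]|^2 = fftSize * sum |x[n]|^2, so divide by the FFT size."""
    X = np.fft.fft(x, n=fft_size)
    return float(np.sum(np.abs(X) ** 2) / len(X))
```

Zero-padding adds no energy in the time domain, and the extra bins are exactly compensated by the larger divisor, so dividing by the FFT buffer size (the padded length) keeps the two sums equal.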

Post

Music Engineer wrote: i guess, in this case, i could just take one big FFT of the whole impulse-response, apply my spectral weighting function and take the sum-of-squares of the weighted spectrum instead, right? [...]

yes, you can do the integration/sum in the freq domain instead of the time domain, and indeed you have to (afaik -- but maybe not... hmmm. i should try...) if you want to use any kind of spectral weighting function.

in this case, instead of the time-domain version for the white assumption, such as:

sumOfSquares = 0.0;

for (n = 0; n < N; n++)
{
    sumOfSquares += IR[n] * IR[n];
}

rmsComp = 1.0 / sqrt(sumOfSquares);

you have:
sumOfSquaresWeightingFunc = 0.0;
sumOfSquaresFilteredSignal = 0.0;

for (n = 0; n < N; n++)
{
    sumOfSquaresWeightingFunc += WF[n] * WF[n];
    sumOfSquaresFilteredSignal += H[n] * H[n] * WF[n] * WF[n];
}

rmsComp = sqrt(sumOfSquaresWeightingFunc) / sqrt(sumOfSquaresFilteredSignal)
        = sqrt(sumOfSquaresWeightingFunc / sumOfSquaresFilteredSignal)

if i remember correctly without checking...

H[n] could be FFT bins, or just samplings of a filter magnitude response, etc.
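A compact Python version of that pseudocode, as a sketch (the pink weighting in the note below is just one plausible choice):

```python
import numpy as np

def weighted_rms_comp(H, WF):
    """H: sampled magnitude response of the IR (e.g. |FFT| bins).
    WF: spectral weighting function sampled at the same points."""
    H = np.asarray(H, dtype=float)
    WF = np.asarray(WF, dtype=float)
    ss_weighting = np.sum(WF ** 2)           # energy of the weighting alone
    ss_filtered = np.sum((H * WF) ** 2)      # energy of weighting through the IR
    return np.sqrt(ss_weighting / ss_filtered)
```

For a pink assumption one could use WF[k] proportional to 1/sqrt(f[k]). A flat IR (H = 1 everywhere) gives a gain of 1.0 regardless of the weighting, and doubling the IR's magnitude response halves the gain.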
Last edited by Andrew Souter on Sun Jul 01, 2018 1:48 am, edited 1 time in total.

Post

Andrew Souter wrote: yes, you can do the integration/sum in the freq domain instead of the time domain, and indeed you have to (afaik -- but maybe not... hmmm. i should try...) if you want to use any kind of spectral weighting function.
You can alternatively apply the weighting by filtering in the time-domain. You can even use IIR filters for this since phase-shift only matters if you're windowing.

That said... while normalising for total RMS works pretty well for short IRs like cabinet responses or whatever, with longer reverbs this is likely to make them pretty quiet.

Instead you might want to try calculating something like the maximum peak of a running RMS with something like 200ms (or maybe slightly longer, but still "momentary" in some sense) exponential moving average. That should give you a better estimate of how loud the loudest parts of the IR will be in response to transient sounds (eg. "short" notes or whatever), which should make for a better estimate of how loud the reverb "feels" in practice.
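A sketch of that idea in Python (the one-pole coefficient and the 200 ms window are just plausible choices, not a tested tuning):

```python
import numpy as np

def peak_running_rms(ir, fs=44100.0, window_sec=0.2):
    """Peak of an exponential-moving-average RMS over the IR."""
    a = np.exp(-1.0 / (window_sec * fs))   # one-pole smoothing coefficient
    ms = 0.0
    peak = 0.0
    for x in np.asarray(ir, dtype=float):
        ms = (1.0 - a) * x * x + a * ms    # smoothed mean-square
        peak = max(peak, ms)
    return np.sqrt(peak)

def rms_comp_momentary(ir, fs=44100.0):
    """Gain factor based on the loudest ~200 ms stretch of the IR."""
    p = peak_running_rms(ir, fs)
    return 1.0 / p if p > 0.0 else 1.0
```

Unlike total-energy normalisation, a long reverb tail contributes to this estimate only through its loudest momentary stretch, so long IRs are not pushed down as far.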

This will then certainly make longer reverbs louder in response to stationary sounds, but (1) most music isn't that stationary and (2) it's typically easy to adjust the reverb down if the maximum level is somewhere in the general ballpark of "maximum you'd ever want to use", whereas having to reserve a lot of gain-control range for boosting is quite inconvenient.

It is sort of impossible to get a perfectly good estimate for every possible IR, but I'd still keep in mind that in most cases the source material is probably not going to be 5 minutes of static pink noise. :)

Post

mystran wrote: You can alternatively apply the weighting by filtering in the time-domain. You can even use IIR filters for this since phase-shift only matters if you're windowing.
Ya, I sort of suspected that as I was typing it above. But I've never actually tried it this way to verify.
mystran wrote: but I'd still keep in mind that in most cases the source material is probably not going to be 5 minutes of static pink noise. :)
of course, but it might actually represent fairly well the average spectrum over all possible musical (or general) sounds that the given product/process might encounter.

Post

I just checked what I did for my old IRDust and that one actually takes an FFT of the whole IR, finds the maximum peak in the spectrum (i.e. the loudest frequency) and uses that for normalisation directly. I guess that works particularly well for filters (e.g. narrow-band filters don't get ridiculously loud in comparison to wide-band IRs; the response peak is always at a predictable level), but it does tend to make long reverbs sound somewhat quieter than short ones.
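That approach might be sketched like this (not the actual IRDust code):

```python
import numpy as np

def spectral_peak_comp(ir, fft_size=None):
    """Normalise by the loudest frequency: gain = 1 / max |FFT(ir)|."""
    mag = np.abs(np.fft.rfft(ir, n=fft_size))
    return 1.0 / float(np.max(mag))
```

A unit impulse and a narrow band-pass IR both end up with their response peak at the same level, which matches the filter behaviour described; a long reverb's energy is spread over many bins, so its peak, and hence its post-normalisation loudness, comes out comparatively low.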

