Any shortcuts to "True Peak" detection?

DSP, Plugin and Host development discussion.

Post

It seems like there is a lot of variation in the results, if you trust what this ISP/True Peak limiter test shows.

I've used mainly Orban's loudness meter, which according to its documentation:
"provides two true peak-reading meters. The red bar appearing in the
VU and PPM meters reads the peak values of the internal 48 kHz digital
samples within the meter. By oversampling 8x, the Reconstructed Peak
meter extrapolates the peaks of the signal after D/A conversion, as
specified in the BS.1770 standard."

Post

juha_p wrote: Fri Jun 10, 2022 6:30 am It seems like there is a lot of variation in the results, if you trust what this ISP/True Peak limiter test shows.

I've used mainly Orban's loudness meter, which according to its documentation:
"provides two true peak-reading meters. The red bar appearing in the
VU and PPM meters reads the peak values of the internal 48 kHz digital
samples within the meter. By oversampling 8x, the Reconstructed Peak
meter extrapolates the peaks of the signal after D/A conversion, as
specified in the BS.1770 standard."
If you read the actual BS.1770, it talks about true-peak readings and maximum overshoots at various sample rates, and it's really not too strict about how the measurement is done. The example polyphase FIR is 4x with 12 taps per branch (ie. pretty much in the "low-cost compromise" category), and in the discussion they note that while more oversampling is preferable, the actual filter design need not be as strict as a regular resampling filter ("since we are not going to listen to the output of our over-sampler, but only use it to display a reading or drive a bar graph"). So clearly they don't expect "perfect" results; the vibe I get from the actual text of the recommendation is that whoever wrote it understood perfectly well that the problem is not something you can solve exactly.
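The structure they describe maps to something like this minimal Octave sketch; the fir1() design is my own illustrative stand-in, NOT the recommendation's actual coefficients, and the branch ordering doesn't matter for a peak reading since we take the max over everything:

Code: Select all

pkg load signal                      % for fir1()
% x: mono input signal (column vector)
h  = 4 * fir1(47, 0.85/4);           % illustrative 48-tap low-pass, gain 4
ph = reshape(h, 4, 12);              % 4 polyphase branches, 12 taps per branch
tp = 0;
for n = 12:length(x)
  seg = x(n:-1:n-11);                % 12 most recent input samples
  tp  = max(tp, max(abs(ph * seg))); % all 4 oversampled outputs at once
end
dBTP = 20 * log10(tp)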

Contrast this with the K-weighting, for example, where the filters are actually specified as biquad coefficients at 48 kHz. (That's a kind of annoying way to spec things, because "implementations at other sampling rates will require different coefficient values, which should be chosen to provide the same frequency response that the specified filter provides at 48 kHz"... but in practice I think the filters are basic BLT designs, so if you convert the coefficients to something like a ZDF-SVF and adjust the tuning, you'll get close enough.)
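For reference, a minimal Octave sketch of the two K-weighting stages, using the biquad coefficients as given in BS.1770 (valid at 48 kHz only; everything beyond the coefficients themselves is just illustration):

Code: Select all

% Stage 1: high-shelf pre-filter (models the acoustic effect of the head).
b1 = [ 1.53512485958697, -2.69169618940638, 1.19839281085285];
a1 = [ 1.0,              -1.69065929318241, 0.73248077421585];
% Stage 2: RLB weighting curve (a simple high-pass).
b2 = [ 1.0, -2.0, 1.0];
a2 = [ 1.0, -1.99004745483398, 0.99007225036621];
% Apply both stages in series to a signal x sampled at 48 kHz.
y = filter(b2, a2, filter(b1, a1, x));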

Post

In 1996, there was an article by (database topics editor) Mike J. Courtney in Dr. Dobb's Journal which introduced another method, called "A Cubic Spline Extrema Algorithm", which, IIUC, could suit audio TP detection too (perhaps for off-line work).

Post

A cubic spline interpolant - in the sense they mean it in that article - is usually a global thing: you need to have all the datapoints at once, and each polynomial segment will depend on all points. That means even your leftmost datapoint will still affect the rightmost cubic segment to some extent (although the influence decays away with distance). It's a tridiagonal linear system. In audio, we usually don't want such a thing and prefer more local interpolants such as, for example, Hermite or Lagrange interpolants. In a less strict usage of the terminology, these are also sometimes called "splines" - but the article is apparently talking about splines in the narrower sense.
My website: rs-met.com, My presences on: YouTube, GitHub, Facebook

Post

Music Engineer wrote: Sun Jun 19, 2022 5:28 pm A cubic spline interpolant - in the sense they mean it in that article - is usually a global thing: you need to have all the datapoints at once, and each polynomial segment will depend on all points. That means even your leftmost datapoint will still affect the rightmost cubic segment to some extent (although the influence decays away with distance). It's a tridiagonal linear system. In audio, we usually don't want such a thing and prefer more local interpolants such as, for example, Hermite or Lagrange interpolants. In a less strict usage of the terminology, these are also sometimes called "splines" - but the article is apparently talking about splines in the narrower sense.
The one unique C2-continuous cubic spline (the one you obtain by solving a linear system) is often called "the natural cubic spline" to differentiate it from other cubic splines (eg. Hermite, Lagrange) that have less continuity, but as you point out, it's not always clear whether "cubic spline" refers to the natural cubic spline or to other types of cubic splines. Trying to use the natural cubic spline for this sounds like a waste of time though: lacking compact support (ie. every segment depends on every data point), it might just make your results less predictable rather than more accurate.

I'd honestly just use Catmull-Rom with a bit of oversampling, since it's a known good audio interpolator, and like any cubic you can solve for the extrema (roots of the quadratic derivative) directly. The fact that the construction uses derivatives at the end-points should also allow you to avoid constructing the cubic (at least fully) when no extrema are relevant (which, especially with oversampling, is going to be the grand majority of samples).
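Something like this untested Octave sketch (the function name and window handling are my own, not from any particular implementation): given four consecutive samples, build the Catmull-Rom cubic for the middle segment and check the roots of its derivative:

Code: Select all

% Peak magnitude of the Catmull-Rom segment between p1 and p2
% (p0 and p3 are the outer neighbours).
function tp = catmull_rom_peak(p0, p1, p2, p3)
  % Coefficients of f(t) = a*t^3 + b*t^2 + c*t + d on t in [0, 1].
  a = 0.5*(-p0 + 3*p1 - 3*p2 + p3);
  b = 0.5*(2*p0 - 5*p1 + 4*p2 - p3);
  c = 0.5*(p2 - p0);
  d = p1;
  tp = max(abs(p1), abs(p2));       % endpoint values as the baseline
  r  = roots([3*a, 2*b, c]);        % extrema: f'(t) = 3a*t^2 + 2b*t + c = 0
  for t = r'
    if isreal(t) && t > 0 && t < 1  % only real extrema inside the segment
      v  = ((a*t + b)*t + c)*t + d; % evaluate the cubic (Horner form)
      tp = max(tp, abs(v));
    end
  end
end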

Post

Hmm... well - it seems like Wikipedia also uses the less strict interpretation. I've seen definitions where "spline" specifically refers to those piecewise polynomial functions where every segment depends on all datapoints via this big tridiagonal system involving all the data at once. In what I usually call "Hermite" interpolants, the segment between n and n+1 is only influenced by datapoints n-1, n, n+1, n+2. The difference is that with Hermite interpolants, you actually prescribe target *values* for the 1st derivative at the nodes (which I usually obtain via a central difference - which brings in the dependency on n-1 and n+2). With actual splines, you just demand that the 1st and 2nd derivatives *match* at the nodes without prescribing any values for them. This is what creates the global coupling. In Hermite interpolation, the segments are decoupled: if I wiggle the datapoint at n-7 a little bit, the segment between n and n+1 remains completely unaffected. This is not the case with splines in the narrower sense; the dependency "ripples through" via the tridiagonal matrix.
My website: rs-met.com, My presences on: YouTube, GitHub, Facebook

Post

Music Engineer wrote: Sun Jun 19, 2022 6:02 pm In Hermite interpolation, the segments are decoupled: if I wiggle the datapoint at n-7 a little bit, the segment between n and n+1 remains completely unaffected. This is not the case with splines in the narrower sense; the dependency "ripples through" via the tridiagonal matrix.
Right. You can't have compact support, interpolation and C2 continuity with cubics, because you run out of degrees of freedom. It's a pick-two situation:

interpolation and C2 -> natural cubic spline
compact support and interpolation -> C1 Hermite splines, etc
compact support and C2 -> cubic B-Splines

I think for uniform knot spacing, you can formulate the problem of finding a new set of control points that makes the cubic B-spline interpolate the original points (ie. act as the natural cubic spline) as a (bidirectional) IIR filtering problem... but I forgot the details.
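For what it's worth, a rough Octave sketch of that bidirectional recursion; this follows the well-known recursive B-spline prefiltering approach (e.g. Unser's), but the boundary initialization here is deliberately crude and would need more care in practice:

Code: Select all

% Prefilter x so that the cubic B-spline through the output
% coefficients c interpolates x at the (uniform) knots.
function c = bspline3_prefilter(x)
  z1 = sqrt(3) - 2;             % ~ -0.268, stable pole of the inverse filter
  N  = length(x);
  cp = zeros(size(x));
  cp(1) = x(1);                 % crude init; full version uses a mirror sum
  for k = 2:N                   % causal pass: 1/(1 - z1*z^-1)
    cp(k) = x(k) + z1*cp(k-1);
  end
  c = zeros(size(x));
  c(N) = cp(N);                 % crude init again
  for k = N-1:-1:1              % anticausal pass: -z1/(1 - z1*z)
    c(k) = z1*(c(k+1) - cp(k));
  end
  c = 6*c;                      % gain so the B-spline reproduces x at the knots
end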

Post

Okay - no details needed. Sounds too mathily complicated for me right now anyway. I'd rather lean back and continue following (and occasionally commenting on) the CLAP threads this Sunday evening. :hihi:
My website: rs-met.com, My presences on: YouTube, GitHub, Facebook

Post

I haven't seen any shortcut so far.

Blindness to the "true" continuous signal is always a consequence of aliasing. Like trying to directly measure the PCM, i.e. the non-bandlimited intermediate storage format. Or running non-bandlimited abs(), if() and the like.

The only reasonable way out is to use traditional antialiasing methods, i.e. simulating the second half of the sampling theorem: the output Nyquist filter. None of these tricks are easy or fun; most demand compromises or only fit special situations.
Fabien from Tokyo Dawn Records

Check out my audio processors over at the Tokyo Dawn Labs!

Post

Ignoring the complexities of detection for a moment: for a limiter specifically, one should be able to mitigate ISPs simply by limiting the bandwidth of the gain-control signal. Basically, when you multiply two signals, the bandwidth of the result is the sum of the bandwidths of the two signals.

Theoretically, if you have a 44.1 kHz signal and you band-limit it to (say) 20 kHz as part of the oversampling (to support ISP detection), and then band-limit the gain-control signal to say 2 kHz (= 0.5 ms period), then the total signal bandwidth will be 20 kHz + 2 kHz = 22 kHz and there will be no aliasing. That means you don't need to filter for the downsampling (and hence there are no additional ISPs introduced by the filter... in theory anyway).
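A quick Octave illustration of the bandwidth-sum point (the numbers are arbitrary, chosen to land on exact FFT bins): multiplying a 20 kHz tone by a gain signal with 2 kHz of bandwidth produces content only up to 22 kHz:

Code: Select all

fs = 44100; t = (0:fs-1)'/fs;   % one second of signal, 1 Hz bin spacing
x  = sin(2*pi*20000*t);         % "audio" tone at 20 kHz
g  = 1 + 0.5*sin(2*pi*2000*t);  % gain signal with 2 kHz of bandwidth
y  = x .* g;                    % product: lines at 18, 20 and 22 kHz only
Y  = abs(fft(y));               % verify with plot(0:fs/2, Y(1:fs/2+1))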

Practically speaking, "perfect" band-limiting of the gain-control signal might be problematic to do in a way that still guarantees the signal stays below the threshold, but something like a sliding max-window followed by a few rounds of box filtering can give you an approximately band-limited signal, which can at least mitigate the aliasing significantly. One interesting aspect of this kind of strategy is that if the band-limiting of the gain control is achieved by filtering, then from the point of view of ISPs alone it shouldn't really matter whether the gain computation itself (before the signal is filtered) suffers from aliasing.
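A minimal Octave sketch of that smoothing scheme; the window lengths and the min/max orientation are my own assumptions (here the gain is a multiplier <= 1, so the sliding extreme is a minimum, ie. the most severe reduction over the window wins):

Code: Select all

% Smooth a raw per-sample gain signal (g_raw <= 1) for a limiter.
function g = smooth_gain(g_raw, w)
  N = length(g_raw);
  g = zeros(size(g_raw));
  for k = 1:N                   % sliding minimum over the last w samples
    g(k) = min(g_raw(max(1, k-w+1):k));
  end
  box = ones(w, 1)/w;           % two box passes smooth the staircase;
  g = filter(box, 1, filter(box, 1, g));
  % note: each pass adds w-1 samples of delay, which a real limiter
  % would absorb as lookahead on the audio path.
end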

The trade-off is that band-limiting the gain-control signal restricts how fast the limiter can act on isolated peaks, but I feel like 1-2 kHz of bandwidth is probably close to the lower bound for what still sounds effectively like "smooth clipping" without losing too much impact on transients, so I'd say this kind of strategy might be quite workable. YMMV.

Post

2 kHz for the control is only sufficient on paper, sadly. It makes a very smooth compressor, Waves Ren Comp style, but it's not very good at creatively crunching stuff or at overshoot control. Popular analogue compressors, and limiters anyway, attack/release much faster than that.

As for the audio side, band-limiting to 20 kHz is going to reach into the audible bandwidth and become audible. Such a filter would be demanding. Doing the same operation with true resampling would likely end up costing the same, or even less when using half-band kernels (which can't really be used outside true resampling).

(this assumes all filtering is done in the time domain)

If I planned to use a strict 2 kHz band-limit for the control signal, I wouldn't resample the audio path at all. That's because in this case all aliasing will fold back into inaudible regions anyway, effectively doubling the usable spectral headroom to 4 kHz or even a bit more.

~3-4 kHz of band-limiting for the control signal, and no audio filtering at all, sounds like a good balance to me. Aliasing will be largely inaudible.


There's no free lunch in antialiasing. It's always going to be messy and compromised. :)
Fabien from Tokyo Dawn Records

Check out my audio processors over at the Tokyo Dawn Labs!

Post

FabienTDR wrote: Mon Jun 20, 2022 4:01 pm As for the audio side, band-limiting to 20 kHz is going to reach into the audible bandwidth and become audible. Such a filter would be demanding. Doing the same operation with true resampling would likely end up costing the same, or even less when using half-band kernels (which can't really be used outside true resampling).
I was suggesting you do "true resampling" to oversample for ISP-detection purposes, but lower the resampling filter's cutoff slightly to allocate some bandwidth, so you can avoid filtering after limiting. The direct effect on filter length is negligible, though you'll certainly need a fairly long filter to get a steep cutoff. I'm not convinced the CPU use would be excessive though. YMMV.

Also, I'm not a huge fan of Lth-band filters for audio, because they have crappy attenuation at Nyquist.
If I planned to use a strict 2 kHz band-limit for the control signal, I wouldn't resample the audio path at all. That's because in this case all aliasing will fold back into inaudible regions anyway, effectively doubling the usable spectral headroom to 4 kHz or even a bit more.

~3-4 kHz of band-limiting for the control signal, and no audio filtering at all, sounds like a good balance to me. Aliasing will be largely inaudible.
Well, it's not like audible aliasing is necessarily the problem here; rather, the problem is that aliasing means you risk introducing additional ISPs... though I don't know how much of a practical problem that would be if the amount of aliasing is negligible.

Post

The Matlab team suggests:
1. The signal is over-sampled to at least 192 kHz.
2. The over-sampled signal, a, passes through a low-pass filter with a half-polyphase length of 12 and stop-band attenuation of 80 dB.
3. The filtered signal, b, is rectified and converted to the dB TP scale: c = 20*log10(|b|).
4. The true peak is determined as the maximum of the converted signal, c.

https://se.mathworks.com/help/audio/ref ... meter.html

A simple Octave implementation could be:

Code: Select all

pkg load signal  % resample() lives in the signal package
% Test file: https://vladgsound.wordpress.com/2012/07/15/simple-isp-peak-meter-and-few-tests/
[X, fs] = audioread("impulse_1.wav");
% Change the sample rate of X by a factor of p/q.
% This is performed using a poly-phase algorithm.
p = 4;
q = 1;
% Resample using the library function:
% https://octave.sourceforge.io/signal/function/resample.html
Y = resample(X, p, q);

% Find the maximum absolute value and convert to the logarithmic scale.
dBTP = 20 * log10( max(max(abs(Y))) );
and it gives dBTP = 2.0940 for the example test file (according to the sample source, the calculated inter-sample peak value is +2.098 dB).

Dunno which kind of poly-phase algorithm Octave's resample() uses, but as an alternative I implemented a second version which resampled by inserting zeros between the original samples and then applied FIR coefficients (obtained following the instructions on the linked page) with the filter() command.
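Something along these lines (a minimal sketch of the zero-stuffing approach, assuming mono X; the fir1() call is an illustrative design, not the coefficients from the linked page):

Code: Select all

pkg load signal                  % for fir1()
p  = 4;                          % oversampling factor
up = zeros(length(X)*p, 1);
up(1:p:end) = X;                 % insert p-1 zeros between samples
h  = p * fir1(191, 0.9/p);       % low-pass at the new rate, gain p
Y2 = filter(h, 1, up);
max_tp = 20 * log10(max(abs(Y2)));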

Results for the example test file (2x-8x resampling; max_tp is from "my" implementation):

Code: Select all

>> [dBTP, max_tp] = max_true_peak(2)
dBTP = 2.0940
max_tp = 2.0429
>> [dBTP, max_tp] = max_true_peak(3)
dBTP = 1.8681
max_tp = 1.8228
>> [dBTP, max_tp] = max_true_peak(4)
dBTP = 2.0940
max_tp = 2.0429
>> [dBTP, max_tp] = max_true_peak(5)
dBTP = 2.0129
max_tp = 1.9638
>> [dBTP, max_tp] = max_true_peak(6)
dBTP = 2.0940
max_tp = 2.0429
>> [dBTP, max_tp] = max_true_peak(7)
dBTP = 2.0526
max_tp = 2.0025
>> [dBTP, max_tp] = max_true_peak(8)
dBTP = 2.0940
max_tp = 2.0429
The results could probably be brought closer to each other with a better FIR design.

EDIT: Yes, it looks like a better FIR design improves the match between the two implementation methods:

>> [dBTP, max_tp] = max_true_peak(4)
dBTP = 2.0940
max_tp = 2.0977

I did not find many usable test files (Octave seems not to accept most of the files a Google search turned up). Any links?
Last edited by juha_p on Sun Sep 04, 2022 5:31 am, edited 3 times in total.

Post

juha_p wrote: Wed Aug 31, 2022 1:08 pm

Code: Select all

>> [dBTP, max_tp] = max_true_peak(2)
dBTP = 2.0940
max_tp = 2.0429
>> [dBTP, max_tp] = max_true_peak(3)
dBTP = 1.8681
max_tp = 1.8228
>> [dBTP, max_tp] = max_true_peak(4)
dBTP = 2.0940
max_tp = 2.0429
>> [dBTP, max_tp] = max_true_peak(5)
dBTP = 2.0129
max_tp = 1.9638
>> [dBTP, max_tp] = max_true_peak(6)
dBTP = 2.0940
max_tp = 2.0429
>> [dBTP, max_tp] = max_true_peak(7)
dBTP = 2.0526
max_tp = 2.0025
>> [dBTP, max_tp] = max_true_peak(8)
dBTP = 2.0940
max_tp = 2.0429
The results could probably be brought closer to each other with a better FIR design.
It looks like all the even oversampling factors give the same result, which suggests that perhaps the test file has a true peak exactly halfway between two samples? In a sense, a true peak falling exactly on an oversampled sample is the best case; the worst case for any given resampling factor is when the actual peak falls exactly halfway between samples after oversampling.

The strategy of simply taking the maximum oversampled sample value is always going to have "large" variation depending on where the true peak falls, and a better FIR design will not improve this... so be careful about what exactly it is that your test is measuring. :)
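To illustrate with a back-of-the-envelope Octave sketch (assuming a pure sinusoid near Nyquist whose peak falls exactly halfway between oversampled points): the worst-case under-read depends only on the oversampling factor, not on the filter quality:

Code: Select all

% The nearest oversampled point sits half an oversampled period away
% from the sinusoid's peak, so the reading is cos(pi*f0/M) of the true peak.
f0 = 0.45;                      % tone frequency as a fraction of fs
M  = [1 2 4 8];                 % oversampling factors
err_dB = 20*log10(cos(pi*f0 ./ M))
% err_dB ~ [-16.12  -2.38  -0.55  -0.14]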

Post

Vladgsound's blog post, where the two test files I used above come from, includes calculations for the theoretical values. As I'm not good with higher math, any chance of opening up those calculations here... maybe in Python, Octave or C++, so one could build a few more test situations for deeper testing?

Here's a 1 s audio file with a value slightly over 1.0 (according to Audacity)... I get +0.00004276 dBTP with an off-line implementation which uses a (long) FIR low-pass filter in the up-sampling process and a binary-search method to locate the highest peak value, with the help of a piecewise cubic spline interpolator (this is very slow in Octave; without the binary search, the procedure would have been too slow for anyone to wait for it to finish). Both of my previous implementations, found in my post above, returned 0 dBTP (which is OK of course).
