True Peak Detection

DSP, Plugin and Host development discussion.

Post

Okay, so you neither understood the article nor even bothered to read it.

Slow clap.


"My" solution couldn't possibly be worse. It depends entirely upon which filter is applied by the algorithm. It will converge at a rate determined by the root finding algorithm which is higher than log(N).

Therefore, the algorithm I linked to is better than oversampling, which only demonstrates logarithmic convergence, except in the specific circumstances (described in the article) in which convergence, although higher, doesn't come into play.


Yes, you can very well demonstrate specific ad-hoc circumstances in which the algorithm performs poorly. You will have a lot of trouble doing so without first reading the article.



The only condition under which a naive "oversampling" re-sampling algorithm and measurement of the max level would provide a more accurate result is if the phase of the peak happened to land exactly on a multiple of 1/N.

In all other cases, using the same interpolating filter, the root-finding algorithm need only position itself anywhere between 1/N and the true peak in order to be more accurate.

I would also argue that once you are up to 8 iterations or so, the root-finding algorithm will be vastly more accurate than the naive re-sampling algorithm, to the point that an equivalent result could likely be achieved with far fewer iterations. In the case of quadratic or even linear convergence, given enough iterations it is impossible for the naive algorithm to produce a more accurate result unless convergence fails.

In order to disprove this statement you need only demonstrate that the convergence of the naive algorithm is greater than log(N), or that the convergence of the root-finding algorithm is worse.
Free plug-ins for Windows, MacOS and Linux. Xhip Synthesizer v8.0 and Xhip Effects Bundle v6.7.
The coder's credo: We believe our work is neither clever nor difficult; it is done because we thought it would be easy.
Work less; get more done.

Post

aciddose wrote:Okay, so you neither understood the article nor even bothered to read it.
Of course I didn't do more than a quick scan. Read my last post—I'm not developing a product, I don't care.

Anyway, it's clear now that you're having a one-sided argument about efficiency. I have no idea why you're addressing that at me—I never proposed a solution, I never said anything about efficiency. I made a comment about granularity only (the article you cited referred to it as how "course" [sic] the resolution is). My potential use case was for one-time offline processing—something that would take me a few minutes to code and that I had zero need to make more computationally efficient. My point to the OP was that I considered 8x would probably be overkill, and I offered it as an upper bound (I had been in a recent discussion thread where people were arguing about needing 512x oversampling—for a different purpose, but nevertheless a factor that I thought was absurd).

PS—You seem to be on me about efficiency now, but earlier your argument was, "My point is that if a person considers oversampling a solution, they are probably taking the worst possible point of view right from the get-go." Again, your solution seems to be oversampling too, so I just don't get this comment.
My audio DSP blog: earlevel.com

Post

Yes, the efficiency and accuracy of the naive method you propose are the worst possible of any method.

If you don't disagree, why even respond to my post?

You've all along been saying "but... but, but butbutbut..." without simply admitting "yes, it is the worst possible implementation and no sane developer should ever even consider using it unless they have no other option. I've used it because I had no reason to invest in a more practical general-purpose implementation and the method is the cheapest to implement although not the most efficient."

That would make sense. You sort of said that while also doing some pointless hand-waving in what looks like an attempt to discredit my statement that this is the worst possible method.

It is the worst possible implementation. I won't change my position on this unless you demonstrate it to be false.

Given the log(N) convergence which I see no way to work around, I can't see you ever demonstrating it to be false, nor can I see why you'd want to. What is the purpose of disagreeing with my statement of fact?

I have no problem if you or anyone else wants to use the method. It makes perfect sense... assuming you don't have any need for a more efficient or accurate method and can't justify the investment of effort to produce one.

That said, I won't ever accept someone arguing that without admitting it is the worst possible implementation. The first step to accepting a work-around or an avoidance of effort is to admit that you accept the consequence, which in this case is that the algorithm is the worst possible.
Free plug-ins for Windows, MacOS and Linux. Xhip Synthesizer v8.0 and Xhip Effects Bundle v6.7.
The coder's credo: We believe our work is neither clever nor difficult; it is done because we thought it would be easy.
Work less; get more done.

Post

earlevel wrote:Again, your solution seems to be oversampling too, so I just don't get this comment.
You're not looking at the issue correctly. You seem confused.

There is a very important difference between "oversampling" as you call it (re-sampling to a fixed grid) and sampling at arbitrary points along a continuous function as determined by a root-finding algorithm.

The only thing these two share in common is the continuous function which is sampled. The difference is in the method of sampling.

Selecting points at a fraction of the width of the original data points ("oversampling") is the most naive way to select new points to sample and produces the worst possible accuracy and efficiency unless the peak level happens to land exactly on one of those fixed grid points.

Because the true peak of the function will, the vast majority of the time, not land on one of these integer fractions, any method that allows sampling without being fixed to a predetermined grid is potentially going to produce a more accurate result.

It is not correct to make this sort of bizarre generalization and view all sampling as "oversampling". By your logic, any time we sample any function we could call it "oversampling", which would make the term uselessly generic.

In the case of re-sampling to a higher rate, you could genuinely define the algorithm as "oversampling", because you are increasing the number of samples taken without necessarily seeing any benefit from doing so. In other words, you might take all eight samples and find that none of them has greater magnitude than the original sample you already had.

In the case of applying a root-finding algorithm, no such oversampling takes place. The starting estimate may be determined in numerous different ways and every sample taken from that point increases the accuracy of the result, so long as convergence does not fail. At best the algorithm used to sample the starting point may be considered "2x oversampling" although I would argue that the logic/rationale for this nomenclature does not make any sense.

For example, given linear convergence, each iteration of the root-finding algorithm produces a result twice as accurate as the last. Using an 8x re-sample, the resolution is limited to 1/8 (1/N), and accuracy grows only in proportion to the work: each doubling of the accuracy requires a doubling of the number of samples and of the cost of producing them.

With linear convergence each iteration doubles the accuracy, so eight iterations yield a resolution of 1/256 rather than 1/8. Under ideal conditions a root-finding algorithm should produce the same accuracy as the 8x re-sample with three samples (2^3 = 8) rather than eight.

In reality, however, it is more likely you'll apply a root-finding algorithm with better-than-linear convergence; near-quadratic convergence is common. In such a case accuracy might be vastly better than 1/8 after only three samples are taken.
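
To make the comparison concrete, here is a minimal C sketch. It is purely illustrative and is not the algorithm from the linked article: the "continuous function" is an analytic sine standing in for an interpolating filter's output, and the iterative search is a plain ternary search rather than any particular root-finder.

```c
/* Minimal sketch (not the algorithm from the linked article): compare a
 * fixed 8x grid search with an iterative search on the same continuous
 * function, here an analytic sine standing in for an interpolating
 * filter's output. */
#include <stdio.h>
#include <math.h>

#define TWO_PI 6.283185307179586

/* continuous signal whose true peak (1.0) falls between integer samples */
static double f(double t) { return sin(TWO_PI * 0.23 * t + 0.4); }

int main(void)
{
    /* 1) naive fixed grid: sample 8 evenly spaced points between the two
       original samples and keep the max -- accuracy limited by 1/N */
    double grid_peak = 0.0;
    for (int k = 0; k <= 8; ++k) {
        double v = fabs(f(k / 8.0));
        if (v > grid_peak) grid_peak = v;
    }

    /* 2) iterative refinement: ternary search on |f| over the same
       bracketing interval, assuming one local maximum inside it */
    double lo = 0.0, hi = 1.0;
    int evals = 0;
    while (hi - lo > 1e-6) {
        double a = lo + (hi - lo) / 3.0;
        double b = hi - (hi - lo) / 3.0;
        if (fabs(f(a)) < fabs(f(b))) lo = a; else hi = b;
        evals += 2;
    }
    double search_peak = fabs(f(0.5 * (lo + hi)));

    printf("true peak:    1.000000\n");
    printf("8x grid peak: %.6f\n", grid_peak);
    printf("search peak:  %.6f (%d function evaluations)\n",
           search_peak, evals);
    return 0;
}
```

The grid version's error is fixed by its spacing, while each pair of evaluations in the search shrinks the bracketing interval (and with it the error bound) by a constant factor, which is the geometric convergence argued above.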
Free plug-ins for Windows, MacOS and Linux. Xhip Synthesizer v8.0 and Xhip Effects Bundle v6.7.
The coder's credo: We believe our work is neither clever nor difficult; it is done because we thought it would be easy.
Work less; get more done.

Post

[image]

Just inserting a little levity :wink: ...sorry, folks, for my part in extending this...
My audio DSP blog: earlevel.com

Post

So I looked at this a bit more yesterday.

Apologies for such a long setup description -- I've been working on a new "hobby" lookahead limiter to use on my old music, etc. This one examines each peak above threshold and finds the max value and duration of each peak. Then it flatlines the envelope over the entire peak duration, using the max sample value. It flatlines for about a half millisecond of lookahead as well, then draws a half-millisecond attack into the peak-hold area. The times adjust upwards a bit according to peak amplitude and duration. It draws a release envelope twice as long as the attack time, so for instance if the attack + pre-hold is 1 ms total, the release would be 2 ms.

The basic idea is to have fairly small gain change distortion on attack and release, and no "short term" gain change distortion at all for the duration of each peak.

Before actual audio gain reduction, the lookahead envelope is further smoothed via an IIR filter. The final IIR smoother's attack has a time constant chosen to turn the initial linear attack segment into a sigmoid-shaped attack, which reaches the max gain reduction exactly at each peak-hold onset. The smoother's default release time draws a sigmoid-shaped release curve atop the lookahead envelope's linear release shape.

This alone seems fairly clean and transparent on my old mixes, up to 3 or 6 dB gain reduction. Obviously such short envelopes get really nasty at bigger gain reduction, so the final smoother IIR filter's release is program-adaptive: the release can grow quite long on sustained heavy overs, then recover to a short release if the input audio gets quieter. Seems to heavy-limit fairly cleanly on the test songs I've tried; still experimenting with the algorithm.
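
Here is a rough C sketch of that envelope shape, reduced to a single isolated peak with fixed times -- no overlap handling, no adaptive release, and a plain one-pole standing in for the final smoother -- so it's only the shape described above, not the actual plugin code.

```c
/* Rough sketch of the flat-top lookahead envelope shape described above,
 * reduced to one isolated peak. Not the actual plugin code. */
#include <stdio.h>
#include <math.h>

enum { N = 1024 };

int main(void)
{
    const double fs = 44100.0;
    const int look    = (int)(0.0005 * fs + 0.5);  /* ~0.5 ms pre-hold      */
    const int attack  = look;                      /* ~0.5 ms linear attack */
    const int release = 2 * attack;                /* release = 2x attack   */
    const double thresh = 0.5;

    double x[N], env[N];
    for (int n = 0; n < N; ++n) x[n] = 0.1;        /* quiet background      */
    for (int n = 400; n < 440; ++n) x[n] = 0.8;    /* one burst over thresh */

    /* find the over-threshold region, its max value and duration */
    int start = -1, end = -1;
    double peak = thresh;
    for (int n = 0; n < N; ++n) {
        if (fabs(x[n]) > thresh) {
            if (start < 0) start = n;
            end = n;
            if (fabs(x[n]) > peak) peak = fabs(x[n]);
        }
    }
    if (start < 0) return 0;                       /* nothing over thresh   */

    /* flat hold at the peak value over the peak duration plus the pre-hold
       lookahead, linear attack ramp in, linear release ramp (2x) out */
    for (int n = 0; n < N; ++n) {
        double e = thresh;                         /* "no reduction" floor  */
        if (n >= start - look && n <= end)
            e = peak;
        else if (n >= start - look - attack && n < start - look)
            e = thresh + (peak - thresh)
                       * (n - (start - look - attack)) / (double)attack;
        else if (n > end && n <= end + release)
            e = peak - (peak - thresh) * (n - end) / (double)release;
        env[n] = e;
    }

    /* final IIR smoother rounds the linear ramps toward sigmoids */
    double sm = thresh, coef = exp(-1.0 / (0.0002 * fs));
    for (int n = 0; n < N; ++n) {
        sm = env[n] + coef * (sm - env[n]);
        if (n >= start - 2 * attack && n <= end + 2 * release && n % 4 == 0)
            printf("%4d  raw %.3f  env %.3f  smoothed %.3f\n",
                   n, x[n], env[n], sm);
    }
    return 0;
}
```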

****

So far, the only mixes I've tested were analog mixed to DAT long ago with no clipping. I recently imported fresh copies from the old DATs to computer float files via optical S/PDIF, 44.1 kHz sample rate. Just sayin', there is probably nothing extraordinarily bad or good about the quality of the mixes. Maybe typical of home studio creations from that era.

For instance, one test song, limiter set for a -6 dB threshold and -1 dB ceiling, IOW 5 dB of makeup gain. Played from Reaper with the limiter inserted on the track. On the output are Voxengo SPAN and a freeware true peak level meter which seems to work OK as best I can tell, written according to the EBU specs.

Reaper's master level meter (not a true peak meter) shows exactly -1 dB max peak as expected. Similarly, Voxengo SPAN measures exactly -1 dB max peak (not true peak either, as best I know). The true peak meter also shows -1 dB sample peak, but over the entire song gets up to a max true peak of about +1.8 dB!

Investigating further, though there are multiple instances where the true peak is allowed to rise above the -1 dB ceiling, there are only 5 locations in the song which trigger a true peak measurement greater than 0 dB. I located and placed markers on those 5 instances to look for visual evidence of what kind of waveform characteristics would lead to almost +3 dB overshoots. Did not notice a smoking gun or common characteristics about the big ISP instances.

Surprisingly, the biggest ISPs do not even occur at the loudest peaks! The locations which trigger the biggest ISPs look and sound rather ordinary; the limiter grabs much bigger peaks no problemo. And because three meters agree that the sample peaks are no higher than -1 dB, it seems more likely to be an envelope detection problem than a gain-change bug in my plugin. If the plugin were outputting raw samples any hotter than -1 dB, the sample peak meters would notice.

Post

So a few completely inaudible ISPs leaking through don't matter much to me, having no commercial aspirations for the music. But I read that big online streaming sites are beginning to adjust song playback level according to long-term level and true peak level. If one publishes a song which peaks above -1 dB, supposedly more and more sites will automatically reduce the song's playback level.

Dunno how stupid or "strict" these auto-level web playback algorithms are. For instance, if 99 percent of a 3-minute song peaks no higher than -1 dB, but five peaks are hotter than 0 dB, and one lonely peak goes up to +1.8 dB -- is the auto-level algorithm so stupid or rigid that it will cut the entire song's playback level by 2.8 dB because of a few samples' worth of hot peaks?

****

So I wondered how shoddy an oversampling one might be able to get away with and still properly limit ISPs. Try the simple cheap stuff first...

Read again about aciddose's recommended Lanczos short-sinc FIR. Paper, pencil, and calculator test on a simple 45-degree phase-shifted sine, frequency = fs/4.

Lanczos interpolation at 1/2 sample, A = 2: a four-tap FIR on the values [-0.707, 0.707, 0.707, -0.707] slightly underestimates the expected result of 1.0, but the result is much closer than linear or cosine interpolation.

Lanczos interpolation at 1/2 sample, A = 3: a six-tap FIR on the values [-0.707, -0.707, 0.707, 0.707, -0.707, -0.707] slightly overestimates the expected result of 1.0, but the overestimation is fairly small.
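
For reference, here is a small C sketch that reproduces that paper-and-pencil check using the standard Lanczos kernel (sinc(x)·sinc(x/A) for |x| < A); whether that is exactly the variant aciddose recommended, I'm not sure.

```c
/* Sketch verifying the paper-and-pencil Lanczos test above: interpolate
 * half-way between samples of a 45-degree phase-shifted fs/4 sine whose
 * true peak is 1.0. Tap counts and sample values follow the post. */
#include <stdio.h>
#include <math.h>

static const double PI = 3.14159265358979323846;

/* standard Lanczos kernel with support 'a' */
static double lanczos(double x, int a)
{
    if (x == 0.0) return 1.0;
    if (fabs(x) >= a) return 0.0;
    return a * sin(PI * x) * sin(PI * x / a) / (PI * PI * x * x);
}

/* interpolate at fractional position 'frac' (0..1) in the gap at the
 * centre of the 2*a samples in s[] */
static double interp(const double *s, int a, double frac)
{
    double acc = 0.0;
    for (int i = 0; i < 2 * a; ++i)
        acc += s[i] * lanczos(frac - (double)(i - (a - 1)), a);
    return acc;
}

int main(void)
{
    const double s4[4] = { -0.707, 0.707, 0.707, -0.707 };                /* A = 2 */
    const double s6[6] = { -0.707, -0.707, 0.707, 0.707, -0.707, -0.707 };/* A = 3 */

    printf("A=2, half-sample estimate: %.4f (true peak 1.0)\n", interp(s4, 2, 0.5));
    printf("A=3, half-sample estimate: %.4f (true peak 1.0)\n", interp(s6, 3, 0.5));
    return 0;
}
```

With these values the A = 2 case lands around 0.90 and the A = 3 case around 1.016, matching the under- and overestimates noted above.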

So I first tried 6-tap Lanczos 2x oversampling in the limiter. This reduced the measured ISPs by 0.1 or 0.2 dB -- not much improvement, but a little.

Then tried 6-tap Lanczos 4x oversampling. This reduced the measured ISPs so that no inter-sample peak measured higher than 0 dB. As best I recall, at -6 dB threshold and -1 dB ceiling, the max true peaks measured up to about -0.5 or -0.4 dB. That's about a 0.6 dB max overshoot, compared to about 2.8 dB overshoot with no envelope oversampling.

I had wondered whether envelope oversampling would only be needed in the vicinity of each peak -- perhaps envelope oversampling need not be performed on below-threshold audio? However, oversampling only around peaks did not work as well for avoiding big ISPs. Envelope oversampling all of the audio was much more effective at eliminating ISPs, and I think the limiter sounds a bit better to the ear that way. I'm not oversampling the audio chain, only the envelope.

IIRC the ITU doc's example 4x oversampling FIR is 48 taps per sample per channel. The 4x 6-tap Lanczos is only 18 multiplies and 15 adds per input sample per channel, though apparently not as effective.

Later I will test a couple of Lanczos variations and compare effectiveness. One obvious test would be 8-tap Lanczos 4x oversampling, which would still only be 24 multiplies per input sample, IF it would reduce limiter ISPs significantly better than the 4x 6-tap Lanczos. Another option would be 8x oversampling with the 6-tap Lanczos, 42 multiplies per input sample, or 6x oversampling at 30 multiplies per input sample. Would be interesting to know which option would be more effective -- more oversampling or better oversampling filters?
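
To illustrate those cost figures, here is a rough sketch of the per-sample 4x, 6-tap scan as I understand it -- three precomputed Lanczos-3 phases (1/4, 1/2, 3/4 sample), i.e. 3 x 6 = 18 multiplies and 3 x 5 = 15 adds per input sample per channel. The test signal is just an fs/4 sine shifted 45 degrees, not my limiter's envelope code.

```c
/* Sketch of a 4x / 6-tap per-sample true-peak scan: three precomputed
 * Lanczos-3 phases evaluated per input sample, max fed to the detector. */
#include <stdio.h>
#include <math.h>

#define A 3                       /* Lanczos kernel support */
#define TAPS (2 * A)
static const double PI = 3.14159265358979323846;

static double lanczos(double x)
{
    if (x == 0.0) return 1.0;
    if (fabs(x) >= A) return 0.0;
    return A * sin(PI * x) * sin(PI * x / A) / (PI * PI * x * x);
}

int main(void)
{
    /* precompute the three interpolation phases (1/4, 1/2, 3/4) once */
    double phase[3][TAPS];
    for (int p = 0; p < 3; ++p) {
        double frac = 0.25 * (p + 1);
        for (int i = 0; i < TAPS; ++i)
            phase[p][i] = lanczos(frac - (double)(i - (A - 1)));
    }

    /* test signal: 0.95 * sin(pi/2 * n + pi/4) -- samples never exceed
       0.95/sqrt(2) but the waveform itself peaks at 0.95 */
    enum { N = 64 };
    double x[N];
    for (int n = 0; n < N; ++n)
        x[n] = 0.95 * sin(0.5 * PI * n + 0.25 * PI);

    double sample_peak = 0.0, interp_peak = 0.0;
    for (int n = A; n < N - A; ++n) {             /* skip edges for brevity */
        double v = fabs(x[n]);
        if (v > sample_peak) sample_peak = v;
        if (v > interp_peak) interp_peak = v;
        for (int p = 0; p < 3; ++p) {             /* 18 multiplies/sample   */
            double acc = 0.0;
            for (int i = 0; i < TAPS; ++i)
                acc += x[n - (A - 1) + i] * phase[p][i];
            if (fabs(acc) > interp_peak) interp_peak = fabs(acc);
        }
    }
    printf("sample peak:       %.4f\n", sample_peak);
    printf("4x 6-tap estimate: %.4f  (waveform peak 0.95)\n", interp_peak);
    return 0;
}
```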

Post

Another surprise, assuming I did the test correctly -- I added diagnostic code to determine statistically how often the interpolated samples are bigger than either of the endpoint samples. I applied the test only to over-threshold peak regions: there would be wide variation when the audio wave is steadily rising or steadily falling, so I only counted behavior within over-threshold peaks. Even within over-threshold peaks there would be lots of steadily rising or falling line segments.

I collected 5 counts during peak-limiting action -- whether sample[0] is biggest, sample[1] is biggest, or the quarter, half, or three-quarter interpolation is biggest.

On the test songs, about 18 percent of samples near peaks have an interpolated sample which is bigger than both bracketing endpoint samples. I expected the half-sample interpolation to have the biggest likelihood of exceeding the endpoints. However, the quarter-sample and three-quarter-sample interpolations were about equally likely as each other, and each about twice as likely to have the biggest value as the half-sample interpolation.

Perhaps this shows some kind of half-point estimation flaw in a 6-tap Lanczos. Or perhaps interpolations closer to an endpoint really are more likely to be the loudest interpolation, compared to the midpoint interpolation.

I expected that interpolations closer to an endpoint would tend to have values closer to that endpoint, and be less likely to be the biggest of the five values.

So maybe it is just a midpoint estimation problem with these very short FIR filters. If I further test 4x oversampling stats with bigger Lanczos kernels, it will be interesting to see whether the stats on the three interpolations flatten out to about equal probability, or even show a statistical preference for the inter-sample midpoint.

Post

JCJR wrote:But I read that big online streaming sites are beginning to adjust song playback level according to long-term level and true peak level. If one publishes a song which peaks above -1 dB, supposedly more and more sites will automatically reduce the song's playback level...
Hi, Jim. On "the loudness wars": Once it was just damaging to the music. Now it has become futile as well. If you crush the dynamics out of the music for the sole purpose of being louder, you will be reined in—you will give up the loudness advantage and have just squashed your music.

Look at this chart long enough to understand it; you'll see that, overall, a moderately compressed track (basically, taming hot transients) is the biggest win for modern providers, but the main point is that an overly squashed track will simply be turned down by Apple, Spotify, and YouTube:

http://productionadvice.co.uk/online-loudness/
My audio DSP blog: earlevel.com

Post

Thanks, Nigel, I agree. Saw that article a while ago. It's great.

A couple of years ago I acoustically treated and improved the playback system in a new home office, then spent some months listening to lots of music to "calibrate the ears". After a while it occurred to me that maybe some of the old stuff could be made to sound a bit better. Played with several sets of tools, experimenting on the songs, and made several sets of rather bad-sounding remasters.

Got tired of testing tools and trying to decide if they sound good or bad. Generic EQ tools seem usable; the cleaner and more neutral an EQ, the better, IMO. But I finally decided it would be faster to write a compressor and a limiter that I might like when applied to MY old songs, rather than extensively test a bunch of plugins to decide which third-party compressor and limiter sounds best, which takes time and is something of a guessing game.

The raw mixes tend to have a PLR of 17 or greater. They do seem to sound solid and about right with a PLR in the ballpark of 12 to 14. It's rock'n'roll, after all. At that level they still retain some dynamics, not a complete flatline in an audio editor.

So thunk to myself, "Self, you probably need to narrow the dynamics about 4 dB." Wrote a lookahead RMS compressor designed for minimum intermod distortion: heavy audio-frequency envelope smoothing, with potentially fairly fast recovery -- or at least fairly fast for a slow-as-hell RMS compressor. :)

Set ratio 1.2:1 or maybe up to 1.5:1, and transparently adjust out about 2 dB of dynamic range. Hopefully level the song 2 dB without hearing it work or adding noticeable distortion.

Then wrote the limiter to hopefully invisibly take out another 2 dB and get in the ballpark of 12 to 14 PLR without overly squashed sound or audible added distortion. Wish the old songs had been recorded cleaner. Can't afford any extra added distortion. PLR of 12 to 14 with a -1 dB true peak target.

The compressor and limiter do those modest jobs fine already as far as I can tell, ON MY SONGS, except for occasional inaudible overshoot on true peaks. Been recreationally tweaking them to sound as good as possible under more abusive conditions, just for the heck of it. Might open-source the plugins after some polishing. Just what the world needs, more compressors and limiters. :)

Testing with RMS compression of 2 dB, makeup gain near 0 dB, feeding the limiter at -6 dB threshold and -1 dB ceiling, outputs a fairly "clean-sounding" PLR of 8 or 9, lots hotter than the intended use. Been testing down to a -24 dB threshold. Pump city, but fairly clean.

The last limiter I made long ago would not pump on heavy limiting; at heavy limiting it raised the release long enough to become an auto-level effect. That old limiter had several ms of lookahead and 10+ ms at its quickest-recovery peak-limit setting. I recently tried it on my old songs. The old limiter wasn't invisible with light limiting -- it tended to make audible pumping even with very light limiting. Hadn't noticed before, on other types of music I had tested against. Doh. When writing that old one I expected its typical use would be more than just a couple of dB of limiting.

So anyway am gonna leave in the pumping at extreme settings with the new one. As long as it has clean pumping. Sometimes pumping sounds kewl.

It is not likely that I'll ever bother to upload songs to YouTube or whatever. Just the principle of it. If a fella uploads a well-behaved PLR 13 song, except maybe a handful of samples shoot up to +2 dB true peak, is the service gonna attenuate his song by 3 dB because of a few inaudible "virtual" overs? :)

Post

No one commented on/answered my question left on the first page of this thread?

FYI, Orban's Loudness Meter V 2.8.2 (freeware) includes full support for BS.1770-3 and EBU R 128, logging, manual mode (start/stop), and oversampled peak measurement that accurately indicates the peak level of the audio after D/A conversion.

[image]

http://www.orban.com/meter/

Post

juha_p wrote:No one commented on/answered my question left on the first page of this thread? ...

Thanks for posting that paper. I read it and all the links provided by others, and enjoyed reading them. The method in the paper you linked appears compute-intensive for a peak limiter, except perhaps for a perfectionist. :)

Looked at the Orban page. Looks good, but at the moment a VST true peak meter is more convenient. My Focusrite interface supports ASIO loopback and probably would work with the Orban program. Might try it sometime.

Reviewing the thread, perhaps the quickest path to processing my songs with my limiter: high-quality upsample the 44.1 kHz files to 176.4 kHz, render the processed files, downsample back to 44.1 kHz, and done.

Could add a high-quality resampler and decimator inside the limiter, but that seems a bit of overkill. Or maybe not.

The polyphase half-band filter plot posted by Max M looks good in amplitude and phase. When I used polyphase half-band filters in the past they were fairly fast and had good audio quality to the ear, so far as I could tell. Maybe that would be "better performing" than the 48-tap ITU example FIR, for a not-outrageous computational cost.

If, due to high-frequency phase shift, a polyphase filter were to slightly overshoot the peak measurement -- I wouldn't mind slight overshoot in a limiter envelope detector. Underestimation would be the bigger concern. If the limiter happened to clamp some random peak down to -1.2 dB rather than the -1 dB ceiling, I wouldn't care, so long as it keeps all peaks below the ceiling.

Unless oversampling/decimation of the actual audio path were discovered to be necessary in order to perform clean-sounding modest limiting, I would rather not do it. Unless limiting at the original sample rate makes bad gain-change artifacts, NOT oversampling the audio path seems to run less risk of messing up the audio somehow.

Was just exploring how cheap a method might get the job done on practical music. The cheap 4x, 6-tap Lanczos would perform really badly on pathological signals as described in the iZotope link; the filter isn't long enough to evaluate a long burst at fs/2. However, it managed ON MY TEST SONGS to keep overshoot below 0 dB, given a -1 dB ceiling. It would allow greater-than-0 dB overshoots on any ceiling higher than about -0.6 dB.

Might try a few more cheap variations.

Post

White noise is a great signal to test intersample peaks.
