Pre-Conditioning for Low-Bitrate WMA 9 Codec

DSP, Plugin and Host development discussion.

Post

JCJR wrote:Just sayin, if an encoder has trouble with noisy high freqs (such as drums, cymbals, hand perc), or with transients in general, then one might suppose that we could make it easier on the encoder to pre-filter out noisy high freqs and to minimize the transients? Of course such pre-processing would make many kinds of music sound worse before we even feed it to the codec, but it might help prevent the codec encoding chirpies, transient distortions, and other ear-annoying artifacts? On the theory that clean lowfi might be a more pleasant listen than nasty lowfi! :)
This sounds exactly like what I'd been thinking. I was unable to describe it as you did, but that is what I'm after.

Is this something that can be done? Or is this equally as difficult due to the aforementioned hurdles in this thread?

Is what JCJR is suggesting doable?

Post

PurpleSunray wrote: 1) Understand how the encoder processes the signal. That's the psychoacoustic model.
For MP3 or AAC you can find it on the internet; for WMA you cannot.
Thanks. When I was working on it long ago I never saw descriptions of how WMA works, but didn't know whether such info had become available since.

As best I recall I used the WMA 9.x SDK. Two-pass encode was part of the spec. Maybe I'm remembering wrong, but I think two-pass encode might not always have been available depending on encode settings, and its availability was discoverable by querying the codec after instantiating the DirectX filter graph and before pumping data thru it. It could be a wrong recollection. Worst case I could look up the old code, but I don't need to know that bad. :)

Didn't pay much attention to low bitrates except to make sure they would work.

For my company's purposes, WMA advantages included--

_1_ Fairly good fidelity vs file size
_2_ Fast decode and it was possible to do it "in-memory" without taking months of study and brain damage to figure out how to write the code.
_3_ Excellent seek accuracy and good seek speed. We were making on-the-fly auto-assembled composite tracks by seeking into lots of WMAs and perhaps only decoding a few seconds of each one, at specific locations, then optionally pitch and time stretching all the little pieces. Inaccurate seek locations, or slow seeking within the compressed streams, were deal-breakers.

M4a had the same advantages on Mac, but I never found Windows code to do in-memory decode and accurate seek of M4a on Windows. And Mac was quite slow with WMA. File-based shell programs were no substitute. No one was selling such code (and patent rights were up in the air anyway; IOW maybe we couldn't buy a Windows AAC DLL because no one owned the rights to sell one). We wedged in Windows QuickTime calls to do some M4a conversions, but it wasn't fast enough.

MP3 I like a lot but seek was not always accurate enough with MP3.

So we mainly used WMA on winders and M4a on Mac. Each working "about as good" on its native platform.
Last edited by JCJR on Mon Nov 20, 2017 4:20 am, edited 1 time in total.

Post

genie40204 wrote:
Is what JCJR is suggesting doable?
Hi genie

Just wild guesses. I found one old source file. My current PCs don't have the old dev IDE installed, so hunting thru lots of files isn't much fun. Here is one version of the old WMA-writing code--

The '//' comments were not necessarily my opinion at the time; more likely I was copying Microsoft documentation propaganda. IMO 'tis seriously doubtful, for instance, that "FM Radio Quality" is really FM radio quality, except perhaps under poor reception conditions. Maybe the FM radio quality of a 10 dollar pocket radio 50 miles away from a 1000 watt college station. :)

Samplerates are not as bad as I recalled, if you don't need to go below 32 kbps. There were several ways to set up encode formats, but the method below associated opaque data blocks, made with some old funky Windows tool, that stored the GUID specifications the codec would use. I'm reasonably sure these WMA 9 specs and code still work on Win 10, but Win 10 may have better ways to do it. Someone could study current MS docs or do some code-testing to verify the input samplerates for the latest WMA capabilities.

So anyway, it is silly word salad of little interest, except that it denotes some possible bitrates and the input samplerate each one needs in order for the codec to succeed.


(Stereo or Mono, kbps, compression ratio, data size in kilobytes per minute (IOW 5 minutes at 46 KB/min might come to about a 230 KB file), input samplerate)

//Low Bit Rate Voice (Mono, 6.5 Kbps, 192:1, 46 KB/min, 8000)

//FM Radio 28.8 K Modem (Mono, 28.8 Kbps, 68:1, 151 KB/min, 32000)

//FM Radio 28.8 K Modem (Stereo, 28.8 Kbps, 68:1, 152 KB/min, 22050)

//Higher Quality 56 K Modem (Stereo, 32 Kbps, 42:1, 242 KB/min, 32000)

//Near CD quality (Stereo, 48 Kbps, 28:1, 360 KB/min, 44100)

//CD Quality (Stereo, 64 Kbps, 22:1, 477 KB/min, 44100)

//Better than CD quality (Stereo, 96 Kbps, 14:1, 712 KB/min, 44100)

//Better than CD quality (Stereo, 128 Kbps, 11:1, 946 KB/min, 44100)
In other words, though WMA 10 might have different ways of working, if you want stereo 32 kbps then the WMA 9 codec probably expects an input file at 32 k samplerate. Or for stereo 28.8 kbps, the codec wants a 22.05 k samplerate input file. A tool capable of accepting a 44.1 k, 48 k, or 192 k file and spitting out a 28.8 kbps WMA, on WMA 9, PROBABLY has to temporarily samplerate convert to 22.05 k samplerate before feeding the codec, but it is a wild guess based on old information.
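A quick Python sketch of the arithmetic behind that listing, assuming 1 KB = 1000 bytes (the listed figures are only approximately consistent with either 1000 or 1024). KB per minute is just kbps × 60 / 8, and the lookup returns the input samplerate the listing pairs with a bitrate. The table and function names are made up for illustration, not anything from the WMA SDK. Worth noting: by this arithmetic the two "28.8" rows (151-152 KB/min) work out to roughly 20 kbps, so maybe the 28.8 names the target modem rather than the audio bitrate; that is a guess from the numbers, not from any docs.

```python
from typing import Optional

# Hypothetical table transcribed from the old WMA 9 comment listing:
# (profile name, channels, nominal kbps, listed KB/min, required input samplerate in Hz)
WMA9_PRESETS = [
    ("Low Bit Rate Voice", "mono", 6.5, 46, 8000),
    ("FM Radio 28.8 K Modem", "mono", 28.8, 151, 32000),
    ("FM Radio 28.8 K Modem", "stereo", 28.8, 152, 22050),
    ("Higher Quality 56 K Modem", "stereo", 32.0, 242, 32000),
    ("Near CD quality", "stereo", 48.0, 360, 44100),
    ("CD Quality", "stereo", 64.0, 477, 44100),
    ("Better than CD quality", "stereo", 96.0, 712, 44100),
    ("Better than CD quality", "stereo", 128.0, 946, 44100),
]


def kb_per_minute(kbps: float) -> float:
    """Kilobytes per minute (1 KB = 1000 bytes) at a constant bitrate in kbps."""
    return kbps * 60 / 8


def implied_kbps(kb_per_min: float) -> float:
    """Bitrate in kbps implied by a data size in KB per minute."""
    return kb_per_min * 8 / 60


def file_size_kb(kb_per_min: float, minutes: float) -> float:
    """Approximate file size, e.g. 5 minutes at 46 KB/min."""
    return kb_per_min * minutes


def required_samplerate(channels: str, kbps: float) -> Optional[int]:
    """Input samplerate the listing pairs with this channel mode and bitrate, or None."""
    for _name, ch, rate, _kb, samplerate in WMA9_PRESETS:
        if ch == channels.lower() and abs(rate - kbps) < 1e-9:
            return samplerate
    return None


if __name__ == "__main__":
    for name, ch, rate, listed, sr in WMA9_PRESETS:
        print(f"{name:26} {ch:6} {rate:5.1f} kbps  listed {listed:3d} KB/min  "
              f"nominal {kb_per_minute(rate):5.1f} KB/min  "
              f"listed size implies {implied_kbps(listed):5.1f} kbps  input {sr} Hz")
```

So required_samplerate("stereo", 28.8) gives 22050 and required_samplerate("stereo", 32) gives 32000, matching the reading in the paragraph above.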

The nyquist, the highest allowable frequency in an audio file, is half of the samplerate. When samplerate converting down, frequencies above the new nyquist (half the new samplerate) must be strongly filtered out or you get aliasing. Aliasing is ugly even if you don't subsequently feed the audio into a codec.
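A tiny Python sketch of where an unfiltered tone ends up, assuming the textbook folding rule: anything above the new nyquist reflects back below it. The function name is made up for illustration.

```python
import math


# Hypothetical helper name, for illustration only.
def alias_frequency(freq_hz: float, samplerate_hz: float) -> float:
    """Where a pure tone lands if it is sampled (or naively decimated) at
    samplerate_hz with no anti-aliasing lowpass: it folds back around the nyquist."""
    nyquist = samplerate_hz / 2
    f = math.fmod(freq_hz, samplerate_hz)  # the sampled spectrum repeats every samplerate
    return samplerate_hz - f if f > nyquist else f


if __name__ == "__main__":
    # Dropping a 44.1 k file to 22.05 k without filtering first:
    for tone in (5000, 10000, 15000, 20000):
        print(tone, "Hz ->", alias_frequency(tone, 22050), "Hz")
```

So a 15 kHz cymbal partial dropped straight to 22.05 k without filtering shows up as an unrelated 7050 Hz tone, and a 20 kHz partial lands at 2050 Hz.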

Possibly a very high quality samplerate conversion would do a better job than whatever process is wrapped up inside whatever encode tool you use. On the other hand, possibly the samplerate conversion inside the encode tool is excellent already and it would be wasted effort to try to do it better.
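For comparison, a minimal Python sketch of "filter first, then drop samples" for a 2:1 conversion (44.1 k to 22.05 k), assuming a Blackman-windowed sinc lowpass. Real resamplers (polyphase, arbitrary ratios) are fancier; the 201 taps and the 0.22 cutoff are arbitrary choices, and the function names are made up.

```python
import math
from typing import List


# Hypothetical helper names, for illustration only.
def windowed_sinc(cutoff: float, num_taps: int) -> List[float]:
    """Blackman-windowed sinc lowpass; cutoff is a fraction of the samplerate (0..0.5)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter has a whole-sample delay")
    m = num_taps - 1
    taps = []
    for n in range(num_taps):
        k = n - m // 2
        ideal = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * n / m)
                  + 0.08 * math.cos(4 * math.pi * n / m))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize for unity gain at DC


def decimate_by_2(samples: List[float], num_taps: int = 201,
                  cutoff: float = 0.22) -> List[float]:
    """Lowpass below the new nyquist, then keep every other sample (e.g. 44.1 k -> 22.05 k).

    cutoff = 0.22 of the input rate is about 9.7 kHz at 44.1 k, leaving a guard
    band below the new 11.025 kHz nyquist.
    """
    taps = windowed_sinc(cutoff, num_taps)
    delay = (num_taps - 1) // 2  # group delay of the linear-phase FIR
    out = []
    for i in range(0, len(samples), 2):  # only compute the outputs we keep
        acc = 0.0
        for k, t in enumerate(taps):
            idx = i + delay - k
            if 0 <= idx < len(samples):
                acc += t * samples[idx]
        out.append(acc)
    return out
```

Run a 15 kHz tone through it and almost nothing comes out, instead of the 7050 Hz alias that plain sample-dropping would give.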

Lowpass filters can't be infinitely steep, so usually there is a filter guard band. For instance, the nyquist for a 44.1 k samplerate is 22.05 kHz, but the lowpass filtering usually begins at 18 or 20 kHz or lower, so that the attenuation is already deep by the time we reach the 22.05 kHz nyquist.

A proportionally same-sized guard band for a 22.05 k samplerate would begin filtering around 9 or 10 kHz and be a very deep cut by the time it reaches the 11.025 kHz nyquist.
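The guard band scaling above as plain Python arithmetic. It just keeps the lowpass corner at the same fraction of the nyquist, using the 18-20 kHz corners at 44.1 k from the post as the reference. The function name is made up.

```python
# Hypothetical helper; the reference corners (18-20 kHz at 44.1 k) are the post's numbers.
def scaled_cutoff(ref_cutoff_hz: float, ref_samplerate_hz: float,
                  new_samplerate_hz: float) -> float:
    """Keep the lowpass corner at the same fraction of the nyquist at a new samplerate."""
    return ref_cutoff_hz * new_samplerate_hz / ref_samplerate_hz


if __name__ == "__main__":
    for corner in (18000, 20000):
        for sr in (32000, 22050):
            print(f"{corner} Hz at 44100 -> {scaled_cutoff(corner, 44100, sr):.0f} Hz at {sr}")
```

For the 32 k samplerate that the 32 kbps stereo preset wants, the same proportions put the corner around 13 to 14.5 kHz.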

An experiment you could make with a good stereo editor program, just to see if it makes any improvement-- Try to find a good quality steep lowpass filter plugin. A good "brickwall" filter. I am not familiar with current plugins and can't recommend one to try.

Rather than samplerate converting your source files, stay at 44.1 k or 48 k or whatever you usually use. Try brickwall filtering the file before low bitrate encode to see if it helps. You could try 10 kHz and then 5 kHz or lower. Regardless of the quality of your encode tool's samplerate conversion, if you wipe out high frequencies before sending to the encode tool then maybe it will help and maybe not. Needs experiment to determine.
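A Python sketch of a "brickwall-ish" FIR lowpass for the 10 kHz / 5 kHz experiment, staying at the original samplerate. It is a plain Blackman-windowed sinc, assumed here only because it is easy to write; it is not any particular plugin, and 255 taps is an arbitrary steepness vs latency choice. Function names are made up.

```python
import math
from typing import List


# Hypothetical helper names, for illustration only.
def lowpass_taps(cutoff_hz: float, samplerate_hz: float, num_taps: int = 255) -> List[float]:
    """Blackman-windowed sinc FIR. More taps = steeper 'brickwall' (and more latency)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    fc = cutoff_hz / samplerate_hz
    m = num_taps - 1
    taps = []
    for n in range(num_taps):
        k = n - m // 2
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * n / m)
                  + 0.08 * math.cos(4 * math.pi * n / m))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # unity gain at DC


def gain_db(taps: List[float], freq_hz: float, samplerate_hz: float) -> float:
    """Magnitude response of the FIR at one frequency, in dB."""
    w = 2 * math.pi * freq_hz / samplerate_hz
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return 20 * math.log10(max(math.hypot(re, im), 1e-12))


def lowpass(samples: List[float], taps: List[float]) -> List[float]:
    """Filter and shift back by the FIR delay so the output lines up with the input."""
    delay = (len(taps) - 1) // 2
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, t in enumerate(taps):
            idx = i + delay - k
            if 0 <= idx < len(samples):
                acc += t * samples[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    fs = 44100
    for corner in (10000, 5000):
        taps = lowpass_taps(corner, fs)
        print(corner, [round(gain_db(taps, f, fs), 1) for f in (1000, corner, 2 * corner)])
```

A windowed sinc is about -6 dB right at its corner frequency, so if you want the cut finished by 10 kHz, set the corner a bit lower.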

At the very least, frequencies above the target nyquist need wiping out, but cutting a little lower than the nyquist could maybe help.

I mentioned "noisy high frequencies" such as cymbal, afuche, tambourine or whatever. POSSIBLY a codec could do better with (below nyquist) "pitched high frequencies". Or possibly the codec would mess up just as badly on pitched high freqs. That would be an interesting experiment.

If codecs at low bitrate really can do better with pitched high freqs, then it might be possible to locate or write a plugin somehow able to discern between noisy highs vs pitched highs, attenuating one and not the other. Merely a possibility, perhaps not a likely possibility.
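One possible way to tell noisy highs from pitched highs is spectral flatness (geometric mean over arithmetic mean of the power spectrum in a band): hiss-like content scores high, a few pitched partials score near zero. That is a swapped-in technique, not something proposed in this thread, and the band edges, frame size and any threshold you'd pick are assumptions. A rough Python sketch, with a made-up function name:

```python
import math
import random
from typing import List


# Hypothetical helper name, for illustration only.
def band_flatness(frame: List[float], samplerate_hz: float,
                  lo_hz: float, hi_hz: float) -> float:
    """Spectral flatness inside [lo_hz, hi_hz]: near 1 for noise-like content,
    near 0 when the band holds only a few pitched partials."""
    n = len(frame)
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    x = [s * w for s, w in zip(frame, hann)]
    lo_bin = max(1, math.ceil(lo_hz * n / samplerate_hz))
    hi_bin = min(n // 2, math.floor(hi_hz * n / samplerate_hz))
    powers = []
    for k in range(lo_bin, hi_bin + 1):  # plain DFT over just the bins we need
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        powers.append(re * re + im * im + 1e-20)  # tiny floor avoids log(0)
    geometric = math.exp(sum(math.log(p) for p in powers) / len(powers))
    arithmetic = sum(powers) / len(powers)
    return geometric / arithmetic


if __name__ == "__main__":
    fs, n = 44100, 512
    rng = random.Random(1)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]                   # hiss, cymbal-like
    tone = [math.sin(2 * math.pi * 10000 * i / fs) for i in range(n)]  # one pitched high partial
    print("noise:", band_flatness(noise, fs, 6000, 16000))
    print("tone: ", band_flatness(tone, fs, 6000, 16000))
```

A plugin built on this would measure flatness frame by frame and turn the high band down only when it looks noise-like; whether that actually helps the codec is the experiment.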
Last edited by JCJR on Mon Nov 20, 2017 5:57 am, edited 1 time in total.

Post

A possible way to experiment to find if transient manipulation could make low bitrate encode more listenable--

There are many transient processor plugins. I've not experimented with such but kept meaning to maybe write one sometime, probably won't get around to it.

Compressors and limiters can reduce the level of LOUD transients above your preset threshold, but a well-done transient processor ought to be able to exaggerate or minimize both loud transients in loud parts of the music and quieter transients in quieter parts of the music. That is because the "typical" transient processor apparently uses a "floating threshold" of the average amplitude of the music "right now". So if we are in a loud section of music, a transient is a section even louder than usual. But if we are in a quiet section of music, a transient is whatever short section happens to be louder than the current quiet level.
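A bare-bones Python sketch of that floating-threshold idea, assuming a simple two-envelope design: a slow envelope follower acts as the "average right now" and a fast one tracks the current level. The 5 ms / 50 ms times and the gain law are made-up starting points, not taken from any particular plugin, and the function names are hypothetical. The fast envelope will ripple somewhat on low bass notes, which a real plugin would handle more carefully.

```python
import math
from typing import List


# Hypothetical helper names, for illustration only.
def one_pole_coeff(time_ms: float, samplerate_hz: float) -> float:
    """Smoothing coefficient for a one-pole envelope follower with this time constant."""
    return math.exp(-1.0 / (time_ms * 0.001 * samplerate_hz))


def soften_transients(samples: List[float], samplerate_hz: float, amount: float = 0.5,
                      fast_ms: float = 5.0, slow_ms: float = 50.0) -> List[float]:
    """Turn down moments that jump above the music's *current* average level.

    The slow envelope is the floating threshold; the fast envelope tracks "right now".
    amount = 0 leaves the audio untouched; amount = 1 pulls transients all the way
    down to the slow envelope's level.
    """
    a_fast = one_pole_coeff(fast_ms, samplerate_hz)
    a_slow = one_pole_coeff(slow_ms, samplerate_hz)
    fast = slow = 0.0
    out = []
    for s in samples:
        level = abs(s)
        fast = a_fast * fast + (1.0 - a_fast) * level
        slow = a_slow * slow + (1.0 - a_slow) * level
        ratio = fast / slow if slow > 1e-9 else 1.0
        # Only act while the fast envelope sits above the floating threshold.
        gain = ratio ** (-amount) if ratio > 1.0 else 1.0
        out.append(s * gain)
    return out
```

For the "a little, more, a lot" test, process the same file at something like amount 0.25, 0.5 and 1.0 and encode each version.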

You could try reducing transients a little, and then more so, and then a lot, and encode the various processed versions to discover whether reduction of transients manages to make low bitrate encode less annoying to the ear. No way to know without experimentation.

I'm wildly guessing that high-mid and high frequency transients might be more problematic to a low bitrate encode, and low frequency or mid frequency transients would be less problematic to the codec. If that is so, then a frequency-specific transient processor might be a better tool, only reducing transients in high-mids and highs where the codec tends to distort and splatter. There are many transient plugins and surely some of them are multi-band but I don't know what is available and which ones are any good.
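A rough Python sketch of the frequency-selective version, assuming a crude two-band split: a one-pole lowpass for the low band and "input minus lowpass" for the high band, so the bands sum back to the input when nothing is changed. The 4 kHz crossover is an arbitrary guess, a real multiband plugin would use much steeper crossovers, and all function names are hypothetical.

```python
import math
from typing import List


# Hypothetical helper names, for illustration only.
def one_pole_lowpass(samples: List[float], cutoff_hz: float, samplerate_hz: float) -> List[float]:
    """Gentle 6 dB/octave lowpass used as a crude crossover."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / samplerate_hz)
    y = 0.0
    out = []
    for s in samples:
        y = (1.0 - a) * s + a * y
        out.append(y)
    return out


def soften_transients(samples: List[float], samplerate_hz: float, amount: float,
                      fast_ms: float = 5.0, slow_ms: float = 50.0) -> List[float]:
    """Two-envelope transient softener (fast follower vs slow floating threshold)."""
    a_fast = math.exp(-1.0 / (fast_ms * 0.001 * samplerate_hz))
    a_slow = math.exp(-1.0 / (slow_ms * 0.001 * samplerate_hz))
    fast = slow = 0.0
    out = []
    for s in samples:
        fast = a_fast * fast + (1.0 - a_fast) * abs(s)
        slow = a_slow * slow + (1.0 - a_slow) * abs(s)
        ratio = fast / slow if slow > 1e-9 else 1.0
        out.append(s * (ratio ** (-amount) if ratio > 1.0 else 1.0))
    return out


def soften_high_transients(samples: List[float], samplerate_hz: float,
                           crossover_hz: float = 4000.0, amount: float = 0.5) -> List[float]:
    """Split into low + high bands, soften transients in the high band only, recombine."""
    low = one_pole_lowpass(samples, crossover_hz, samplerate_hz)
    high = [s - l for s, l in zip(samples, low)]  # low + high == input (up to rounding)
    high = soften_transients(high, samplerate_hz, amount)
    return [l + h for l, h in zip(low, high)]
```

A sudden 100 Hz burst passes nearly untouched, while a sudden 6 kHz burst gets turned down, which is the "only touch the highs" behavior the paragraph guesses might matter.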

There may be that capability in some Dynamic EQ plugins, though perhaps the typical Dynamic EQ would be a frequency selective compressor rather than frequency selective transient processor. I think it possible that a low bitrate encode that wants to splatter transients might splatter quiet transients as well as loud transients, so a frequency selective transient processor might be the thing.

However, a broadband transient processor might be good enough for initial experimentation, just to find out whether modifying transients makes enough improvement to bother with. It might turn out entirely unproductive, can't know without test.

Post

BertKoor wrote:Imho it is far more economical to:
1. Invest in storage and bandwidth. Both are insanely cheap nowadays. Do the math: how much storage and bandwidth can you get for, say, $1000? And how many TB do you really need?
2. Switch to a better encoder than WMA9. I'd recommend AAC, which offers better quality than MP3 at lower bitrates.
We have dived into the encoding question, without many concrete results.

What about the other option: make a compromise in bandwidth & storage space. Could you elaborate on why 32kbps @ 32kHz sampling rate is a hard requirement? What would the real life consequences be if the bitrate & sampling rate were somewhat higher, alleviating the need for preprocessing to counteract encoding artefacts?
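To put rough numbers on the storage & bandwidth side, a Python sketch of the arithmetic. No prices are included, the hours and listener counts are placeholders you'd replace with your own, and it assumes constant bitrate and one unicast stream per listener. Function names are made up.

```python
# Hypothetical helper names; 1 GB = 10**9 bytes, 1 TB = 10**12 bytes.
def storage_gb(kbps: float, hours: float) -> float:
    """Disk space for `hours` of audio at a constant bitrate."""
    return kbps * 1000 / 8 * hours * 3600 / 1e9


def stream_mbps(kbps: float, listeners: int) -> float:
    """Outgoing bandwidth for `listeners` simultaneous unicast streams, in Mbps."""
    return kbps * listeners / 1000


def monthly_transfer_tb(kbps: float, avg_listeners: float,
                        hours_per_day: float = 24, days: int = 30) -> float:
    """Data served per month if `avg_listeners` are connected on average."""
    return kbps * 1000 / 8 * avg_listeners * hours_per_day * 3600 * days / 1e12


if __name__ == "__main__":
    for rate in (32, 64, 128):
        print(f"{rate:3d} kbps: {storage_gb(rate, 1000):6.1f} GB per 1000 h, "
              f"{stream_mbps(rate, 100):5.1f} Mbps for 100 listeners, "
              f"{monthly_transfer_tb(rate, 100):5.2f} TB/month at 100 average listeners")
```

Everything scales linearly with bitrate, so going from 32 to 64 kbps exactly doubles each figure.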
We are the KVR collective. Resistance is futile. You will be assimilated.
My MusicCalc is served over https!!

Post

Got interested in this topic.. but I fail to understand the need for anything to be compatible with Windows 95? Please explain what this is about... :)

There are better options if you've got old hardware: a lot of Linux distributions, even Win XP Embedded (still has support).

Connecting Win95 with the Internet is a big no-no.. so streaming is out of the question, right? :)

Post

cnt wrote:Got interested in this topic.. but I fail to understand the need for anything to be compatible with Windows 95? Please explain what this is about... :)

Connecting Win95 with the Internet is a big no-no.. so streaming is out of the question, right? :)
It is about the receiver side, not the sender side.
i.e. if genie40204 runs a webradio and sees that 90% of his listeners are listening to WMA9 via WMP6 (very unlikely :P), it is not a good idea to drop that codec. You will lose 90% of your current listeners.

(ofc you can update the encoder, so the 10% with new WMP which can handle WMA10 get better quality)

Post

JCJR wrote:A possible way to experiment to find if transient manipulation could make low bitrate encode more listenable--

Thanks for that elaborate response.

Did some tests with a few transient processors (broad + multi band).

Nothing seemed to make a noticeable difference on my test songs/audio. I'm not sure if this is due to my lack of ability/understanding with these plugins, though.

Since it seems no one is willing to undertake such a project - does anyone else have any suggestions?

Thanks.

Post

genie40204 wrote:does anyone else have any suggestions?
Sure, but you have not responded to it:
bertkoor wrote:What about the other option: make a compromise in bandwidth & storage space. Could you elaborate on why 32kbps @ 32kHz sampling rate is a hard requirement? What would the real life consequences be if the bitrate & sampling rate were somewhat higher, alleviating the need for preprocessing to counteract encoding artefacts?

Post

genie40204 wrote: Nothing seemed to make a noticeable difference on my test songs/audio. I'm not sure if this is due to my lack of ability/understanding with these plugins, though.
Oh, well, can't know without test. Did you try brickwall lowpass filtering at 10 kHz and also maybe 5 kHz? Maybe that wouldn't help either, dunno. Wild guesses.

What is the nature of the audio? Music? Speech? Classical? Rock? EDM? Folk?

Does a 64 kbps encode sound "good enough"? If so, maybe just bite the bullet and use the higher bitrate. As earlier mentioned, to my ear 128 kbps is not "always good enough" but if bandwidth has to be skimpy then sacrifices must be made somewhere. :)
