Linear or nonlinear filter for speech processing?
-
- KVRian
- Topic Starter
- 607 posts since 6 Mar, 2005 from USA
I'm building a filter for speech, and I'm trying to decide if linear or non-linear filters are better, given my need to save large numbers of pre-computed coefficients. There can be a delay of a second or so, so if I went linear I could easily opt to make it zero phase, and that's my first instinct. But why is there so much bias against using an IIR filter, like a Butterworth, that could potentially have a similar gain magnitude response with a far smaller order?
The standard answer for linear is "in the time domain a linear phase filter preserves the signal morphology" and "zero phase in addition to linear phase advantages loses the N/(2*Fs) second delay" but so what? My ear doesn't hear in the time domain; it hears in the frequency domain, with cochlear innervation at approximately harmonically-related frequencies. Some say "but IIR filters exhibit lots of ringing in their impulse response" but that's either an artifact of a low-order filter or possibly finite-precision arithmetic - a higher order filter will have close-to-ideal response so, if implemented with double precision, won't ring any more or less than a linear FIR with similar gain-magnitude performance.
My interest in IIRs is that, because of hardware architecture limitations, I'll need to pre-compute a large number of filter coefficients, and that would take a lot less space for an IIR implementation.
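For a rough feel for the storage trade-off, here is a sketch in Python that estimates filter sizes from two textbook formulas: the Butterworth order formula (with bilinear-transform prewarping) and Kaiser's length estimate for a windowed linear-phase FIR. The spec numbers (16 kHz sampling, 3.4 kHz passband edge, 4 kHz stopband edge, 1 dB ripple, 60 dB attenuation) and all function names are made up for illustration. The counts assume the IIR is stored as biquad sections with five coefficients each and that only half of the symmetric FIR taps are stored. The Kaiser design ends up with a much smaller passband ripple than 1 dB, so the two filters are only roughly comparable.

```python
import math

def butterworth_order(f_pass, f_stop, f_samp, ripple_db, atten_db):
    """Minimum Butterworth low-pass order for the spec (bilinear design)."""
    # Prewarp the band edges onto the analog prototype's frequency axis.
    w_pass = math.tan(math.pi * f_pass / f_samp)
    w_stop = math.tan(math.pi * f_stop / f_samp)
    ratio = (10 ** (atten_db / 10) - 1) / (10 ** (ripple_db / 10) - 1)
    return math.ceil(math.log10(ratio) / (2 * math.log10(w_stop / w_pass)))

def kaiser_fir_length(f_pass, f_stop, f_samp, atten_db):
    """Kaiser's estimate of the tap count for a windowed linear-phase FIR."""
    transition = 2 * math.pi * (f_stop - f_pass) / f_samp  # radians/sample
    return math.ceil((atten_db - 7.95) / (2.285 * transition) + 1)

def iir_stored_coeffs(order):
    """Assumed storage: biquad cascade, 5 values (b0, b1, b2, a1, a2) each."""
    return 5 * math.ceil(order / 2)

def fir_stored_coeffs(taps):
    """Assumed storage: half of the symmetric impulse response."""
    return math.ceil(taps / 2)

# Hypothetical spec, not taken from the thread.
spec = dict(f_pass=3400.0, f_stop=4000.0, f_samp=16000.0)
order = butterworth_order(ripple_db=1.0, atten_db=60.0, **spec)
taps = kaiser_fir_length(atten_db=60.0, **spec)
print("Butterworth order:", order, "stored:", iir_stored_coeffs(order))
print("Kaiser FIR taps:", taps, "stored:", fir_stored_coeffs(taps))
```

Which side wins depends heavily on the spec, especially the transition width, and Butterworth needs a higher order than Chebyshev or elliptic designs for the same steepness, so treat the numbers as a first guess and rerun with your own band edges.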
- KVRist
- 294 posts since 1 Apr, 2009 from Hannover, Germany
What do you mean by "filter for speech"? What are you trying to achieve exactly?
-
- KVRian
- Topic Starter
- 607 posts since 6 Mar, 2005 from USA
Filtering speech recordings. I'm mentioning that it is speech to indicate that certain types of processing artifacts would make it unintelligible, but I'm questioning the general consensus that IIR filters are bad for this because the non-constant group delay would make it appear morphologically different on an oscilloscope. My ears work in the frequency transform domain; I hear a pure sinewave as a single tone, not as a continuously variable warble as on an oscope, so I'd think the magnitude of the frequency domain behavior would be more important than the relative phases of the component harmonics.
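One way to try the "magnitude matters more than phase" idea is with an all-pass filter, which leaves every frequency's level untouched and only shifts phase. The sketch below uses a first-order all-pass with an arbitrary coefficient (a = -0.7, my choice, not anything from this thread) and a made-up two-harmonic test signal. It prints the gain and phase at the two component frequencies, then the waveform peak before and after filtering.

```python
import cmath
import math

def allpass_response(a, w):
    """Frequency response of H(z) = (a + z^-1) / (1 + a z^-1) at w rad/sample."""
    z_inv = cmath.exp(-1j * w)
    return (a + z_inv) / (1 + a * z_inv)

def allpass_filter(a, x):
    """The same all-pass as a difference equation (stable for |a| < 1)."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        yn = a * xn + x_prev - a * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y

a = -0.7  # illustrative coefficient
for w in (0.3, 0.9):  # a "fundamental" and its third harmonic
    h = allpass_response(a, w)
    print(f"w={w}: |H|={abs(h):.6f}, phase={cmath.phase(h):+.3f} rad")

# Hypothetical test signal: two harmonically related sines.
x = [math.sin(0.3 * n) + 0.5 * math.sin(0.9 * n) for n in range(400)]
y = allpass_filter(a, x)
print("peak in: ", max(abs(v) for v in x[100:]))
print("peak out:", max(abs(v) for v in y[100:]))
```

The gain is 1 at both frequencies while the phase shift is not proportional to frequency, so the output looks different on a scope even though its magnitude spectrum is unchanged. Whether that difference is audible on speech is exactly the question, and listening to both versions is the real test.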
-
- KVRian
- 1000 posts since 1 Dec, 2004
Some processes used in speech processing do use IIR filters. For instance, "LPC" speech compression uses math to find the best-fit IIR filter for each block. A lot of classic speech synthesis algorithms (such as the one used by Stephen Hawking) use IIR filters.
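To make the "best-fit IIR filter for each block" idea concrete, here is a minimal sketch of one common way to do it: the autocorrelation method solved with the Levinson-Durbin recursion. This is not necessarily what any particular codec or synthesizer does. The function names are mine, and the "block" is synthetic: the impulse response of a known two-pole filter, so the recovered coefficients can be checked against the ones used to build it.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order],
    where x[n] is predicted as sum(a[k] * x[n-k] for k in 1..order).
    Returns (coefficients, remaining prediction-error energy)."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this stage.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err

# Synthetic block: impulse response of y[n] = 1.3*y[n-1] - 0.6*y[n-2] + x[n].
block = []
for n in range(300):
    y1 = block[-1] if n >= 1 else 0.0
    y2 = block[-2] if n >= 2 else 0.0
    block.append(1.3 * y1 - 0.6 * y2 + (1.0 if n == 0 else 0.0))

coeffs, residual_energy = levinson_durbin(autocorrelation(block, 2), 2)
print("estimated predictor coefficients:", coeffs)
```

The estimated coefficients define an all-pole (purely recursive) filter, which is why LPC counts as an IIR technique. Real speech would need windowing, a higher order, and one such fit per block.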
A lot of the natural processes that cause filtering (such as sound bouncing around your throat to produce vowels) produce a result that is much closer to minimum-phase filtering than linear-phase filtering. The common IIR filter designs are minimum-phase (or close to it) as well, so they can sound more natural than linear-phase filtering in some applications.
IIR and linear-phase filters preserve different aspects of the signal. Linear-phase filters preserve the phase relationships between harmonics even when the filter changes their relative volumes. IIR filters preserve edges: if your waveform has a jump/impulse/slope-change, it will be preserved in IIR filtering (with some wiggling after the fact as the filter does its job). Conversely, the linear-phase filter can easily change this edge into a sum of wiggling waveforms, with side-effects such as pre-echo. This is relevant to voice, because you can clearly see this edge-followed-by-wiggling effect in vocal waveforms.
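The edge behaviour is easy to check numerically. The sketch below feeds a step into a causal one-pole IIR low-pass and into a symmetric windowed-sinc FIR applied with its delay compensated (zero phase), then measures how much output each produces before the step arrives. The filter sizes, cutoff, and names are arbitrary choices for illustration, and the two filters are not matched in magnitude response.

```python
import math

def one_pole_lowpass(x, a):
    """Causal IIR: y[n] = (1 - a) * x[n] + a * y[n-1]."""
    y, state = [], 0.0
    for xn in x:
        state = (1.0 - a) * xn + a * state
        y.append(state)
    return y

def windowed_sinc(num_taps, cutoff):
    """Symmetric (linear-phase) low-pass, Hamming window.
    cutoff is in cycles per sample; num_taps should be odd."""
    m = num_taps - 1
    h = []
    for n in range(num_taps):
        k = n - m / 2
        ideal = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        h.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * n / m)))
    total = sum(h)
    return [c / total for c in h]  # normalize to unity gain at DC

def zero_phase_fir(x, h):
    """Apply a symmetric FIR centered on each sample (delay removed)."""
    half = (len(h) - 1) // 2
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            i = n + half - k
            if 0 <= i < len(x):
                acc += hk * x[i]
        y.append(acc)
    return y

step_at = 100
x = [0.0] * step_at + [1.0] * 100
iir_out = one_pole_lowpass(x, 0.8)
fir_out = zero_phase_fir(x, windowed_sinc(41, 0.1))
pre_iir = max(abs(v) for v in iir_out[:step_at])
pre_fir = max(abs(v) for v in fir_out[:step_at])
print("largest output before the edge, IIR:", pre_iir)
print("largest output before the edge, FIR:", pre_fir)
```

The causal IIR cannot respond before the edge, so all of its smearing lands afterwards, while the centered FIR spreads half of its response ahead of the edge. A resonant IIR such as a high-order Butterworth would add visible wiggling after the edge, but still nothing before it.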
- KVRAF
- 7888 posts since 12 Feb, 2006 from Helsinki, Finland
I want to point out that a linear or non-linear filter means something very different from a linear-phase or non-linear-phase filter. Strictly speaking, the whole transfer function (including the phase) is a concept that only really makes sense in the first place for linear shift-invariant (also known as "time-invariant" when sampling over time) filters.
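The distinction can be shown with a quick superposition check. In the sketch below, a one-pole IIR low-pass (non-linear phase, but a linear filter) passes the test, while a 3-point median filter (a genuinely non-linear filter) fails it. The filters, the helper name, and the test signals are my own picks for illustration. A single pair of inputs can only disprove linearity, not prove it.

```python
def one_pole_lowpass(x, a=0.9):
    """Linear, time-invariant IIR with non-linear phase."""
    y, state = [], 0.0
    for xn in x:
        state = (1.0 - a) * xn + a * state
        y.append(state)
    return y

def median3(x):
    """3-point median filter with edge samples repeated: a non-linear filter."""
    last = len(x) - 1
    return [sorted((x[max(n - 1, 0)], x[n], x[min(n + 1, last)]))[1]
            for n in range(len(x))]

def obeys_superposition(filt, x1, x2, tol=1e-12):
    """Check filt(x1 + x2) == filt(x1) + filt(x2) for this one pair of inputs."""
    summed_first = filt([p + q for p, q in zip(x1, x2)])
    filtered_first = [p + q for p, q in zip(filt(x1), filt(x2))]
    return all(abs(u - v) <= tol for u, v in zip(summed_first, filtered_first))

x1 = [0.0, 1.0, 0.0, 0.0, 0.0]
x2 = [0.0, 0.0, 1.0, 0.0, 0.0]
print("one-pole IIR passes:", obeys_superposition(one_pole_lowpass, x1, x2))
print("median filter passes:", obeys_superposition(median3, x1, x2))
```

Both the Butterworth IIR and the linear-phase FIR discussed earlier in the thread fall on the linear side of this test. The question being asked is really about linear versus non-linear phase.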