Formant synthesis
-
- KVRer
- 13 posts since 8 Mar, 2020
OP wants to understand formants. The information I posted simply explains what they are and what they mean. And the information in that chart, if followed carefully, can be used to create very realistic human-like vowel sounds. Of course, if realism is your goal (OP didn't ask about it), additional techniques can make the sound even more natural: adding jitter to the pitch, reverb to create space, pitch envelopes to mimic vocal inflections, chorusing to create an ensemble. But the topic at hand isn't realistic vocal simulation. We're talking about understanding and controlling formants.
The chart provides everything you need to know to recreate basic vowel shapes. If you start with a saw wave, it will still come out sounding like a saw wave with formants. The human vocal cords don't produce a saw wave; they produce something much closer to a pulse wave, which is why I mentioned starting with a pulse wave. If you start with a pulse wave and follow the specs in the chart exactly (including bandwidth and amplitude levels for your bandpass filters), it will come out sounding like a human vowel sound.
Of course, it goes without saying that the information in the chart is only a starting point, and obviously one can go from there and add additional complexity, like modulating the formant frequencies, to make the sounds more interesting.
When I get around to it, I'm going to post here about varieties of formant-producing granular synthesis that offer additional capabilities beyond what can be done with bandpass filters.
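For anyone who wants to try the pulse-wave-into-bandpass-filters recipe outside a synth, here is a minimal Python sketch. It is only an illustration under assumptions: the formant numbers in FORMANTS_AH are placeholders rather than the values from the chart, the pulse wave is naive (not band-limited), and the filter is the widely used "cookbook" biquad bandpass, which may differ from what your synth's filters do.

```python
import math

SAMPLE_RATE = 44100

# Hypothetical formant specs: (centre Hz, bandwidth Hz, level dB).
# Placeholders for illustration only; substitute the values from the chart.
FORMANTS_AH = [(800.0, 80.0, 0.0), (1150.0, 90.0, -6.0), (2900.0, 120.0, -32.0)]


def pulse_train(f0, seconds, duty=0.1):
    """Naive (not band-limited) narrow pulse wave at f0 Hz."""
    out = []
    phase = 0.0
    for _ in range(int(SAMPLE_RATE * seconds)):
        out.append(1.0 if phase < duty else 0.0)
        phase = (phase + f0 / SAMPLE_RATE) % 1.0
    return out


def bandpass(signal, centre, bandwidth):
    """Biquad bandpass (cookbook form, 0 dB gain at the centre frequency)."""
    w0 = 2.0 * math.pi * centre / SAMPLE_RATE
    q = centre / bandwidth
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha                      # b1 is zero for this filter
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def vowel(f0, seconds, formants):
    """Sum of parallel bandpass filters, each fed by the same pulse wave."""
    source = pulse_train(f0, seconds)
    mix = [0.0] * len(source)
    for centre, bandwidth, level_db in formants:
        gain = 10.0 ** (level_db / 20.0)        # dB to linear amplitude
        for i, y in enumerate(bandpass(source, centre, bandwidth)):
            mix[i] += gain * y
    return mix


samples = vowel(110.0, 0.25, FORMANTS_AH)       # a quarter second at 110 Hz
```

The filters run in parallel on purpose: each one contributes a single peak with its own level, which maps directly onto a frequency/bandwidth/amplitude table. Changing f0 moves the pitch while the peaks stay where they are.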
-
- KVRian
- 1189 posts since 11 Jun, 2019
As far as I know, the Term "Formants" is more general: "acoustic Energy above the Fundamental"? So Synths produce Formants anyway. Unfortunately the OP didn't specify his Interest any further, so we both can only guess what he might have meant. Right? We might even ask "what's his Problem now?".
Maybe setting the Frequencies and Bandwidths. The Relations? Maybe Vowels? Creating vocal-like Synth Sounds? I'd guess he stumbled on this Topic because he is interested in "humanoid Sounds". I call them VOX.
Aah = X and Ooh = Y are, IMO, just Part of a theoretical Background that is maybe interesting to know, but practically nearly irrelevant. The Result will be unmusical and you'll use other Options to shape it into something you might eventually want to use. Most SDs end up with "Something soaked in Reverb" or "somehow humanoid". Another Problem is the limited Possibilities of general-purpose Synths. You hopefully know which Problems/Restrictions occur when you set up a 3BP-VCF Stack.
You'd better use a Vowel Filter. They usually deliver better Results. "The Orb" even lets you set the Frequencies - but there's not much to "understand" (?) about Formants with it.
Practical Approach: Vowel Filter - Reverse Engineering - get a Feeling for the other relevant Parameters that shape a pleasant/musical Sound. I'm not talking about Realism - for that I'd rather search for a Singer.
As said before, knowing about "Ooh = XY" is just Theory. Most Synths can't even get close and the Result will be "somehow humanoid". I'd rather look at the resulting Waveform. Or the ones that have been used in that pleasant Patch. Whatever. We all know that knowing about Scales doesn't make you a Songwriter... and I, personally, ended up with completely different Approaches and am currently exploring the various Possibilities of IRs.
But anyway: I'm very interested in your Granular Synthesis Examples. You could maybe also tell us something about the Creation/Generation. I frequently google for Synth VOX Examples to get inspired...
-
- KVRer
- 15 posts since 27 Jun, 2019
Best resource I’ve seen for formant synthesis is the great Sound on Sound synth secrets: https://www.soundonsound.com/techniques ... -synthesis
-
- KVRian
- 629 posts since 15 Jun, 2017
Formants can be described as peaks in a spectrum of a certain height (amplitude) and width (sort of Q), at a fixed relative distance from each other ("harmonic ratios") that identify a certain sound (e.g. a specific vowel).
You will find tables of the relative amplitudes and ratios of human voice vowels in many sources (not only synthesis related).
As someone already pointed out: Gordon Reid's excellent SoundOnSound Synth Secrets is one of them. https://www.soundonsound.com/techniques ... -synthesis
The thing is that for many natural sounds (vocals, but also a guitar) there generally is a distinction between the driver/exciter and the resonator. The resonator characteristics are independent of the driver/exciter. That is not true for something like an oscillator/sample/recording, which is why you get the "Smurf" sound when you pitch up a voice. It is more like the carrier/modulator combos of Amplitude or Frequency Modulation, where you would have the ratios at a fixed frequency. Or the way a Vocoder works (based on feeding sound into a series of bandpass filters).
Using bandpass filters is one way to achieve the peaks (to emulate a resonator), by filtering out unwanted frequencies. A more "natural" approach is to model the actual resonator. Generally the way to go is "Physical Modeling".
Here are some examples of Physical Modeling using various techniques. Using the very versatile...
Melda Production - MSoundFactory.
https://www.kvraudio.com/product/msound ... production
Physical Modeling in MSoundfactory Pt 1, 2, 3 and 4.
by Chandler Guitar, who has started a great series on synthesis techniques.
Modulation of the "Resonator" characteristics is the main challenge...
For the human voice, the main way to modulate the resonator characteristics is changing the shape of the mouth cavity. Just go from vowel to vowel slowly to see/hear/feel it in action.
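The driver/resonator split and the "Smurf" effect described above can be shown with a few lines of arithmetic. This is only a sketch under assumptions: the resonator is an idealised single peak at 1000 Hz (resonator_gain is a made-up curve, not a model of any real voice), and the driver is a plain harmonic series with 1/k amplitudes.

```python
import math


def resonator_gain(freq, centre=1000.0, bandwidth=100.0):
    """Idealised resonance curve (assumed shape): gain 1.0 at `centre`."""
    return 1.0 / math.sqrt(1.0 + ((freq - centre) / (bandwidth / 2.0)) ** 2)


def spectrum(f0, n_harmonics=40):
    """(frequency, amplitude) pairs: a 1/k driver shaped by the fixed resonator."""
    return [(k * f0, resonator_gain(k * f0) / k) for k in range(1, n_harmonics + 1)]


def strongest(partials):
    """Frequency of the loudest partial."""
    return max(partials, key=lambda p: p[1])[0]


def resample(partials, ratio):
    """Playing a recording faster scales every frequency, resonance included."""
    return [(f * ratio, a) for f, a in partials]


sung_low = spectrum(100.0)          # driver at 100 Hz through the resonator
sung_high = spectrum(200.0)         # driver an octave up, same resonator
smurf = resample(sung_low, 2.0)     # the 100 Hz "recording" pitched up an octave

print(strongest(sung_low))          # 1000.0
print(strongest(sung_high))         # 1000.0
print(strongest(smurf))             # 2000.0
```

Raising the driver leaves the peak at 1000 Hz, while speeding up the recording drags the peak to 2000 Hz along with the pitch. That moved resonance is what the ear reads as a smaller "body".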
Last edited by Kwurqx on Tue Jun 23, 2020 1:52 pm, edited 1 time in total.
-
- KVRian
- 629 posts since 15 Jun, 2017
XOXOS offers many models as free VSTs (32-bit only).
https://www.kvraudio.com/developer/xoxos
https://xoxos.net/vst/#Models
Though for some, their use in music might be limited, except for ambient/atmospheres/soundscapes/scoring. But many are really interesting to experiment with, giving insight into some more unusual synthesis concepts.
E.g Fauna (using Waveguide synthesis)
https://www.kvraudio.com/product/fauna-by-xoxos
Fauna is a simple waveguide model of the vocal tract intended to provide a flexible platform for the synthesis of abstract and animal voices.
Waveguide synthesis is a physical modeling technique that makes use of delay lines to model the transmission of acoustic vibrations. A sample rate of 44.1kHz translates to a distance of less than 1cm under normal atmospheric conditions, so modern computers can achieve a degree of accuracy in acoustic modeling.
The delay bore is divided into five sections. Reflection coefficients control the transmission between segments, which effectively emulate the area of the tract at each position. Kelly and Lochbaum developed this speech synthesis technique in 1962.
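To put numbers on the two statements above, here is a small sketch. It assumes a speed of sound of 343 m/s and uses the textbook Kelly-Lochbaum junction formula k = (A1 - A2) / (A1 + A2); sign conventions vary between texts, the areas are made up, and how Fauna itself maps its five coefficients is not stated in the description.

```python
SPEED_OF_SOUND = 343.0   # m/s, dry air at about 20 C (assumed)
SAMPLE_RATE = 44100.0    # Hz


def metres_per_sample(c=SPEED_OF_SOUND, fs=SAMPLE_RATE):
    """Distance a pressure wave travels during one sample period."""
    return c / fs


def reflection_coefficients(areas):
    """One coefficient per junction between neighbouring tube sections."""
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


# Hypothetical cross-sectional areas in cm^2, glottis end first.
areas = [1.0, 2.0, 4.0, 4.0, 2.0]
coefficients = reflection_coefficients(areas)

print(round(metres_per_sample() * 100.0, 3))   # 0.778 (cm per sample)
print(coefficients)
```

Five sections give four internal junctions; the glottis and lip ends need their own terminations. A coefficient of 0.0 (equal areas on both sides) means the wave passes through that junction without reflection.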
Unlike conventional speech synthesizers, which use the tract to add formants to an oscillator, the fundamental pitch is determined by the overall form of the model.
The glottis acts like a reed, opening with air pressure from the lungs (determined by a pressure coefficient) and closing with air reflected back from the mouth. The position of the glottis is a result of these and other forces, such as tension and springiness of the muscle tissue.
Oscillation is produced by the balance of pressure waves in the tract and their effect on the glottis; as the size of the aperture changes, so do its properties of transmission and reflection.
Fauna VST uses four sets of five reflection coefficients to describe the shape of the vocal tract. Four 16-stage graphic envelopes and three dual-contour LFOs are applied to the tract coefficients. A bank of nine sends and two 'splits' route modulators to parameters as groups to assist in the emulation of organic forms.
The structure is not complex enough for speech, keeping in mind that the five coefficients describe the contour of the throat from the glottis and not from the back of the mouth. It may surprise you how many sounds it can create given the low waveguide count.
-
- KVRian
- 629 posts since 15 Jun, 2017
Or this digital dinosaur
XOXOS - XChanter...
This synger-derived ‘algorithmic vocalist’ uses a reduced phoneme set inspired by polynesian languages.
XChanter uses Lance Putnam’s henon oscillator (orbit modeling) SEP. This is a very old VST, as of 2008 I still hear of it being used.
-
- KVRAF
- 2764 posts since 3 Dec, 2006
Hi Grump, I wish you could teach me about your techniques or share your presets with us, as I absolutely love the results you get!!
GRUMP wrote: Sat Apr 18, 2020 3:35 pm
VOX here
Long Story. "Formants" are just Theory. VOX Sounds are way more complicated.
Unison is very important and Wavetables work best = Dune 3 best Synth. Believe me.
Vowel Filter: the Orb. But: Vowel creates just one Kind of Sound.
More Essentials: Overtones vs Noise/Air, Chords, Frequencies + Balance,...
There are many Types of VOX Sounds and they are all created differently. So ...
-
- KVRian
- 1189 posts since 11 Jun, 2019
Have you already made Formant/VOX Sounds this Way and can you show us some Demos?
Kwurqx wrote: Sat May 02, 2020 12:30 pm
Formants can be described as peaks in a spectrum of a certain height (amplitude) and width (sort of Q), at a fixed relative distance from each other ("harmonic ratios") that identify a certain sound (e.g. a specific vowel). [...]
I haven't checked out PM yet. Mainly because of the Number of Parameters and because I have no Clue what Characteristics it will deliver. Also because I am busy with other Methods, and because I assume that they did not have such Tools back in the Time of VOX (70s/80s). KISS Methodology!
Last edited by GRUMP on Sun May 03, 2020 9:30 pm, edited 2 times in total.
-
- Banned
- 12367 posts since 30 Apr, 2002 from i might peeramid
it's typing time!.
perhaps the concept of formants is more an artifact of the history of sound technology, eg. it's a nice concept for speech. but for synthesis and sound in general, approaching from the more generalised idea of spectral envelope might be better. watching appropriate spectroscopes of audio signals would probably help grasp the concept, once stationary peaks in signals with varying pitch are seen.
thought a note about spectral re/synthesis might help -
a fourier transform is sensibly used to perform spectral operations. formant shifting, or spectral shifting, can be done with cepstral processing.
cepstral processing came out of seismic signal analysis (detecting echoes). it is a spectrum of a spectrum, which is why it's called cepstrum ("spec" reversed). they scrambled all the terms for cepstral processing the same way.. alanysis, quefrency, etc. but think about it like this -
when you take an FFT of an audio signal you get the spectrum. this returns the magnitude of the frequency bands that you see on a music analyser. you put time in, you get slow time signals (hz) at left, fast time signals at right.
a cepstrum is the spectrum of a spectrum. it is literally performed this way - take your magnitudes, log10 them as you would for a graphic dB display, then FFT that data.. and you have the cepstrum.
now you can see why the cepstrum is used for seismology - the cepstrum shows the slow moving parts of the spectrum in the lower bins, and the fast moving parts of the spectrum in the higher bins. if we separate one side from the other, we have either the "source" part of a signal or the "resonance". if we reverse the process and reconstruct the spectrum from the slow part, we have the spectral envelope. using convolution (multiplication of magnitude in the spectral domain) we can apply that spectral envelope to other signals.
we can also process it to eg. shift the spectral envelope up or down to produce the "formant shift".
and also if we dispense with the slow moving cepstral bins and keep the fast, we can reconstruct the "source", eg. a glottal pulse, or other sound without resonances. ideally.
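the steps above, written out as a small python sketch. this is not the resyn2 code, just an illustration under assumptions: a naive DFT instead of a real FFT so it needs no libraries, one short frame, and a made-up test signal (a pulse train through a single resonance). the names demo_frame, split_cepstrum and the cutoff of 12 bins are all hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT; slow but dependency-free and fine for a short frame."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    twiddle = [cmath.exp(sign * 2j * math.pi * m / n) for m in range(n)]
    out = [sum(v * twiddle[(k * t) % n] for t, v in enumerate(x)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def demo_frame(n=256, period=32, peak_bin=40, r=0.9):
    """Pulse train (the "source") through one two-pole resonance, Hann-windowed."""
    w = 2.0 * math.pi * peak_bin / n
    y1 = y2 = 0.0
    ys = []
    for t in range(2 * n):                       # run long enough to settle
        x = 1.0 if t % period == 0 else 0.0
        y = x + 2.0 * r * math.cos(w) * y1 - r * r * y2
        y2, y1 = y1, y
        ys.append(y)
    tail = ys[-n:]
    return [s * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n)) for i, s in enumerate(tail)]


def split_cepstrum(frame, cutoff=12):
    """Return (log spectrum, slow part = envelope, fast part = source), all log10."""
    n = len(frame)
    log_mag = [math.log10(abs(c) + 1e-9) for c in dft(frame)]
    cepstrum = dft(log_mag)                      # the "spectrum of a spectrum"
    # low bins and their mirror images hold the slow-moving spectral shape
    slow = [c if (q < cutoff or q > n - cutoff) else 0j for q, c in enumerate(cepstrum)]
    fast = [c - s for c, s in zip(cepstrum, slow)]
    envelope = [v.real for v in dft(slow, inverse=True)]
    source = [v.real for v in dft(fast, inverse=True)]
    return log_mag, envelope, source


log_mag, envelope, source = split_cepstrum(demo_frame())
```

envelope and source add back up to log_mag bin for bin, so in linear magnitude the spectrum is envelope times source. the cutoff has to stay below the pitch period in samples (32 here), otherwise the pitch peak leaks into the envelope.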
i put this together for my 32 bit 2.4 freebie resyn2 and was surprised how well it works. that allows you to cross synthesize between two samples but it uses so much data it's best if they're real short.
i still muck about just with bandpass filters for such tasks sometimes.
you come and go, you come and go. amitabha neither a follower nor a leader be tagore "where roads are made i lose my way" where there is certainty, consideration is absent.
-
- KVRian
- 629 posts since 15 Jun, 2017
There are so many ways to interpret and implement formants.
As said, it is mostly about conceptually separating the "driver" (input spectrum) and defining the "resonator" (output spectrum) characteristics. The latter in many cases via a series of (bandpass) filters (sort-of-vocoder style). Or via a finely tuned spectrum of a modal filter (sort-of-a-long-series-of-ultra-narrow-bandpass-filters-just-one-harmonic-wide). And then push sounds through the "resonator".
Generally, the driver (input spectrum) will determine "pitch" and the resonator the "shape" of the output spectrum.
A wildly generalized and inaccurate comparison to a subtractive synth would be to replace the filter with an EQ (which could have just a few or many bands and keytracking near or at 0%).
But you could also "simply" create (or find) a wavetable containing all the desired (vowel) spectra and transitions and play with those. You could even transfer these spectra to modal filters and modulate those.
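As a rough idea of what "creating a wavetable containing the vowel spectra and transitions" could look like, here is a Python sketch. Everything in it is an assumption for illustration: the two formant sets AH and OO are placeholder numbers, the resonance curve is idealised, and the frames are plain additive sine sums normalised to a peak of 1.0.

```python
import math

TABLE_SIZE = 2048

# Hypothetical formant sets: (centre Hz, bandwidth Hz, linear gain).
AH = [(800.0, 80.0, 1.0), (1150.0, 90.0, 0.5)]
OO = [(350.0, 60.0, 1.0), (600.0, 80.0, 0.1)]


def resonance(freq, centre, bandwidth):
    """Idealised resonance curve (assumed shape): gain 1.0 at `centre`."""
    return 1.0 / math.sqrt(1.0 + ((freq - centre) / (bandwidth / 2.0)) ** 2)


def vowel_cycle(formants, f0=110.0, n_harmonics=40):
    """One single-cycle frame: harmonics of f0 weighted by the formant envelope."""
    amps = [sum(g * resonance(k * f0, fc, bw) for fc, bw, g in formants) / k
            for k in range(1, n_harmonics + 1)]
    cycle = [sum(a * math.sin(2.0 * math.pi * (k + 1) * i / TABLE_SIZE)
                 for k, a in enumerate(amps))
             for i in range(TABLE_SIZE)]
    peak = max(abs(s) for s in cycle)
    return [s / peak for s in cycle]


def lerp_formants(a, b, t):
    """Blend two formant sets parameter by parameter (t = 0 gives a, t = 1 gives b)."""
    return [tuple(x + (y - x) * t for x, y in zip(fa, fb)) for fa, fb in zip(a, b)]


def vowel_wavetable(a, b, frames=4):
    """Frames that move from vowel a to vowel b by sliding the formant peaks."""
    return [vowel_cycle(lerp_formants(a, b, i / (frames - 1))) for i in range(frames)]


table = vowel_wavetable(AH, OO)
```

Keep in mind the driver/resonator point: each frame has its peaks in the right place only when played at the f0 it was built for (110 Hz here). Played higher, the whole spectrum scales up with the pitch, so for wide ranges you would need frames per key zone or a separate filter stage.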
Since you are after choir/vocal sounds, here's another oldie, specifically tailored to choir-like sounds. You can see whether this is a useful approach, or at least get some new insights from it. You could then possibly emulate its concepts in an environment in which you have more knowledge and/or control.
Big Tick - Angelina
https://www.kvraudio.com/product/angelina-by-bigtick
It is old and 32-bit only. But maybe 2getheraudio will pick this one up for (64-bit) reissue as they did with his other original classic Big Tick - Cheese Machine.
-
- KVRian
- 1189 posts since 11 Jun, 2019
Is Big Tick still alive? Rhino seems quite interesting...
Angelina seems interesting too, but I have no 32-bit Option.
Btw I discover new VOX day by day anyway. Dune, Zebra, Harmor, MSF - Razor, Vertigo, ... - all Kinds of Filters - HQ FX - Samples from Mellotron via Emulator to Triton and more - there's nothing I could not do if I had more Time.
Just rediscovered Harmor and use it to feed Dune's WT now - and it sounds quite Korg
My Work has to be compliant - otherwise I'd know what I'd do first and probably for the next 20 Years - and it's not "Formant Synthesis" (although I usually have between 5 and 15 Peaks somewhere).