xoxos

xoxos · Post by **xoxos** » Wed Aug 20, 2008 6:59 am

i'll have some audio of my new voice synthesis model up in a day or two. it's a source > phoneme filter model with some automated refinements. now is the time to convince me of features.

it sounds good. i am happy. it's a hurray for vst moment.

so what would you want from a voice synth that wouldn't be in it?

phonemes are selected by keys on a separate channel from pitch, and a third channel can be used to trigger graphically sequenced words, with phonemes selected by dragging graphics up or down.
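Here is a minimal sketch of routing note-ons by channel, so pitch, phoneme and word triggers don't collide. The channel numbers, the base key (C2 = phoneme 1) and the `VoiceState` type are hypothetical; the post only says the three roles live on separate channels.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical 0-based channel assignment; the post only says pitch,
# phonemes and word triggers arrive on separate channels.
PITCH_CHANNEL = 0
PHONEME_CHANNEL = 1
WORD_CHANNEL = 2

PHONEME_BASE_KEY = 36   # hypothetical: C2 selects phoneme 1
NUM_PHONEMES = 49       # size of the phoneme list posted later in the thread


@dataclass
class VoiceState:
    pitch_note: Optional[int] = None          # note driving the source oscillator
    phoneme: Optional[int] = None             # 1-based index into the phoneme set
    word_triggers: List[int] = field(default_factory=list)


def note_to_hz(note: int) -> float:
    """Equal temperament with A4 (note 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def handle_note_on(state: VoiceState, channel: int, note: int) -> None:
    """Route a note-on by channel instead of treating every key as pitch."""
    if channel == PITCH_CHANNEL:
        state.pitch_note = note
    elif channel == PHONEME_CHANNEL:
        index = note - PHONEME_BASE_KEY + 1
        if 1 <= index <= NUM_PHONEMES:   # keys outside the set are ignored
            state.phoneme = index
    elif channel == WORD_CHANNEL:
        state.word_triggers.append(note)
```

For example, note 57 on the pitch channel gives 220 Hz. Note 60 on the phoneme channel selects slot 25, which is 's' in the list posted later in the thread.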

the second option should make it easier to perform with and more popular, since phoneme sequencing is still a time consuming task (it holds a great deal of expression). so if you have a particular group of words you'd like to hear together, let me know and they could go on a preset.

as it is, cpu use is quite low. unless strongly rebuked it's going to be mono, with one osc. cpu use is low enough that you can run a second instance for effects. it should process external signals just fine (i haven't tried that technique in software yet).
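The "source > phoneme filter" idea can be sketched as one oscillator feeding a bank of parallel resonators, one per formant. This is a generic textbook formant synthesizer built from Klatt-style two-pole resonators, not xoxos's code. The "ah" formant values are rough placeholders and the naive sawtooth aliases. Because the filter accepts any list of samples, an external signal could be passed in place of the oscillator.

```python
import math


class Resonator:
    """Two-pole resonator in the form used by Klatt-style formant synthesizers."""

    def __init__(self, freq_hz, bw_hz, sample_rate):
        t = 1.0 / sample_rate
        self.c = -math.exp(-2.0 * math.pi * bw_hz * t)
        self.b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
        self.a = 1.0 - self.b - self.c   # normalises the gain at DC to 1
        self.y1 = 0.0
        self.y2 = 0.0

    def process(self, x):
        y = self.a * x + self.b * self.y1 + self.c * self.y2
        self.y2, self.y1 = self.y1, y
        return y


def saw_source(f0_hz, sample_rate, n):
    """Naive (aliasing) sawtooth from a single oscillator, as in the mono design."""
    phase = 0.0
    out = []
    for _ in range(n):
        out.append(2.0 * phase - 1.0)
        phase = (phase + f0_hz / sample_rate) % 1.0
    return out


def formant_filter(source, formants, sample_rate):
    """Sum of parallel resonators, one per (freq_hz, bw_hz, amp) formant."""
    bank = [(Resonator(f, bw, sample_rate), amp) for f, bw, amp in formants]
    return [sum(amp * r.process(x) for r, amp in bank) for x in source]


# Placeholder formants for 'ah' (freq_hz, bw_hz, amp); not the plugin's data.
AH_PLACEHOLDER = [(730.0, 80.0, 1.0), (1090.0, 90.0, 0.5),
                  (2440.0, 120.0, 0.25), (3300.0, 130.0, 0.1)]
```

A 220 Hz voice would be `formant_filter(saw_source(220.0, 44100, 44100), AH_PLACEHOLDER, 44100)`. Changing phoneme means swapping the formant list, which is cheap, so a mono single-oscillator design stays light on cpu.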

MaliceX · Post by **MaliceX** » Wed Aug 20, 2008 7:19 am

Maybe a similar but better system such as the one used in farbrausch synth V2 (using letter combinations that make up phonetic expressions; the sentence-to-phoneme converter isn't very good, so I tried manual typing).

ouroboros · Post by **ouroboros** » Wed Aug 20, 2008 7:56 am

requested sentence/phoneme/presets:
"xoxos rocks" -std.
and for a touch of humor,
"please don't feed the robot lobster."

as for features, I take it the phoneme timing is something controlled by the host? I am wondering what tempo change does to the sounds is all, but I'm sure all will be revealed soon. very, very cool.

xoxos · Post by **xoxos** » Wed Aug 20, 2008 4:45 pm

i won't be using the farbrausch method for a few reasons -

- text phonemes have basically no control of timing
- synthedit (se) text input looks yucky
- converting the text to phonemes would be cpu intensive in se

so phonemes are either sequenced on the piano roll, or in graphic 'words' (which are rate adjustable).
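A rate-adjustable graphic word can be modelled as a list of (phoneme, duration) pairs whose durations are divided by a rate factor. This representation is hypothetical and the durations below are made up. The phoneme slots come from the list posted later in the thread.

```python
from typing import List, Tuple

# Hypothetical representation of a graphic 'word': (phoneme slot, duration in ms).
Word = List[Tuple[int, float]]

# 'sun' with slots from the posted list (s = 25, u = 10, n = 22);
# the durations are placeholders.
SUN: Word = [(25, 90.0), (10, 160.0), (22, 80.0)]


def schedule_word(word: Word, start_ms: float, rate: float = 1.0) -> List[Tuple[float, int]]:
    """Return (onset_ms, phoneme) events; rate > 1 speaks the word faster."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    events = []
    t = start_ms
    for phoneme, duration_ms in word:
        events.append((t, phoneme))
        t += duration_ms / rate
    return events
```

`schedule_word(SUN, 0.0)` places the phonemes at 0, 90 and 250 ms, while `rate=2.0` moves them to 0, 45 and 125 ms. The proportions inside the word are kept, and only the speed changes.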

i think for 'serious programming' the word feature will only be an amenity, and the 'old style' sequencing in the piano roll used with syng 1 will be the preferred technique.

there are of course two timing considerations: being synchronised to the host, and timing relative to other elements (eg. a consonant needs, say, 20ms before the vowel).

the second aspect isn't critical with host sequencing, because it would take a radical host tempo change (say 4x or 1/4x the tempo) to affect intelligibility, and the effect is generally just a change in the rate of speech. i'll try for a preliminary (way alpha) mp3 today. it only just got to the sequencing stage.
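Here is a small worked check of the tempo argument. A consonant drawn on the host grid a fixed fraction of a beat before its vowel gets a gap in ms that scales with tempo. The 1/8-beat gap is an arbitrary example, and the 20 ms minimum is the example figure from the post, not a measured threshold.

```python
def gap_ms(gap_beats: float, bpm: float) -> float:
    """Length in ms of a gap drawn on the host's beat grid."""
    return gap_beats * 60000.0 / bpm


def consonant_lead_ok(gap_beats: float, bpm: float, min_ms: float = 20.0) -> bool:
    """True if a consonant drawn gap_beats before its vowel still gets min_ms.
    The 20 ms default is the post's example figure, not a measured value."""
    return gap_ms(gap_beats, bpm) >= min_ms
```

A 1/8-beat gap gives 62.5 ms at 120 bpm and 31.25 ms at 240 bpm, both comfortably above 20 ms. Only at 480 bpm (4x) does it drop to 15.625 ms. At 30 bpm (1/4x) it stretches to 250 ms instead, which sounds like slow speech.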

stanlea · Post by **stanlea** » Wed Aug 20, 2008 5:11 pm

What about accents ?

xoxos · Post by **xoxos** » Wed Aug 20, 2008 6:57 pm

there are around 600 internal coefficients which define the accent. even keeping the same phoneme set, with the resources i'm aware of it would take hours to analyse suitable recordings, eg. isolating a vocal sample, extracting 4 bands of formant data with praat, then evaluating amp and bandwidth settings.
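One plausible accounting of "around 600" assumes each of the 49 phonemes stores frequency, bandwidth and amplitude for 4 formants. That gives 49 × 4 × 3 = 588. Frequencies alone come to 4 × 49 = 196, the slider count mentioned below. The real coefficient layout and the "specification" text format aren't given in the thread, so the one-line-per-phoneme format parsed here is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

NUM_PHONEMES = 49   # size of the phoneme list posted in this thread
NUM_FORMANTS = 4    # "4 bands of formant data"


@dataclass
class Formant:
    freq_hz: float
    bw_hz: float
    amp: float


def coefficient_count(num_phonemes=NUM_PHONEMES, num_formants=NUM_FORMANTS, params=3):
    """Assumed layout: params values per formant, per phoneme."""
    return num_phonemes * num_formants * params


def parse_formant_table(text: str) -> Dict[int, List[Formant]]:
    """Parse a hypothetical table: 'index f bw amp f bw amp ...' per line, '#' comments."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        idx = int(fields[0])
        values = [float(v) for v in fields[1:]]
        if len(values) != 3 * NUM_FORMANTS:
            raise ValueError(f"phoneme {idx}: expected {3 * NUM_FORMANTS} values, got {len(values)}")
        table[idx] = [Formant(*values[i:i + 3]) for i in range(0, len(values), 3)]
    return table
```

Praat measurements for one phoneme would then be written as a single line, e.g. `1 730 80 1.0 1090 90 0.5 2440 120 0.25 3300 130 0.1` (placeholder numbers).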

the source oscillator alternates with filtered noise and a mass-spring plosive generator, which also have coefficients. if you prepared all the analysed formant data as text formatted to specification, it would still take me a few hours to adjust these parameters, so accents would take lots of time, and more with other phoneme sets.
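The thread doesn't say how the plosive generator is built. As a generic illustration, a damped unit mass on a spring released from a displacement gives a short decaying ring, with its frequency and damping ratio acting as coefficients. Semi-implicit Euler integration is an assumption, chosen because it stays stable at audio rates when 2π·f/sample_rate is well below 2.

```python
import math


def mass_spring_burst(freq_hz, damping_ratio, sample_rate, n, initial_displacement=1.0):
    """Damped unit mass on a spring, released from a displacement and integrated
    with semi-implicit Euler. Returns the displacement per sample."""
    omega = 2.0 * math.pi * freq_hz
    k = omega * omega                  # spring stiffness for unit mass
    c = 2.0 * damping_ratio * omega    # damping coefficient
    dt = 1.0 / sample_rate
    x, v = initial_displacement, 0.0
    out = []
    for _ in range(n):
        a = -k * x - c * v             # spring force plus damping
        v += a * dt                    # update velocity first...
        x += v * dt                    # ...then position with the new velocity
        out.append(x)
    return out
```

A higher damping ratio gives a shorter, clickier burst. A lower one rings longer, which is the kind of per-phoneme coefficient that would need tuning per accent.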

i could do a 'homemade gui' version with 196 sliders on it for the formant frequencies (4 x 49), which would 'change the accent' within the phoneme set, but you'd really have to want it. i'm not sure you would, as it would take hours to set up and in my experience probably wouldn't give good results.

we can pretend that my southern british/arizonan accent is 'culturally neutral' if you like

it works for me..

as it is, there is some ability to do so -

i apply long vowels (a in say, o in note, u in tune, i) as diphthongs, which means they have two formants crossfaded at a rate dictated by the sequence, which does affect the accent. these 'extra diphthongs' might also help in approximating other phonemes

i was sequencing a 'j' yesterday (ha) and a longer setting (which i left in the demo i'll upload soon) sounded french to me, so timing does have some effect.

another example.. say 'o' in 'note' and notice you shift between two formants..

changing when you shift from the first formant to the second, and the rate of the transition, should run through an array of speaker characteristics. with my culture and accent, a long transition sounds posh compared to a shorter one.
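The two timing controls described here, when the shift starts and how long it takes, can be sketched as a delayed linear crossfade between a diphthong's two formant sets. The linear shape and the formant frequencies for 'o' in 'note' are placeholders, not syng's values.

```python
def transition_weight(t_ms, onset_ms, duration_ms):
    """0 before the onset, 1 after onset + duration, linear in between."""
    if t_ms <= onset_ms:
        return 0.0
    if duration_ms <= 0 or t_ms >= onset_ms + duration_ms:
        return 1.0
    return (t_ms - onset_ms) / duration_ms


def diphthong_formants(t_ms, first, second, onset_ms, duration_ms):
    """Crossfade formant frequencies from the first set to the second."""
    w = transition_weight(t_ms, onset_ms, duration_ms)
    return [(1.0 - w) * a + w * b for a, b in zip(first, second)]


# Placeholder F1..F4 (Hz) for the two halves of 'o' in 'note' (slots 40 and 41).
NOTE_START = [500.0, 1000.0, 2500.0, 3400.0]
NOTE_END = [350.0, 900.0, 2300.0, 3300.0]
```

With `onset_ms=50` and `duration_ms=100`, the formants hold the first set until 50 ms, sit halfway at 100 ms (425, 950, 2400, 3350 Hz) and reach the second set at 150 ms. A longer `duration_ms` would be the "posh" long transition.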

i also remember getting a british vs. american quality out of the timing and duration of an 's' used at the end of a word.. a longer sound seemed more american to me.

there is another possibility - i can add a small number of foreign phonemes to the end of the set if we can isolate the necessary amendments, although for source vocal continuity i would have to learn to pronounce them correctly. otherwise the most effective way would be for you to prepare voice data from your own analyses, and i'd probably charge for the conversion time. we'll see if the quality is worth the effort.

stanlea · Post by **stanlea** » Wed Aug 20, 2008 8:06 pm

Ok, not so easy. Let's hear the first experiments. If I understand correctly, processing phonemes is a lot of work and cannot be automated, correct? The easiest way would be a set generator program to customize the synthesis.
But maybe it could also be useful to add some foreign phonemes: ask some people here to provide some well-pronounced sentences to show you the way, with typical sounds of their language, e.g. the French "u" is very special, with nothing similar in English or Spanish.

xoxos · Post by **xoxos** » Wed Aug 20, 2008 10:31 pm

let's make sure we've got the whole language covered so i'm not just adding half the necessities

here's my formant list.. (and the mp3 will be up in the next 12 hours).

it's similar to 'standard english' with the exception of treating long vowels as diphthongs (imo a technical necessity).

```
vowels:
 1 ah    far
 2 a     hat
 3 aw    fall
 4 e     bell
 5 e-    beef
 6 oe    her
 7 i     bit
 8 o     not
 9 oo    blue
10 u     sun
11 u-    foot
12 schwa the

consonants:
13 b
14 d
15 f
16 g
17 h
18 j
19 k
20 l
21 m
22 n
23 p
24 r
25 s
26 t
27 v
28 w
29 y
30 z
31 ng    thing
32 th    teeth
33 dh    there
34 sh
35 zh    measure
36 kh    loch
37 ch    church

diphthongs and long vowels:
38 a-    late
39 a-    (part 2)
40 o-    note
41 o-
42 u-    you
43 u-
44 i-    eye
45 i-
46 ow    cow
47 ow
48 oi    join
49 oi
```
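For sequencing or preset building, the list can be turned into a lookup in which each diphthong expands to its two consecutive slots. The slot numbers are copied from the list. Diphthongs are keyed by their example word here because "u-" appears twice (11, foot, and 42-43, you). The helper names are hypothetical.

```python
# Slot numbers copied from the posted phoneme list.
VOWELS = {"ah": 1, "a": 2, "aw": 3, "e": 4, "e-": 5, "oe": 6, "i": 7,
          "o": 8, "oo": 9, "u": 10, "u-": 11, "schwa": 12}
CONSONANTS = {sym: 13 + i for i, sym in enumerate(
    ["b", "d", "f", "g", "h", "j", "k", "l", "m", "n", "p", "r", "s",
     "t", "v", "w", "y", "z", "ng", "th", "dh", "sh", "zh", "kh", "ch"])}
# Long vowels and diphthongs occupy two slots: (start, end).
DIPHTHONGS = {"late": (38, 39), "note": (40, 41), "you": (42, 43),
              "eye": (44, 45), "cow": (46, 47), "join": (48, 49)}


def expand(tokens):
    """Turn symbols into slot numbers; a diphthong expands to both of its slots."""
    out = []
    for tok in tokens:
        if tok in DIPHTHONGS:
            out.extend(DIPHTHONGS[tok])
        elif tok in VOWELS:
            out.append(VOWELS[tok])
        elif tok in CONSONANTS:
            out.append(CONSONANTS[tok])
        else:
            raise KeyError(f"unknown phoneme: {tok!r}")
    return out
```

For example, `expand(["n", "note", "t"])` gives `[22, 40, 41, 26]`. The three groups together account for all 49 slots, so new foreign phonemes would start at 50.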

if you can get me some recordings by friday

i'll need to do a midrange (220Hz) but a wider range and more voices will help me understand it.

and tia for any help.. it seems like half my users are french, so this would be a benefit.

spacedad · Post by **spacedad** » Wed Aug 20, 2008 11:07 pm

i'd like an english north-eastern robot, like a robo-chubby brown!

xoxos · Post by **xoxos** » Thu Aug 21, 2008 6:33 am

i don't know who chubby brown is. i did just discover that i had completely forgotten about it ain't half hot mum though.

here's the first test run of the engine. there are no lfos or envelopes yet.

http://www.breathcube.com/syng2demo01.mp3

birrbits · Post by **birrbits** » Thu Aug 21, 2008 8:13 am

this thread made me take my spoken language processing book off the shelf (mighty dusty). It seems I learned the phonemes using the sun representation, so I googled that and came up with this link, which may aid in your quest.

http://www.ibiblio.org/sounds/phonemes/

audington · Post by **audington** » Thu Aug 21, 2008 8:59 am

404 error on the mp3 demo. Am VERY keen to hear how this works!

stanlea · Post by **stanlea** » Thu Aug 21, 2008 10:26 am

audington wrote: 404 error on the mp3 demo. Am VERY keen to hear how this works!

type demo01

audington · Post by **audington** » Thu Aug 21, 2008 10:41 am

Thanks!

And Xoxos, this is pretty cool!

thoshu · Post by **thoshu** » Thu Aug 21, 2008 11:01 am

Unbelievable - xoxos, you're a genius!

But we did know that before...


KVR Audio

voice synthesis features