KVR Audio

Delta Sign · Post by **Delta Sign** » Thu Jul 12, 2018 9:51 pm

Both approaches definitely have their advantages and disadvantages. Our ears and brains evolved to pick up so many tiny inflections of the human voice that it's basically impossible to model in any way. Even if we could model it, there would be so many parameters to control that it would become impossible to use in practice.

Still, stuff like this is a lot of fun to play with, and if used right, it can certainly be a very useful artistic tool.

Deep learning + additive synthesis would be another really promising approach for emulating the human voice (or anything else, really), I think.

wagtunes · Post by **wagtunes** » Thu Jul 12, 2018 10:15 pm

djanthonyw wrote:
DrMEM wrote:I don't understand why we don't have a proper sample based vocal library yet. There are only 44 phonemes (vocal articulations, essentially) in the English language. I must be missing something, because I figure all you'd have to do is record each of those phonemes across say a 4 octave range and allow for midi cc or keyswitching between them (figuring out a good midi map would probably be the hardest part). That's 2112 samples for 1x rr.

Then maybe you'd want 4 dynamic layers, something like soft, normal, emotional, and gritty, and of course you'd want rr too. That puts you at 33792 samples for 4x rr at each of the four dynamic layers.

What am I missing here?
Try this. It sounds better than Vocaloid to me.

Because it's samples, just like Realitone Ladies which I bought for background vocals.

But it's limited. You can't make a whole song with this with any lyrics you want because you only have the combinations of words that they give you.

quantum7 · Post by **quantum7** » Thu Jul 12, 2018 10:29 pm

Light myself on fire, or listen to an entire album containing vocaloid vocals. Hmm....Decisions decisions.

progtronic · Post by **progtronic** » Thu Jul 12, 2018 11:20 pm

quantum7 wrote:Light myself on fire, or listen to an entire album containing vocaloid vocals. Hmm....Decisions decisions.

Rik · Post by **Rik** » Fri Jul 13, 2018 2:40 am

ENV1 wrote:
Vocaloid 5 Blurb wrote:The Vocaloid singing synthesizer makes it possible to easily produce any kind of singing voice you can imagine using just a computer.
Finally, at last, i can make those Joe Cocker and Janis Joplin tribute albums i always wanted to make!

Vocaloid 5

Can't really figure out how this is better than a massive sample library and Melodyne, and sadly the main video on the Vocaloid home page is a real joke. That is the kind of video that scares you into thinking that all their $$$ were spent on CGI instead of innovative DSP processing.

wagtunes · Post by **wagtunes** » Fri Jul 13, 2018 2:46 am

Rik wrote:
ENV1 wrote:
Vocaloid 5 Blurb wrote:The Vocaloid singing synthesizer makes it possible to easily produce any kind of singing voice you can imagine using just a computer.
Finally, at last, i can make those Joe Cocker and Janis Joplin tribute albums i always wanted to make!

Vocaloid 5
Can't really figure out how this is better than a massive sample library and Melodyne, and sadly the main video on the Vocaloid home page is a real joke. That is the kind of video that scares you into thinking that all their $$$ were spent on CGI instead of innovative DSP processing.

A massive sample library and melodyne can't touch this. Having said that, this can't touch real vocals.

So it really depends on what your expectations are. If you're wondering if this is a MAJOR improvement over Vocaloid 4, yes, it is. If you're wondering if this can do Janis Joplin, Tina Turner or Rod Stewart, it can't.

You are never going to see technology that will replace a human singer.

You CAN have technology that is MORE than usable.

Armagibbon · Post by **Armagibbon** » Fri Jul 13, 2018 2:58 am

How do you peeps do this virtual singer stuff? Got a taste with ewql choirs and I just cant do that kinda tweaky work. For me I gotta punch it in quick or the track dies... so can you do it fast w/o it soundin wack?

Rik · Post by **Rik** » Fri Jul 13, 2018 3:04 am

wagtunes wrote:A massive sample library and melodyne can't touch this. Having said that, this can't touch real vocals.

I believe you - it is just scary to see so much marketing going into something like this. However I can't figure out what makes that thing that much better than a massive sample library because all I see in the demos is the selection of phrases (you would have that in a sample library) and then tweaking the pitch and tempo. Melodyne actually sounds way better for the pitch stuff they show in the video.

wagtunes · Post by **wagtunes** » Fri Jul 13, 2018 3:05 am

Armagibbon wrote:How do you peeps do this virtual singer stuff? Got a taste with ewql choirs and I just cant do that kinda tweaky work. For me I gotta punch it in quick or the track dies... so can you do it fast w/o it soundin wack?

Depends on what you call fast. I can put together a complete song from scratch (including the actual writing of the song) with Vocaloid vocals and a full blown backing track for a 3 to 4 minute song in 3 to 4 hours.

Fast enough for you?

wagtunes · Post by **wagtunes** » Fri Jul 13, 2018 3:06 am

Rik wrote:
wagtunes wrote:A massive sample library and melodyne can't touch this. Having said that, this can't touch real vocals.
I believe you - it is just scary to see so much marketing going into something like this. However I can't figure out what makes that thing that much better than a massive sample library because all I see in the demos is the selection of phrases (you would have that in a sample library) and then tweaking the pitch and tempo. Melodyne actually sounds way better for the pitch stuff they show in the video.

The video sucks. These people don't know how to present anything.

Listen to my Vocaloid songs and tell me you can do that with Melodyne.

WinterdrivE · Post by **WinterdrivE** » Fri Jul 13, 2018 3:17 am

DrMEM wrote:I don't understand why we don't have a proper sample based vocal library yet. There are only 44 phonemes (vocal articulations, essentially) in the English language. I must be missing something, because I figure all you'd have to do is record each of those phonemes across say a 4 octave range and allow for midi cc or keyswitching between them (figuring out a good midi map would probably be the hardest part). That's 2112 samples for 1x rr.

Then maybe you'd want 4 dynamic layers, something like soft, normal, emotional, and gritty, and of course you'd want rr too. That puts you at 33792 samples for 4x rr at each of the four dynamic layers.

What am I missing here?

VOCALOID user of about 9 years who's also recorded sample-based vocal libraries for a similar but unrelated program coming through!

The thing about speech is it's not just individual sounds. The human brain gleans a lot of information from the very subtle transitions that happen between sounds as well, and the lack of those transitions is readily apparent when they're missing.

For example, we can break the word "boat" down into 3 phonemes: [b][oU][t], where [b] is the b sound, [oU] is the "oh" vowel, and [t] is the t sound. While these three sounds are what we "hear," they're not the only thing that's happening. As the lips and tongue move from the [b] to the [oU] and from the [oU] to the [t], that movement creates valuable linguistic information that our brains interpret without us ever even realizing it. (This is why we can understand people over the phone or on LQ audio systems or through heavy effects. Even though the audio of the consonants themselves gets distorted or eliminated entirely, the brain can interpret what's going on in between and rely on that information to understand the consonant. E.g., the ear may not hear the "t" itself in "boat", but the brain still understands that the speaker's mouth was moving towards a t and so it can fill in the blank.)

Because of that, when recording a sample based singing library, if I want the library to be able to say "boat", I don't need just separate, isolated [b], [oU] and [t] samples, I need [(silence) b][b oU][oU t][t (silence)] in order to capture that subtle but crucial transitional information.
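To make the "boat" example concrete, here is a minimal Python sketch of that splitting step: it pads a phoneme sequence with silence and pairs up every adjacent sound, so each transition becomes one unit to record. The function name `to_diphones` and the `"sil"` marker are made up for illustration and are not taken from VOCALOID or any other product.

```python
def to_diphones(phonemes, silence="sil"):
    """Return every adjacent pair of sounds, with silence at both ends.

    `phonemes` is a list of phoneme symbols, e.g. ["b", "oU", "t"].
    The "sil" marker is a hypothetical placeholder for silence.
    """
    padded = [silence, *phonemes, silence]
    # Pair each sound with the one that follows it.
    return list(zip(padded, padded[1:]))


boat = to_diphones(["b", "oU", "t"])
for left, right in boat:
    print(f"[{left} {right}]")
```

Three phonemes turn into four transition units, and none of them can be reused for a word where b is followed by a different vowel.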

So I can't have just a single recording for every b that appears in the English language. It has to be paired with another phoneme in order to be meaningful. That means the words bat, bot, bought, bit, beat, boat, butt, boot, bert, bet, bait, bite, bout, book, and boil all require different recordings. And that's only for one phoneme + vowels. It also needs recordings for when it's paired with consonants, such as in words like ebbed, bring, blast, etc.

So recording a vocal library unfortunately doesn't boil down to 44 phonemes multiplied by however many pitches. It's more like 44 phonemes multiplied by 44 phonemes multiplied by however many pitches. That then has to be turned into an accessible format that's pronounceable and legible to the person recording, which further complicates the process.

This is why vocal libraries intended to create any word and any melody, such as VOCALOID, typically only have 2 or 3 pitches recorded with no round robins. It's simply because of the sheer amount of recording required to capture everything that's linguistically meaningful, much less anything that's musically meaningful.
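A rough back-of-the-envelope version of that arithmetic in Python, using the numbers from the quoted post. It assumes 12 chromatic notes per octave (which is what makes 44 phonemes over 4 octaves come out to 2112), and it treats 44 x 44 as an upper bound, since not every phoneme pair actually occurs in English. All variable names are just for illustration.

```python
PHONEMES = 44           # English phoneme count from the quoted post
NOTES = 4 * 12          # 4 octaves, assuming 12 chromatic notes each
LAYERS = 4              # soft, normal, emotional, gritty
ROUND_ROBINS = 4

# Isolated phonemes, as proposed in the quoted post.
isolated = PHONEMES * NOTES
isolated_full = isolated * LAYERS * ROUND_ROBINS

# Phoneme pairs (transitions), upper bound.
pairs = PHONEMES * PHONEMES * NOTES
pairs_full = pairs * LAYERS * ROUND_ROBINS

# Pairs at only 3 recorded pitches, no layers, no round robins.
pairs_three_pitches = PHONEMES * PHONEMES * 3

print(isolated, isolated_full)   # 2112 33792
print(pairs, pairs_full)         # 92928 1486848
print(pairs_three_pitches)       # 5808
```

Even as an upper bound, the gap between roughly 1.5 million samples and a few thousand shows why the pitch count gets cut first.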

I hope this may have helped clear up some confusion about sample-based vocal libraries.

DrMEM · Post by **DrMEM** » Fri Jul 13, 2018 3:32 am

It does, thanks. I just finished watching the Realivox Blue video where they go into some of that reasoning, too.

hamayame · Post by **hamayame** » Fri Jul 13, 2018 4:46 am

Before this becomes more misleading to the TS, let me set a good baseline for this unique singing software.

1. Vocaloid is an alternative singer for a song.
As the title says, Vocaloid was built from the ground up for producers who don't have any chance to record a real human singer for their songs. This is what the makers of Vocaloid intended it to be (I forgot the interview source, but it's in Japanese).

2. Vocaloid can never beat human singing.
Even with the best Vocaloid programmer and processing plugins, this beautiful software can at best mimic a real human singer to about 80% of real singing technique. Please watch this wonderfully crafted song by two commercially signed Japanese Vocaloid producers on KARENT (biggest Vocaloid music label under Sony, if I'm not wrong), Mitchie M and OSTER Project, who strive for perfection with Vocaloid.
I remember Mitchie M using a LOT of processing plugins by Antares that can control the vocal tract on every Vocaloid song he/she made.

3. Vocaloid is mostly focused on singing in songs
Well, refer to the first forgotten interview source. Its main purpose is to be an alternative singer on a normal song (pop, rock, jazz, etc.), so making it produce another type of voice (like talking, beatboxing, or anything else except singing) is quite hard unless you really dedicate yourself to it (yes, I assure you this software can do it if you have the willpower to get through it). As an example, the same producer, Mitchie M, made a track with talking in it, even though it's only 2 sentences in the intro of the song.

The talking line is: こんにちは。三時のニュースです。 (Konnichiwa. San-ji no nyuusu desu. / "Hello. This is the three o'clock news.")

4. Vocaloid is at its best in its native language, Japanese
Please don't get your hopes up if you're making songs in anything other than Japanese, even when you're using a native English voice library. This is because Japanese leans heavily on its vowels, and most Japanese words are direct sounds, pronounced syllable by syllable (like the pronunciation of こんにちは = KO-N NI TI WA).
That said, I find the V4x English library more refined now in terms of clarity and word-by-word comprehensibility. Here are some songs by the English Vocaloid producer CircusP, also known as VocaCircus on YouTube.

Here are some wonderful songs by KIRA on the V3 English library.

Astralv · Post by **Astralv** » Fri Jul 13, 2018 5:22 am

Will it work with Cakewalk by BandLab (formerly Sonar)? I once considered buying it, and their tech support told me that it will not work with Sonar. Is it freestanding and VST? Is there a video of it working inside the DAW? Thank you.

na4a4a · Post by **na4a4a** » Fri Jul 13, 2018 5:28 am

Wow what a mess that post is.

All music producers care about is a tool working for them, not against them. V5 is definitely a step in the right direction.

To clarify a bit, Vocaloid was originally English only actually. So English is actually closer to the "native" language. That being said, saying software that uses phonemes has a native language is really asinine to say the least...
The main cause of English voices being bad is the lack of training. Spoken TTS voices often require voice actors that are specifically trained for the task. Here they are taking random singers off the street and rolling the dice a bit.

Also I love how you post the absolute worst example of English. Miku English is an absolute abomination created only to make weebs buy the same cute voice more.

Western producers don't care about Eastern vocals since they don't sound familiar enough.

Fire your singers folks, Vocaloid 5 is here!