Fire your singers folks, Vocaloid 5 is here!
- KVRian
- 642 posts since 22 Jun, 2018
Both approaches definitely have their advantages and disadvantages. Our ears and brains evolved to pick up so many tiny inflections of the human voice, it makes it basically impossible to model it in any way. Even if we could model it, there would be so many parameters to control, it would become impossible to use in practice.
Still, stuff like this is a lot of fun to play with, and used right, it can certainly be a very useful artistic tool.
Deep learning + additive synthesis would be another really promising approach for emulating the human voice (or anything else, really), I think.
- KVRAF
- 21196 posts since 8 Oct, 2014
Because it's samples, just like Realitone Ladies which I bought for background vocals.
djanthonyw wrote: Try this. It sounds better than Vocaloid to me.
DrMEM wrote: I don't understand why we don't have a proper sample based vocal library yet. There are only 44 phonemes (vocal articulations, essentially) in the English language. I must be missing something, because I figure all you'd have to do is record each of those phonemes across say a 4 octave range and allow for midi cc or keyswitching between them (figuring out a good midi map would probably be the hardest part). That's 2112 samples for 1x rr.
Then maybe you'd want 4 dynamic layers, something like soft, normal, emotional, and gritty, and of course you'd want rr too. That puts you at 33792 samples for 4x rr at each of the four dynamic layers.
What am I missing here?
https://www.youtube.com/watch?v=goDHaTz62Fs&t=1337s
But it's limited. You can't make a whole song with this with any lyrics you want because you only have the combinations of words that they give you.
- KVRian
- 1105 posts since 31 Dec, 2006 from the hills above beautiful Boise, Idaho
Light myself on fire, or listen to an entire album containing vocaloid vocals. Hmm....Decisions decisions.
"It is better to compose than decompose."
www.SeanDockery.com https://www.youtube.com/channel/UC6k45d ... J5eCnhNbfA
- KVRian
- 667 posts since 27 Jul, 2010
quantum7 wrote:Light myself on fire, or listen to an entire album containing vocaloid vocals. Hmm....Decisions decisions.
-
- KVRist
- 172 posts since 15 Mar, 2002 from Ottawa, Canada
Can't really figure out how much better this is than a massive sample library and Melodyne, and sadly the main video on the Vocaloid home page is a real joke. That is the kind of video that scares you into thinking that all their $$$ were spent on CGI instead of innovative DSP.
ENV1 wrote: Finally, at last, i can make those Joe Cocker and Janis Joplin tribute albums i always wanted to make!
Vocaloid 5 Blurb wrote: The Vocaloid singing synthesizer makes it possible to easily produce any kind of singing voice you can imagine using just a computer.
Vocaloid 5
- KVRAF
- 21196 posts since 8 Oct, 2014
A massive sample library and melodyne can't touch this. Having said that, this can't touch real vocals.
Rik wrote: Can't really figure out how better this is than a massive sample library and Melodyne and sadly the main video on the vocaloid home page is real joke. That is the kind of video that scares you to think that all their $$$ were spent in CGI instead of innovative DSP processing.
ENV1 wrote: Finally, at last, i can make those Joe Cocker and Janis Joplin tribute albums i always wanted to make!
Vocaloid 5 Blurb wrote: The Vocaloid singing synthesizer makes it possible to easily produce any kind of singing voice you can imagine using just a computer.
Vocaloid 5
So it really depends on what your expectations are. If you're wondering if this is a MAJOR improvement over Vocaloid 4, yes, it is. If you're wondering if this can do Janis Joplin, Tina Turner or Rod Stewart, it can't.
You are never going to see technology that will replace a human singer.
You CAN have technology that is MORE than usable.
-
- KVRian
- 716 posts since 20 Apr, 2017
How do you peeps do this virtual singer stuff? Got a taste with ewql choirs and I just cant do that kinda tweaky work. For me I gotta punch it in quick or the track dies... so can you do it fast w/o it soundin wack?
-
- KVRist
- 172 posts since 15 Mar, 2002 from Ottawa, Canada
I believe you - it is just scary to see so much marketing going into something like this. However I can't figure out what makes that thing that much better than a massive sample library because all I see in the demos is the selection of phrases (you would have that in a sample library) and then tweaking the pitch and tempo. Melodyne actually sounds way better for the pitch stuff they show in the video.
wagtunes wrote: A massive sample library and melodyne can't touch this. Having said that, this can't touch real vocals.
- KVRAF
- 21196 posts since 8 Oct, 2014
Depends on what you call fast. I can put together a complete song from scratch (including the actual writing of the song) with Vocaloid vocals and a full blown backing track for a 3 to 4 minute song in 3 to 4 hours.
Armagibbon wrote: How do you peeps do this virtual singer stuff? Got a taste with ewql choirs and I just cant do that kinda tweaky work. For me I gotta punch it in quick or the track dies... so can you do it fast w/o it soundin wack?
Fast enough for you?
- KVRAF
- 21196 posts since 8 Oct, 2014
The video sucks. These people don't know how to present anything.
Rik wrote: I believe you - it is just scary to see so much marketing going into something like this. However I can't figure out what makes that thing that much better than a massive sample library because all I see in the demos is the selection of phrases (you would have that in a sample library) and then tweaking the pitch and tempo. Melodyne actually sounds way better for the pitch stuff they show in the video.
wagtunes wrote: A massive sample library and melodyne can't touch this. Having said that, this can't touch real vocals.
Listen to my Vocaloid songs and tell me you can do that with Melodyne.
-
- KVRer
- 9 posts since 23 Nov, 2014
VOCALOID user of about 9 years who's also recorded sample-based vocal libraries for a similar but unrelated program coming through!
DrMEM wrote: I don't understand why we don't have a proper sample based vocal library yet. There are only 44 phonemes (vocal articulations, essentially) in the English language. I must be missing something, because I figure all you'd have to do is record each of those phonemes across say a 4 octave range and allow for midi cc or keyswitching between them (figuring out a good midi map would probably be the hardest part). That's 2112 samples for 1x rr.
Then maybe you'd want 4 dynamic layers, something like soft, normal, emotional, and gritty, and of course you'd want rr too. That puts you at 33792 samples for 4x rr at each of the four dynamic layers.
What am I missing here?
The thing about speech is it's not just individual sounds. The human brain gleans a lot of information from the very subtle transitions that happen between sounds as well, and their absence is readily apparent.
For example, we can break the word "boat" down into 3 phonemes: [b][oU][t], where [b] is the b sound, [oU] is the "oh" vowel, and [t] is the t sound. While these three sounds are what we "hear," they're not the only thing that's happening. As the lips and tongue move from the [b] to the [oU] and from the [oU] to the [t], that creates valuable linguistic information that our brains interpret without us ever even realizing it. (This is why we can understand people over the phone or on LQ audio systems or through heavy effects. Even though the audio of the consonants themselves gets distorted or eliminated entirely, the brain can interpret what's going on in between and rely on that information to understand the consonant. E.g., the ear may not hear the "t" itself in "boat" but the brain still understands that the speaker's mouth was moving towards a t and so it can fill in the blank.)
Because of that, when recording a sample based singing library, if I want the library to be able to say "boat", I don't need just separate, isolated [b], [oU] and [t] samples, I need [(silence) b][b oU][oU t][t (silence)] in order to capture that subtle but crucial transitional information.
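To make the pairing concrete, here is a minimal Python sketch of how a phoneme sequence turns into the transition units described above. The function name `to_diphones` and the `"(silence)"` marker are made up for illustration; this is not how VOCALOID or any particular editor does it internally.

```python
def to_diphones(phonemes, silence="(silence)"):
    """Pad the phoneme list with silence and return every adjacent pair."""
    padded = [silence, *phonemes, silence]
    # Each pair (left, right) is one transition that needs its own recording.
    return list(zip(padded, padded[1:]))


boat = to_diphones(["b", "oU", "t"])
for left, right in boat:
    print(f"[{left} {right}]")
```

Three phonemes come out as four transition units, and every unit depends on its neighbour.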
So I can't have just a single recording for every b that appears in the English language. It has to be paired with another phoneme in order to be meaningful. That means the words bat, bot, bought, bit, beat, boat, butt, boot, bert, bet, bait, bite, bout, book, and boil all require different recordings. And that's only for one phoneme + vowels. It also needs recordings for when it's paired with consonants, such as in words like ebbed, bring, blast, etc.
So recording a vocal library unfortunately doesn't boil down to 44 phonemes multiplied by however many pitches. It's more like 44 phonemes multiplied by 44 phonemes multiplied by however many pitches. That then has to be turned into an accessible format that's pronounceable and legible to the person recording, which further complicates the process.
This is why vocal libraries intended to create any word and any melody, such as VOCALOID, typically only have 2 or 3 pitches recorded with no round robins. It's simply because of the sheer amount of recording required to capture everything that's linguistically meaningful, much less anything that's musically meaningful.
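For anyone who wants to see the numbers side by side, here is a rough Python sketch of the arithmetic. It assumes the 44 phonemes from the quoted post and reads "4 octave range" as 48 chromatic pitches (which is what gives 2112). The pair count is an upper bound, since not every phoneme pair actually occurs in English, and the function names are just for illustration.

```python
PHONEMES = 44           # English phoneme count used in the quoted post
PITCHES_4_OCTAVES = 48  # assumption: 4 octaves x 12 semitones


def isolated_samples(pitches, layers=1, round_robins=1):
    """One recording per phoneme, per pitch, per layer, per round robin."""
    return PHONEMES * pitches * layers * round_robins


def pair_samples(pitches, layers=1, round_robins=1):
    """One recording per phoneme pair (upper bound, ignores unused pairs)."""
    return PHONEMES * PHONEMES * pitches * layers * round_robins


print(isolated_samples(PITCHES_4_OCTAVES))        # 2112
print(isolated_samples(PITCHES_4_OCTAVES, 4, 4))  # 33792
print(pair_samples(PITCHES_4_OCTAVES))            # 92928
print(pair_samples(PITCHES_4_OCTAVES, 4, 4))      # 1486848
print(pair_samples(3))                            # 5808
```

Even at only 3 pitches with no layers or round robins, the pair-based count is already well above the original 2112 estimate.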
I hope this may have helped clear up some confusion about sample-based vocal libraries.
My YouTube channel: https://www.youtube.com/channel/UC0Mp1w ... HHhixEqmJg
Follow me on Twitter! @Winter_drivE
Follow me on soundcloud! https://soundcloud.com/winterdrive
-
- KVRer
- 19 posts since 18 Jan, 2017
Before this becomes more misleading for the TS, let me set a baseline for this unique singing software.
1. Vocaloid is an alternative singer for a song.
As the title suggests, Vocaloid was built from the ground up for producers who have no chance to record a real human singer for their songs. This is what the makers of Vocaloid intended (I forgot the interview source, but it's in Japanese).
2. Vocaloid can never beat human singing.
Even with the best Vocaloid programmer and the best processing plugins, this beautiful software at its best only mimics a real human singer, reaching perhaps 80% of real singing technique. Please watch this wonderfully crafted song by two commercially signed Japanese Vocaloid producers on KARENT (the biggest Vocaloid music label under Sony, if I'm not wrong), Mitchie M and OSTER Project, who strive for perfection with Vocaloid.
I remember Mitchie M using a LOT of processing plugins by Antares that can control the vocal tract on every Vocaloid song he/she made.
https://www.youtube.com/watch?v=buRFjDSIu4o
3. Vocaloid is mostly meant for producing songs
Again, refer to the interview source I forgot. Its main purpose is to be an alternative singer on a normal song (pop, rock, jazz, etc.), so making it produce another type of voice (talking, beatboxing, or anything else besides singing) is quite hard unless you really dedicate yourself to it (yes, I assure you this software can do it if you have the willpower to see it through). As an example, the same producer, Mitchie M, made a track with talking in it, even though it's only 2 sentences in the intro of the song.
The talking line is: こんにちは。三時のニュースです。(Konnichiwa. San-ji no nyuusu desu. "Hello. This is the three o'clock news.")
https://youtu.be/l69v6SVoE9k
4. Vocaloid is at its best in its native language, Japanese
Please don't get your hopes up if you're making songs in anything other than Japanese, even when you're using a native English voice library. This is because Japanese leans heavily on vowels and most Japanese words are pronounced directly as written (like the pronunciation of こんにちは = KO-N NI TI WA).
That said, I find the V4x English library is more refined now in terms of clarity and word-by-word intelligibility. Here is a song by an English Vocaloid producer, CircusP, also known as VocaCircus on YouTube.
https://www.youtube.com/watch?v=J1QF7tknlog
Here is a wonderful song by KIRA on the V3 English library.
https://www.youtube.com/watch?v=7t5JbAue6eY
-
- KVRian
- 851 posts since 26 Jan, 2014 from United States of America
Will it work with Cakewalk by BandLab (formerly Sonar)? I once considered buying it and their tech support told me that it would not work with Sonar. Is it standalone and VST? Is there a video of it working inside a DAW? Thank you.
-
- KVRer
- 2 posts since 13 Jul, 2018
Wow what a mess that post is.
All music producers care about is a tool working for them, not against them. V5 is definitely a step in the right direction.
To clarify a bit, Vocaloid was originally English only actually. So English is actually closer to the "native" language. That being said, saying software that uses phonemes has a native language is really asinine to say the least...
The main cause of English voices being bad is the lack of training. Spoken TTS voices often require voice actors that are specifically trained for the task. Here they are taking random singers off the street and rolling the dice a bit.
Also I love how you post the absolute worst example of English. Miku English is an absolute abomination created only to make weebs buy the same cute voice again.
Western producers don't care about Eastern vocals since they don't sound familiar enough.