KVR Audio

BABY Audio · Post by **BABY Audio** » Wed Jun 26, 2024 1:18 am

Hi Everyone,
We're excited to announce the release of Humanoid: https://babyaud.io/humanoid

Humanoid is an over-the-top pitch corrector designed for instant hard-tuning and radical voice manipulation. With a brand new approach to vocal tuning and phase-vocoding, the plugin transforms singers of any style and ability into their best synthetic selves.

There's much more info on the product page, so we'll just leave you with an image and the official tutorial video here. We're excited to hear what you think and would love your feedback!

martiu · Post by **martiu** » Wed Jun 26, 2024 9:25 am

omg omg, so nice

billinder33 · Post by **billinder33** » Wed Jun 26, 2024 11:27 am

This looks badazz

Ou_Tis · Post by **Ou_Tis** » Wed Jun 26, 2024 2:47 pm

I like that it apparently lets you use automation for pitch instead of hard tuning, assuming pitch is allowed to be continuous---I wish it had some interesting pitch curve automation effects based on virtuosic or emotional singing, and the ability to upload a vocal sample (that you have the rights to use) and extract the pitch curve.
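For anyone wondering what the difference between hard tuning and a continuous pitch curve looks like in numbers, here is a minimal sketch. It is not how Humanoid works internally (that isn't documented in this thread); it only assumes equal temperament with A4 = 440 Hz, and the function names and the `amount` parameter are made up for illustration. Hard tuning snaps each detected pitch to the nearest semitone, while a lower `amount` leaves more of the original curve intact.

```python
import math

A4_HZ = 440.0  # assumed reference tuning, not taken from the plugin


def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(freq_hz / A4_HZ)


def midi_to_hz(note):
    """Convert a (fractional) MIDI note number back to Hz."""
    return A4_HZ * 2.0 ** ((note - 69.0) / 12.0)


def hard_tune(freq_hz):
    """Snap a single pitch to the nearest equal-tempered semitone."""
    return midi_to_hz(round(hz_to_midi(freq_hz)))


def tune_curve(curve_hz, amount=1.0):
    """Pull a pitch curve toward the semitone grid.

    amount=1.0 is full hard tuning, amount=0.0 leaves the curve untouched.
    Frames with a pitch of 0 or less are treated as unvoiced and passed through.
    """
    tuned = []
    for freq in curve_hz:
        if freq <= 0:
            tuned.append(freq)
            continue
        note = hz_to_midi(freq)
        target = round(note)
        # Blend in the semitone domain so the correction is musically linear.
        tuned.append(midi_to_hz(note + amount * (target - note)))
    return tuned
```

Extracting a pitch curve from a sample would sit in front of this as a separate pitch-detection step, which the sketch does not attempt.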

Ou_Tis · Post by **Ou_Tis** » Wed Jun 26, 2024 2:57 pm

OTOH the idea that "hard tuning = futuristic" is mostly based on the old assumption that robot voices would be monotone and that the emotion and nuance of human inflection would be very difficult for AI to emulate. As it turns out, it's relatively easy for AI---and now actual talking robots who easily pass the Turing test---to emulate. So the idea of hard tuning seeming "futuristic" has just been thoroughly undermined. It's more "retro-futuristic".

eerie_audio · Post by **eerie_audio** » Wed Jun 26, 2024 3:06 pm

Ou_Tis wrote: Wed Jun 26, 2024 2:57 pm that the emotion and nuance of human inflection would be very difficult for AI to emulate. As it turns out, it's relatively easy for AI

A.I. can only mimic or copy something that already exists, it just spits out what you feed it. If your statement was true, all the music billboards would be filled with A.I. artists.

Ou_Tis · Post by **Ou_Tis** » Wed Jun 26, 2024 4:57 pm

eerie_audio wrote: Wed Jun 26, 2024 3:06 pm
Ou_Tis wrote: Wed Jun 26, 2024 2:57 pm that the emotion and nuance of human inflection would be very difficult for AI to emulate. As it turns out, it's relatively easy for AI
A.I. can only mimic or copy something that already exists, it just spits out what you feed it. If your statement was true, all the music billboards would be filled with A.I. artists.

This is getting off topic, so I'll limit my response in this thread to this post. Artificial neural networks (ANNs) can in principle approximate any continuous function, but in practice the major current models are limited to what can be learned from their training data (which generally includes human feedback). Multiple studies have found that the best ones now outperform humans in correctly inferring human emotions and providing text-based therapy. For emulating human speech in audio conversation, voice models easily pass the Turing test. For sung vocals, Udio does a shockingly good job of emulating human emotion and matching it convincingly enough with the lyrics, though there are occasional artifacts and the timing can be wonky in places. Some examples:

"Endless idiots on the internet"
https://www.udio.com/songs/h59MHh3cFVeTVevRVz1TEW

"My champagne turned"
https://www.udio.com/songs/15u3RP2nUr6rcChzqxvoRU

"Carolina-O"
https://www.udio.com/songs/coixNX1gnJ1oWT8z2LQddk

Granted, they don't "understand" what words mean in the same way we do; they just rely on statistical patterns relating words and vocals. (Multimodal AI that includes more types of data---about moving through the world, for example, or human neural imaging data---may perform better.) So a human performer who's good at extremely nuanced interpretations---or a singer-songwriter who's good at capturing the subtle nuances of words, especially in novel combinations or contexts---can still outperform AI. (And singer-songwriters who are good at expressing nuances will still have an advantage over AI at expressing what they intend to express for the foreseeable future---at least until AI starts using biometrics like neural imaging.) But in most situations these models perform well enough that most people can't distinguish them from actual humans. Not even from great human singers.

As for chart popularity---that tends to depend heavily on far more than just the music itself---perceptions of "the artist", marketing, etc.

eerie_audio · Post by **eerie_audio** » Wed Jun 26, 2024 7:18 pm

Good luck with that.

You're probably also in the camp where you believe a visualizer can replace your ears and feel with what you see on your screen. It's probably a field where you can convince a lot of people to buy code based on 1's and 0's, but when it comes down to it, it's the emotion you can never artificially replicate.

You get out what you put in, it will always be a copy and never manifested out of real emotion.

Dirtgrain · Post by **Dirtgrain** » Wed Jun 26, 2024 8:39 pm

I think top-of-the-charts music left behind the concept of "real emotion" years ago, to some degree, with exceptions--rather, it is quite formulaic and contrived, often even when it evokes emotion in the audience. If you look into it, you might be impressed/scared by what AI can already do regarding predicting what will become hits. I'm not in favor of AI driving our culture (its culture?) going forward. I hope we make laws restricting its role. As a tool to help artists get to their vision, it's okay, but regulating that and figuring out where to draw the line is a tough one.

Dirtgrain · Post by **Dirtgrain** » Wed Jun 26, 2024 9:03 pm

Ou_Tis wrote: Wed Jun 26, 2024 2:47 pm I like that it apparently lets you use automation for pitch instead of hard tuning, assuming pitch is allowed to be continuous---I wish it had some interesting pitch curve automation effects based on virtuosic or emotional singing, and the ability to upload a vocal sample (that you have the rights to use) and extract the pitch curve.

Great idea. I wonder also if AI can somehow extract the growl/rasp that different singers work into their voices and apply that to other voices/instruments.

astrofreq · Post by **astrofreq** » Sat Jun 29, 2024 3:38 pm

Ou_Tis wrote: Wed Jun 26, 2024 4:57 pm If your statement was true, all the music billboards would be filled with A.I. artists.

I'm pretty sure that's coming unfortunately. Doesn't Spotify create AI songs under fake artist names and put them on their own playlists?

Ou_Tis · Post by **Ou_Tis** » Sat Jun 29, 2024 3:51 pm

astrofreq wrote: Sat Jun 29, 2024 3:38 pm
Ou_Tis wrote: Wed Jun 26, 2024 4:57 pm If your statement was true, all the music billboards would be filled with A.I. artists.
I'm pretty sure that's coming unfortunately. Doesn't Spotify create AI songs under fake artist names and put them on their own playlists?

Just to be clear, I didn't write that. It's been alleged that Spotify does that, but afaik no individual song has made the Billboard charts for most listens. But that's off topic.

Getting back on topic a bit---the modern patron saint of hard autotune in pop (Charli XCX) isn't doing all that well on the charts, with her new album's top song down at #82. Maybe she needs Humanoid to spice up her stale, monotone, boring voice... it would be hard for it to hurt. Though tbh I did like her hard-autotuned voice on that long-ago hit of hers with Igloo Australia ("first things first, I'm the realest ... hold you down (under) like physics").

Ah_Dziz · Post by **Ah_Dziz** » Sat Jun 29, 2024 3:57 pm

What's the processing latency? Or if it's variable what's the maximum processing latency?

astrofreq · Post by **astrofreq** » Sat Jun 29, 2024 3:57 pm

My apologies for the improper quote. Thanks for the clarification.

Humanoid - Baby Audio - Vocal Transformer / Hard Tuner