Realistic voice transformation?


Post

What's the current state of the art for realistic voice transformation? That is, processing of vocal takes to make them sound like they were sung by a completely different person--different gender, different age, different larynx characteristics, and so on. A simple formant filter/pitch shifter type of solution isn't enough, as I'd like to be able to do more than just gender transformations. (Note: I don't need this to be in real-time, and sonic quality and authenticity are much more important to me than CPU hit, ease of use, or pretty much anything else.)

A bit of research suggests that Ircam Trax v3 (https://fluxhome.com/project/ircam-trax-v3/) is likely to still be the best (only?) thing out there for this, but I'm not sure how well it actually holds up in practice, or whether something better has since come along. It's also rather pricey and if there's a cheaper alternative that can get at least in the ballpark of the range and quality of results that Ircam gets, I'd like to know about that.

Basically, if you know about any plugins that go beyond bare-bones formant and pitch shifting, just throw them out there and I'll do the rest of the necessary research and sort it all out. :) Also, this seems like a long shot, but if there are clever production techniques that involve manual processing of audio and/or repurposing of other kinds of effects to achieve some of what I'm looking for here (beyond the obvious pitch and formant stuff), feel free to share those as well. Thanks everyone!

Post

Have you seen this? http://www.antarestech.com/products/det ... OAT_Evo_14 It's a few years old now but just like Ircam they do have a demo you can try out.

Post

Shit, what do you want a voice changer for?

Post

I don't think there's anything new (or all that effective). For years I've periodically looked fairly thoroughly for something like this, but have always come up empty. I don't think that vocal-alteration technology has advanced sufficiently far yet.

That being said, if I've missed something, I'd be happy to be proven wrong!

Post

Also, this seems like a long shot, but if there are clever production techniques that involve manual processing of audio and/or repurposing of other kinds of effects to achieve some of what I'm looking for here
In Graillon we've added pitch-tracking modulation as a way to make it seem like someone else is talking (most settings aren't realistic though). The idea is that when a pitch is detected, a synchronized sine is generated which serves as a ring-mod or frequency-shifting carrier. When added to the original signal, this selectively hides or emphasizes partials in the voice spectrum, hence changing speaker personality. This works best for male voices and power-of-two ratios of the detected pitch.
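For anyone who wants to experiment with the idea offline, here is a rough sketch in Python/NumPy. It is not Graillon's actual implementation: the autocorrelation pitch detector, the function names, and parameters such as `ratio`, `mix`, `frame_len`, and `hop` are all assumptions made for illustration. It only covers the ring-mod variant, where a sine locked to a multiple of the detected pitch multiplies the voice and the result is added back to the dry signal.

```python
import numpy as np

def detect_pitch(frame, sr, fmin=70.0, fmax=400.0, threshold=0.3):
    """Return an autocorrelation pitch estimate in Hz, or 0.0 if unvoiced."""
    frame = frame - np.mean(frame)
    energy = np.dot(frame, frame)
    if energy < 1e-8:
        return 0.0  # treat near-silence as unvoiced
    # Non-negative lags of the autocorrelation, normalised so lag 0 equals 1.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:] / energy
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(ac) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag if ac[lag] > threshold else 0.0

def pitch_tracking_ring_mod(x, sr, ratio=2.0, mix=0.5, frame_len=1024, hop=256):
    """Add a ring-modulated copy of x whose sine carrier follows the pitch.

    ratio: carrier frequency as a multiple of the detected pitch
           (hypothetical parameter; powers of two are a sensible start).
    mix:   level of the ring-modulated signal added to the dry signal.
    """
    x = np.asarray(x, dtype=float)
    out = x.copy()
    phase = 0.0
    n = np.arange(hop)
    # A tail shorter than one analysis frame is left dry.
    for start in range(0, len(x) - frame_len + 1, hop):
        f0 = detect_pitch(x[start:start + frame_len], sr)
        if f0 <= 0.0:
            continue  # no pitch detected: pass the dry signal through
        step = 2.0 * np.pi * ratio * f0 / sr
        carrier = np.sin(phase + step * n)
        phase = (phase + step * hop) % (2.0 * np.pi)  # keep the carrier continuous
        seg = slice(start, start + hop)
        out[seg] = x[seg] + mix * x[seg] * carrier
    return out
```

Multiplying by the carrier creates sum and difference frequencies around each partial. When the carrier sits at an integer multiple of the pitch, those products land back on the harmonic series, so they reinforce or cancel existing partials rather than sounding inharmonic.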
Check out our VST3/VST2/AU/AAX/LV2:
Inner Pitch | Lens | Couture | Panagement | Graillon

Post

SirkusPi, I suspect you're correct, unfortunately. I hadn't heard about Graillon, though. The "changing speaker personality" thing sounds interesting--is there a link to a video of that in action? For that matter, are there even any good videos of Ircam Trax in action? I haven't been able to find anything that properly showcases it. (I know there's a trial version, but I like to see demos before installing anything whenever possible.)

Post

is there a link to a video of that in action?
I'm sorry, PTM (pitch-tracking modulation) is not well showcased, my mistake. Maybe https://soundcloud.com/pigmyonstudio/graillon-2-vsttest but I'm not sure PTM was used in this one. And it's not what you were searching for if you're into more realistic, less real-time processors.

Post

Ah, I see. Thanks anyway, it's a cool effect :)

Post

The voice is notoriously difficult to process transparently (and on the flipside, almost impossible to process beyond recognition). It's arguably the single sound category we're all most intimately familiar with, and even the slightest inkling that something isn't right throws you into uncanny territory.

I think it also depends on whether we're talking about a singing or speaking voice here too. The speaking voice perhaps shows up the limits of processing more, as I'm not aware of anything that will change things like prosody. Singing is a bit less 'individual' perhaps as each singer is largely trying to make their voice fit the same template when attempting the same tune, so maybe a little easier to achieve a different identity there?

The very brief time I spent with Flux (one or two sessions as a demo - it's way out of my price range) showed a few interesting options to disrupt speaker identity that I hadn't come across before in the real-time world. The ability to flatten or exaggerate pitch contour was probably the most interesting tool when it came to changing identity with the speaking voice, but everything was still recognisably "me". Even when I went beyond 'natural' processing into the obviously artificial range, I could never quite get beyond "a variation on the same speaker" and into "different speaker" territory. If you're contrasting the 'new' speaker against your natural voice, that's obviously gonna break the illusion.
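To make the "flatten or exaggerate pitch contour" idea concrete, here is a minimal sketch in Python/NumPy. It is not how Trax does it internally (I don't know that); it just assumes you already have a frame-by-frame F0 track in Hz with 0 marking unvoiced frames, and it scales each voiced frame's distance from the average pitch on a log-frequency scale. The function name and the `factor` parameter are made up for illustration.

```python
import numpy as np

def scale_pitch_contour(f0_hz, factor):
    """Flatten (factor < 1) or exaggerate (factor > 1) a pitch contour.

    f0_hz:  per-frame pitch in Hz, with 0 for unvoiced frames (assumed format).
    factor: 0 gives a monotone at the average pitch, 1 leaves the contour as is.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0 > 0
    out = np.zeros_like(f0)
    if not np.any(voiced):
        return out  # nothing voiced, nothing to scale
    # Work in octaves so that equal musical intervals are treated equally.
    log_f0 = np.log2(f0[voiced])
    centre = np.mean(log_f0)
    out[voiced] = 2.0 ** (centre + factor * (log_f0 - centre))
    return out

contour = [100.0, 0.0, 200.0, 400.0]
monotone = scale_pitch_contour(contour, 0.0)     # every voiced frame moves to the average pitch
exaggerated = scale_pitch_contour(contour, 2.0)  # intervals around the average are doubled
```

This only computes the target contour. Actually imposing it on a recording still needs a pitch-shifting resynthesis stage (PSOLA, a phase vocoder, or similar) driven by the ratio between the new and original F0 in each frame, and that stage is where most of the audible artifacts come from.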

You might like to give Praat a look. It's been over 10 years since I scratched its surface at university, but I remember there were some exceptionally powerful and detailed analysis/synthesis tools in there. I recall it being very much 'manually' operated and oriented toward phonetics research though, so perhaps not really suited when you're looking to process a lot of material (e.g. more than a sentence) or use it in a 'musical' way. My memory of it is very hazy... Still, approaching the problem from a different angle could be refreshing! http://www.fon.hum.uva.nl/praat/

Well, you probably won't want to give Praat a look to be honest, but nothing wrong with a little leftfield suggestion every now and then :P.

Post

Adobe has new technology in Premiere Pro which, given a large enough sample of spoken language, can replace your words and make you say things you never did.

I think we’re on the cusp of a lot of new advancements in that area. With machine learning, it will likely be possible to take some reference audio of your favorite celebrity speaking, and then convolve your own words so they sound like their voice. Beyond that, we’ll undoubtedly get into singing.
Incomplete list of my gear: 1/8" audio input jack.

Post

Absolutely not realistic but...
I just did this live-on-video experiment: transforming female speech (in German, from an interview I did with my mother, which I used for an on-site sound installation in her current exhibition) first with Melda's MTransform and then with GRM Warp.

https://www.youtube.com/watch?v=QdNXkz5v36s

Post

Some great advice, insights and suggestions here. Thanks all, keep it coming! :)

Post

Ircam Trax works great for male-to-female transformation, but I'm still waiting for a good sale.

Post

Hmm... does it? The demo videos I watched were actually kind of underwhelming, but frustratingly I can't seem to find any that really do it justice. Would love to hear/see some better demos if anyone knows of any. I'd really rather not go through the trouble of installing it and learning to use it properly given that it seems somewhat unlikely to satisfy.

Post

househoppin09,
My partner, DJ Dan, used it to convert a narrated male voice into a female voice. The result was kind of monotone, but it wasn't supposed to be singing. Nothing sounded glitchy or Antares-like. I'm not sure how it would do with a singing voice, but it was impressive enough in our application. The file structure is a bit silly for a VST, however. I recall it had quite a few parameters.
I have Voice Modeler for the TC Electronic PowerCore and that sounds very artificial. Yet most things sound awesome on the PowerCore.
Voice Modeler is OK to use on a double or background vocal only.
