KVR Audio

Post by **pdxindy** » Wed Mar 04, 2026 6:38 am

bermudagold wrote: Wed Mar 04, 2026 5:26 am but generates it from what?...

When an AI model generates a video, what is it generating from? Same principle. It is directly generating audio, not controlling a synthesis engine (sample playback, additive, subtractive, modal).

bermudagold wrote: Wed Mar 04, 2026 5:26 am why would they be recording an orchestra in Budapest for their new string instrument if it doesn't use samples?

It's training data for the AI.

The Ace Studio violin sounds more dynamic than sample libraries. Even compared with a big sample library and time-consuming articulation work, the Ace Studio violin sounds more authentic. Also, one reviewer stated that every time audio is generated from identical MIDI, the result is a bit different. If so, then every performance is unique, more like an actual musician.
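Nothing in this thread says how Ace Studio produces that take-to-take variation. One common way generative systems end up nondeterministic is that they draw random numbers during generation, so identical input gives a slightly different output unless the random seed is fixed. The toy sketch below assumes exactly that; `render_note`, its vibrato and onset ranges, and the sample rate are all hypothetical illustration choices, not anything from a real product.

```python
import numpy as np

SR = 16000  # sample rate in Hz (arbitrary choice for the sketch)

def render_note(midi_note, duration_s, rng):
    """Hypothetical toy renderer: a sine tone with a randomly drawn vibrato
    depth/rate and onset delay, standing in for a generative model that
    samples from a learned distribution instead of replaying a recording."""
    freq = 440.0 * 2 ** ((midi_note - 69) / 12)    # MIDI note number -> Hz
    n = int(SR * duration_s)
    t = np.arange(n) / SR
    depth_cents = rng.normal(0.0, 3.0)              # random (signed) vibrato depth
    rate_hz = rng.uniform(4.0, 6.0)                 # random vibrato rate
    inst_freq = freq * 2 ** (depth_cents * np.sin(2 * np.pi * rate_hz * t) / 1200)
    audio = np.sin(2 * np.pi * np.cumsum(inst_freq) / SR)  # integrate frequency to phase
    delay = int(rng.uniform(0.0, 0.010) * SR)       # 0-10 ms late onset
    return np.concatenate([np.zeros(delay), audio])[:n]

# Same "MIDI" input (A4, half a second), fresh randomness each time:
take1 = render_note(69, 0.5, np.random.default_rng())
take2 = render_note(69, 0.5, np.random.default_rng())
# Same input and same seed: identical output.
fixed_a = render_note(69, 0.5, np.random.default_rng(seed=1))
fixed_b = render_note(69, 0.5, np.random.default_rng(seed=1))
```

Fixing the seed makes a render repeatable; whether any given AI instrument exposes such a setting is not stated here.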

Post by **bermudagold** » Wed Mar 04, 2026 7:11 am

pdxindy wrote: Wed Mar 04, 2026 6:38 am
bermudagold wrote: Wed Mar 04, 2026 5:26 am but generates it from what?...
When an AI model generates a video, what is it generating from? Same principle. It is directly generating audio, not controlling a synthesis engine (sample playback, additive, subtractive, modal).

bermudagold wrote: Wed Mar 04, 2026 5:26 am why would they be recording an orchestra in Budapest for their new string instrument if it doesn't use samples?
It's training data for the AI.

The Ace Studio violin sounds more dynamic than sample libraries. Even compared with a big sample library and time-consuming articulation work, the Ace Studio violin sounds more authentic. Also, one reviewer stated that every time audio is generated from identical MIDI, the result is a bit different. If so, then every performance is unique, more like an actual musician.

sounds like marketing speak...if you are correct, it most definitely is controlling a synthesis engine...what do you think "directly generating audio" synthetically is?...take additive...additive reconstructs a sound synthetically by summing many individual sine waves (partials), each with its own frequency, amplitude, and phase envelope...the result of this summation is a single composite waveform, the final synthetic sound wave that is sent to the speaker or DAC...Tiles above just posited that a neural vocoder generates a brand-new waveform from statistical inference...both involve analysis followed by generation of a new synthetic composite waveform...how else is that new synthetic waveform instantiated into sound without a synthesis engine?
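The additive process described above fits in a few lines. This is a minimal sketch, assuming NumPy and an illustrative sawtooth-like spectrum (partial k at k times the fundamental, amplitude 1/k) with per-partial exponential decay envelopes; the partial count and decay rates are arbitrary choices, not taken from any product.

```python
import numpy as np

def additive(f0, amps, decays, duration_s=1.0, sr=44100):
    """Sum harmonic sine partials, each with its own amplitude and
    exponential decay envelope, into one composite waveform."""
    t = np.arange(int(sr * duration_s)) / sr
    out = np.zeros_like(t)
    for k, (a, d) in enumerate(zip(amps, decays), start=1):
        envelope = a * np.exp(-d * t)                # per-partial amplitude envelope
        out += envelope * np.sin(2 * np.pi * k * f0 * t)
    return out / np.max(np.abs(out))                 # normalize to full scale for the DAC

n_partials = 20
amps = [1.0 / k for k in range(1, n_partials + 1)]            # sawtooth-like spectrum
decays = [1.0 + 0.5 * k for k in range(1, n_partials + 1)]    # higher partials fade faster
tone = additive(220.0, amps, decays)
```

Every sample of `tone` comes from explicit oscillators whose parameters someone chose; that is the contrast with a learned model drawn in the replies that follow.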

of course synthesis will be more dynamic than PCM...because you aren't limited to as many velocity steps as there are recorded samples...but PCM has always been more faithful to timbre...that has always been the tradeoff between sampling and synthesis...expressivity vs realism...but synthesis more "authentic"?...most don't agree with you...read the comments below and the reviews...I trust my ears, and none of the instrument models they currently have come close to the timbre realism of mid-tier sample libraries...let alone high-end ones...maybe soon?...maybe never?
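The velocity point can be illustrated: a library with a few recorded dynamic layers has to map every MIDI velocity onto one of those layers, while a synthesis model can map velocity to a continuous control. A minimal sketch, assuming four equally spaced layers and a hypothetical 0-1 "brightness" parameter; real libraries often crossfade between layers, which this ignores.

```python
import bisect

# Upper velocity bound of each recorded dynamic layer (assumed: 4 layers).
LAYER_TOPS = (32, 64, 96, 127)

def sampled_layer(velocity):
    """Index of the recorded layer a sample library would play for this
    MIDI velocity (1-127). Every velocity in a band gets the same recording."""
    return bisect.bisect_left(LAYER_TOPS, velocity)

def synth_brightness(velocity):
    """A synthesis model can map velocity to a continuous control instead
    (here a hypothetical 0-1 'brightness' parameter)."""
    return velocity / 127.0

for v in (40, 50, 60):
    print(v, sampled_layer(v), round(synth_brightness(v), 3))
```

Velocities 40, 50 and 60 all land on layer 1 (the same recording), while the continuous control takes three different values.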

Post by **Tiles** » Wed Mar 04, 2026 7:31 am

No, a neural vocoder is not a synth engine. There are no oscillators, partials, or hand-designed signal chains. The AI doesn’t “control” anything like that. It predicts waveform samples directly from learned statistical patterns. Yes, it produces a waveform, but it’s generated mathematically, not assembled from components like a traditional synth. Timbre realism and expressivity are separate from the generation mechanism.
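The idea of predicting waveform samples from learned patterns can be shown with a deliberately tiny stand-in: a linear predictor (the idea behind classic linear predictive coding) fitted by least squares to a "training" waveform, then run autoregressively so each new sample is predicted from the previous ones. A real neural vocoder replaces the linear map with a large network and also conditions on a spectrogram; this sketch only shows the "predict the next sample" loop and is not how any particular product works. Order 4 suffices here because a sum of two sinusoids obeys an exact 4th-order linear recurrence.

```python
import numpy as np

def fit_predictor(signal, order):
    """Least-squares weights w such that signal[n] ~= w @ signal[n-order:n]."""
    X = np.array([signal[i:i + order] for i in range(len(signal) - order)])
    y = signal[order:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def generate(w, history, n_samples):
    """Autoregressive loop: predict each new sample from the previous len(w)
    samples. There is no oscillator here, only the learned weights."""
    order = len(w)
    out = list(history[-order:])
    for _ in range(n_samples):
        out.append(float(np.dot(w, out[-order:])))
    return np.array(out[order:])

sr = 8000
t = np.arange(sr) / sr
# Toy "training recording": two harmonics of 440 Hz.
training = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)
w = fit_predictor(training, order=4)
continuation = generate(w, training, 200)   # 200 new samples past the end
```

On this toy signal the generated continuation matches the true waveform; on real instrument recordings a linear predictor is far too weak, which is part of why neural vocoders use large networks.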

Post by **bermudagold** » Wed Mar 04, 2026 8:43 am

I got an answer...it is not using the PCM samples...it is generating the sound from synthesis

The system uses a two-stage process to "instantiate" the sound that separates the musical logic from the acoustic physics.

1. The Acoustic Model (The "Conductor" / The "Hallucination")
This model outputs a feature representation, most commonly a mel-spectrogram: a compact map of how much energy there is in each frequency band over time (often displayed as an image). It is not yet audible sound. At this stage, you have the "idea" of the sound (the pitch, the vowels, the vibrato), but not an actual audio waveform.
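A mel-spectrogram like the one described can be computed from a waveform in a few steps: short-time Fourier transform, magnitude squared, then a bank of triangular filters spaced on the mel scale. A minimal NumPy sketch, assuming the common HTK-style mel formula (2595 · log10(1 + f/700)) and arbitrary frame and filter sizes; libraries such as librosa or torchaudio provide tuned versions. The phase is discarded in the magnitude step, which is why stage 2 has to reconstruct it.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose centers are equally spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):            # rising edge
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):           # falling edge
            fb[m - 1, k] = (right - k) / (right - center)
    return fb

def mel_spectrogram(x, sr, n_fft=1024, hop=256, n_mels=40):
    window = np.hanning(n_fft)
    frames = np.array([x[i:i + n_fft] * window
                       for i in range(0, len(x) - n_fft + 1, hop)])
    spectrum = np.fft.rfft(frames, axis=1)       # complex STFT: magnitude AND phase
    power = np.abs(spectrum) ** 2                # keep magnitude only; phase is discarded
    return power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)

sr = 16000
t = np.arange(sr) / sr
mel = mel_spectrogram(np.sin(2 * np.pi * 440 * t), sr)
```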

2. The Neural Vocoder Synthesizer (The "Instrument" / The "Renderer")
This is where the actual audio generation happens. A neural vocoder (for example WaveGlow or a diffusion-based generator) acts as the bridge between that frequency map and the final time-domain waveform. It has been trained on thousands of hours of real recordings to learn the phase and the fine-grained acoustic textures that are missing from a spectrogram, and it uses this learned knowledge to "guess" a waveform that would produce that specific spectrogram. In other words, the neural vocoder synthesizer takes the spectrogram and "fills in" the actual samples of a waveform. Autoregressive vocoders do this by predicting each audio sample from the ones before it; flow-based models such as WaveGlow and diffusion models instead generate many samples in parallel.

a. Waveform Reconstruction: It "hallucinates" the specific oscillations of the audio signal that would result in the frequencies described by the spectrogram, and generates ("draws") the waveform sample by sample based on a learned neural mapping.

b. Phase Estimation: It accurately predicts the "phase" of the sound, which is why AI instruments sound so much more "solid" and "present" than older synthetic methods like additive that often sounded thin or "phasery."
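The claim that a spectrogram lacks phase is easy to check: two waveforms built from the same partials with the same amplitudes but different starting phases have identical magnitude spectra, yet they are different signals. A minimal NumPy sketch; a single FFT over the whole signal stands in for one spectrogram frame, and the partial frequencies are chosen to land exactly on FFT bins so the comparison is exact. Any vocoder, neural or not, has to pick one of the many waveforms consistent with a given magnitude spectrogram.

```python
import numpy as np

sr, n = 8000, 8000                 # one second, so FFT bin k is exactly k Hz
t = np.arange(n) / sr
partials = [(200, 1.0), (400, 0.5), (600, 0.33), (800, 0.25)]  # (Hz, amplitude)

def build(phases):
    """Sum the same partials, each offset by its own starting phase."""
    return sum(a * np.cos(2 * np.pi * f * t + p)
               for (f, a), p in zip(partials, phases))

rng = np.random.default_rng(0)
x_zero = build([0.0] * len(partials))                      # all partials in phase
x_rand = build(rng.uniform(0, 2 * np.pi, len(partials)))   # scrambled phases

mag_zero = np.abs(np.fft.rfft(x_zero))
mag_rand = np.abs(np.fft.rfft(x_rand))
print(np.allclose(mag_zero, mag_rand, atol=1e-6))   # True: same magnitudes
print(np.allclose(x_zero, x_rand))                  # False: different waveforms
```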

Additive is rigid, predictable, and (unless you have thousands of partials) often sounds "synthetic." Additive struggles to separate and reproduce the overlapping, intertwined deterministic (sinusoidal) and stochastic components of a sound. Because the models have "seen" the complexity of real-world air, breath, and resonance in their training data, the neural vocoder synthesizer can synthesize non-linear, stochastic (randomized) textures, like the chaotic scraping of a bow on a string, that additive synthesis struggles to replicate without immense manual effort. An additive snapshot is therefore perceived as more "precise" and "sterile." Because the neural vocoder synthesizer has learned what a sound "looks" like at the microscopic level and generates that complexity sample by sample in the time domain, it is considered more "perceptually natural."
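The deterministic/stochastic split mentioned here is the basis of "sines plus noise" modeling (as in spectral modeling synthesis): a bank of harmonic partials for the pitched part, plus shaped noise for breath or bow scrape. A minimal sketch, assuming NumPy, a crude moving-average filter to color the noise, and arbitrary levels; it only shows that a pure partial sum leaves the spectrum empty between harmonics, which is where a noise component fills in.

```python
import numpy as np

sr = 16000
t = np.arange(sr) / sr
f0 = 220.0

# Deterministic part: 10 harmonic partials with 1/k amplitudes.
harmonic = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 11))

# Stochastic part: white noise smoothed by a short moving average (a crude
# low-pass), standing in for bow or breath noise.
rng = np.random.default_rng(0)
noise = np.convolve(rng.standard_normal(len(t)), np.ones(8) / 8, mode="same")
bowed = harmonic + 0.2 * noise

def between_harmonics_energy(x):
    """Share of spectral energy more than 20 Hz away from any harmonic of f0."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1 / sr)
    dist = np.abs(freqs - f0 * np.round(freqs / f0))
    return power[dist > 20].sum() / power.sum()

print(between_harmonics_energy(harmonic), between_harmonics_energy(bowed))
```

The pure partial sum has essentially no energy between harmonics; adding the noise layer puts some there, which is the kind of texture a partial-only model has to approximate with many extra partials.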

Post by **pdxindy** » Wed Mar 04, 2026 8:47 am

bermudagold wrote: Wed Mar 04, 2026 7:11 am
of course synthesis will be more dynamic than PCM...because you aren't limited to as many velocity steps as there are recorded samples...but PCM has always been more faithful to timbre...that has always been the tradeoff between sampling and synthesis...expressivity vs realism...but synthesis more "authentic"?...most don't agree with you...read the comments below and the reviews.

The initial comments below were comparing it to a real violin, not to other VSTs (sampled or otherwise). Of course a real violin is better.

Seems like we have different priorities. I don't particularly care if the timbre is exact. What matters to me is the varied and dynamic character... the performance as it were.

I think violin sample libraries all sound boring. And in order to mitigate that boringness (somewhat), you have to spend tedious time working with articulations. Even then, I like the various Ace Studio examples I've listened to better than any sample library examples.

The Ace Studio violin has the character of a violin (not talking about perfect timbre) better than straight synthesis emulations. It's quite impressive.

Post by **pdxindy** » Wed Mar 04, 2026 8:55 am

This example first plays a piano part, which is used as the basis for the generation that follows.

Post by **wonshu** » Wed Mar 04, 2026 9:06 am

Yes, these machines do this type of stuff pretty well, and the work of producing those recordings will be lost to the orchestras that offered production services.

I haven't heard interesting fresh material yet, only this regurgitation of the "historic mainstream," be it classical or song-based. Admittedly, the sonic quality has increased substantially!!

Also yes: that will be enough for many people, but those people were never my (and most likely not your) audience in the first place, so we haven't lost much (yet...)

Post by **Tiles** » Wed Mar 04, 2026 11:28 am

wonshu wrote: Wed Mar 04, 2026 9:06 am Yes, these machines do this type of stuff pretty well, and the work of producing those recordings will be lost to the orchestras that offered production services.

I haven't heard interesting fresh material yet, only this regurgitation of the "historic mainstream," be it classical or song-based. Admittedly, the sonic quality has increased substantially!!

Also yes: that will be enough for many people, but those people were never my (and most likely not your) audience in the first place, so we haven't lost much (yet...)

That’s not fundamentally different from what happened with sample libraries, or even MIDI. Many predicted orchestras would disappear from production work, but the opposite happened. Orchestral recordings are still being made all the time, because even sample libraries have to be recorded by real orchestras in the first place. New tools shift parts of the market, but they don’t automatically replace the source.

Post by **wonshu** » Wed Mar 04, 2026 12:09 pm

Tiles wrote: Wed Mar 04, 2026 11:28 am That’s not fundamentally different from what happened with sample libraries, or even MIDI. Many predicted orchestras would disappear from production work, but the opposite happened. Orchestral recordings are still being made all the time, because even sample libraries have to be recorded by real orchestras in the first place. New tools shift parts of the market, but they don’t automatically replace the source.

I hope, wish and pray that you are right!

Post by **pdxindy** » Wed Mar 04, 2026 3:13 pm

wonshu wrote: Wed Mar 04, 2026 9:06 am I haven't heard interesting fresh material yet, only this regurgitation of the "historic mainstream," be it classical or song-based. Admittedly, the sonic quality has increased substantially!!

Yup... AI can only produce what it's been trained on.

Post by **zerocrossing** » Wed Mar 04, 2026 4:58 pm

pdxindy wrote: Wed Mar 04, 2026 3:13 pm
wonshu wrote: Wed Mar 04, 2026 9:06 am I haven't heard interesting fresh material yet, only this regurgitation of the "historic mainstream," be it classical or song-based. Admittedly, the sonic quality has increased substantially!!
Yup... AI can only produce what it's been trained on.

I’m not sure that’s as important as it’s thought to be. I was challenged earlier to show a piece of music that was similar to an abstract bit of ambient music, and it was very easy. There’s just nothing new that can’t be traced back to roots in earlier works. Even if you point out that your influences weren’t the closest source to your work, it’s very likely that they shared influences similar to yours and came to similar conclusions.

So what are we hearing from our AI colleagues? I definitely agree that there’s something: a certain ham-handedness in how they produce “new” music. My best guess is that they lack intent. When I pull from my influences, it’s a sort of love letter to previous artists. It’s almost like I’m saying, “thank you for what you’ve done, it helped me make this, but also, go back and find the sources.” AI can’t love, so it’s just throwing together likely elements based on what has worked in the past, constrained by the artists or genre stated in the prompt. I’d say that what’s limiting creativity is still the human element, to some extent. If you ask it to make songs based on the music of The Beatles, you are probably going to get Rutles-type songs, but without the humor and obvious reverence for the Beatles. I guess what I’m saying is that good human music is an homage to its influences, whereas AI music is an imitation of the influences stipulated in the prompt, and of course of the lineage of that music as well.

Post by **pdxindy** » Wed Mar 04, 2026 6:53 pm

zerocrossing wrote: Wed Mar 04, 2026 4:58 pm
pdxindy wrote: Wed Mar 04, 2026 3:13 pm
wonshu wrote: Wed Mar 04, 2026 9:06 am I haven't heard interesting fresh material yet, only this regurgitation of the "historic mainstream," be it classical or song-based. Admittedly, the sonic quality has increased substantially!!
Yup... AI can only produce what it's been trained on.
I’m not sure that’s as important as it’s thought to be. I was challenged earlier to show a piece of music that was similar to an abstract bit of ambient music, and it was very easy. There’s just nothing new that can’t be traced back to roots in earlier works. Even if you point out that your influences weren’t the closest source to your work, it’s very likely that they shared influences similar to yours and came to similar conclusions.

I agree... also, AI can be directed by a human into places the AI might not go otherwise. We're the deciders if we want to be.

Post by **wonshu** » Wed Mar 04, 2026 8:07 pm

I made a little simple experiment and wrote about it... what can I say... judge the results for yourself...

I should have probably just fed it the MIDI notes of what I was thinking... if that's possible with producerAI - I didn't check...

Post by **zerocrossing** » Wed Mar 04, 2026 8:51 pm

wonshu wrote: Wed Mar 04, 2026 8:07 pm I made a little simple experiment and wrote about it... what can I say... judge the results for yourself...

I should have probably just fed it the MIDI notes of what I was thinking... if that's possible with producerAI - I didn't check...

Right, but at that point, why not just use a SWAM cello and have full control of the expressive aspects of the results? This is what I have been complaining about. AI can give you impressive results, in general, but not specifically. Of course it can't, or maybe I should say, it could if you wrote out a prompt description for each note, and at that point... well, there's no point.

But what I am interested in is something that can take an instrument that I can play, a guitar, and then transform it into a cello performance. Now, I'm interested, especially if after it's rendered I could go in and make expression adjustments.

Post by **pdxindy** » Wed Mar 04, 2026 9:49 pm

zerocrossing wrote: Wed Mar 04, 2026 8:51 pm
wonshu wrote: Wed Mar 04, 2026 8:07 pm I made a little simple experiment and wrote about it... what can I say... judge the results for yourself...

I should have probably just fed it the MIDI notes of what I was thinking... if that's possible with producerAI - I didn't check...
Right, but at that point, why not just use a SWAM cello and have full control of the expressive aspects of the results? This is what I have been complaining about. AI can give you impressive results, in general, but not specifically. Of course it can't, or maybe I should say, it could if you wrote out a prompt description for each note, and at that point... well, there's no point.

One reason not to use a SWAM cello is that it takes some skill and knowledge of how a cello is supposed to sound.

In Ace Studio, you can feed MIDI to the AI instrument. I know the Violin has articulations and I think their Cello does too. From what I've heard, the automatic articulations done by the AI are better than what some, or even many of us can do manually.

One of the reasons I don't have the SWAM instruments is because I don't use acoustic instrument emulations that much and there is a learning curve to using them. I don't use AI instruments either, but they are improving fast enough that I am starting to consider it.

SUNO is killer!