Gamma-UT wrote: Fri Jan 23, 2026 2:10 pm
On the “is it AI?” I can see how someone might think it was hand-done with samples but the zippery/phase vocodey artifacts on the trills stand out like a sore thumb.
Yes, and that shows up more, or, more correctly, is more obvious, in specific contexts. For the uninitiated, you can run these experiments yourself at no cost. The site will give you 1 min of output from the current best model.
Anyway, does it always matter? Well, I recall the same kind of discussion about MP3 artifacts. I have also seen this forum completely take the other side of this argument with tools that they want to use, e.g., older synths with very weak filters. I recall criticizing the artifacts in GRM tools and, oh my, you would have thought that I had bad-mouthed the pope.
I can still recall stopping dancing mid-track because of bad filter artifacts/distortion in the late 90s and watching while everyone kept dancing because they didn't notice. In this story, this forum is me; everyone else is, well, everyone else. It's the same reason that the general public doesn't care much about auto-tune. It's the modern equivalent of "It has a beat, and I can dance to it."
The artifacts are less obvious in more restrained vocal styles, provided you can also keep the other elements out of the problem areas. Of course, none of this matters much for the vast majority of people using the tools to just have fun.
For those who don't know: if you have a decent video card with at least 16GB of VRAM, Heartmula is a (free to you) project that lets you generate music on your own machine with reasonable results. It is not up to the same level as the commercial offerings.
https://heartmula.github.io/
To my ears, this sounds like some of the older Suno models. Some people think that they are gaming their outputs for the benchmarks; I frankly don't care about benchmarks all that much.
What is interesting to me about this is how good the results are relative to how large (really, how small) the model is. Compare this to LLMs and it's rather dramatic. Fine-tuning this model, on any source, can be done privately with reasonable (but not typically consumer-level) hardware. For more specific workflows, e.g., very narrow generative tasks, this is at the level of a completely personal project in 2026.
In about five years' time, you will all be arguing over which AI plugin is best at the one narrow, specific thing that matters to you. Vocal trills may still have artifacts in commercial models.
