Dataset sources of musical generative AI


Post

So ... I know there is an Udio/Suno topic already, but please allow me to create another one. This time it's focused on one particular aspect. It kinda keeps me awake at night. Let me explain:

Who the h*ll provided the data to train this? I mean ... if the results from Udio and Suno were "freesound/soundcloud"-level quality music, I'd be willing to accept that they were trained on CC-0 data. But it sounds good. Amazing at times. Nailing musical styles, vocal styles. I'm unable to believe that CC-0 music can lead to such precise and "cool sounding" generators. It feels to me that the chance these have been trained on copyrighted music is quite significant. If (and I'm very well aware this is pure speculation, so I emphasize the "if" part) this is trained on actual non-CC music, there are pretty much three scenarios for how this dataset gathering might have gone:

1) It's illegally scraped and both companies are trying to hush it up. In that case I'm baffled by the lack of buzz around it. Big labels are forcing Spotify to pay them huge chunks of money, they force YouTube to scan every nephew's online gaming fan video for copyright infringement, but somebody scrapes what is probably their complete catalog and they're like: "Whatever..." I don't know, that does not add up.

2) Some huge sync library like Audiojungle, Epidemic Sound or Artlist provided the data. They do hold the rights to the music, as artists are (with some exceptions) usually forced to sign the rights away. In that case I'd be very cautious about counting on any earnings from sync music. That might end any minute now.

3) The one I fear the most: One of the actual major labels silently provided its catalog so they can test the capabilities and public adoption of this tech. If that is true, hyper-personalized music generation is around the corner. It's essentially a perfect upsell of a label's old catalogue, isn't it? "Wanna custom Jay-Z song? No problem, pay for our generator. Yours, Universal." It would also explain why labels seem to be ice cold about this topic. In that case I'm baffled by the lack of response from the artists themselves, because that's something their (sometimes very expansive) agreements had no way of accounting for. That would mean labels are using a contractual grey area to set a precedent of completely uncompensated monetization of their catalogs.

I'm kinda angry about this, if you can't tell. I've been dead scared my whole artistic life to remix anything, so that a label can't come and sue my butt off. ...yet somebody quite probably scrapes someone's entire catalogue to create what is essentially a mix'n'match remix machine, and that's not problematic at all? Come on.

Is there any official stance from the Performing Rights Organisations on this? BMI? ASCAP? GEMA? All the national variants across Europe and the world? Also, where are the class-action-sniffing companies when the world actually needs them, lol?

(edits for typos)
Last edited by FarleyCZ on Mon Apr 15, 2024 12:24 pm, edited 12 times in total.
Evovled into noctucat...
http://www.noctucat.com/

Post

Good point. I was thinking the same thing.

Post

Are there already international legal regulations on what is and isn't allowed when training AI? I thought it was still a legal gray area because AI has developed so quickly and the laws have not yet been adapted.
I know that Midjourney has been trained on artists' work without their consent and Midjourney still hasn't been taken down.
It seems to be a difficult subject.

Berkeley Technology Law Journal had an article about the subject in February 2023. Their conclusion is that it's not a copyright infringement.

Post

igorius wrote: Mon Apr 15, 2024 12:35 pm Are there already international legal regulations on what is and isn't allowed when training AI? I thought it was still a legal gray area because AI has developed so quickly and the laws have not yet been adapted.
I know that Midjourney has been trained on artists' work without their consent and Midjourney still hasn't been taken down.
Whelp ... that is the biggest bull**ap in the whole thing. I mean ... if I use some material to build a for-profit service, I'm pretty sure that should be considered commercial use of said material. And if we can agree at least on that, OpenAI, Midjourney, Udio, Suno, they all owe us a huuuuge pile of receipts at least.
Evovled into noctucat...
http://www.noctucat.com/

Post

This applies to all current publicly available AI generators, not just those producing images.
Besides, the carbon footprint is gigantic.
We are the KVR collective. Resistance is futile. You will be assimilated.
My MusicCalc is served over https!!

Post

I actually know some people in AI chip research, and I dare say the carbon footprint is going to be less and less of a problem within the next few years. There's some cool stuff cooking in the labs. ...but that ethical thing just irritates me. I really think companies should be obliged to publicly disclose their datasets. An opt-out option should be mandatory. And if everyone opts out, then too bad, your business model was crap from the beginning, wasn't it... But without it, it's essentially massive artistic identity theft.
Evovled into noctucat...
http://www.noctucat.com/

Post

BertKoor wrote: Mon Apr 15, 2024 3:38 pm This applies to all current publicly available AI generators, not just those producing images.
Besides, the carbon footprint is gigantic.
What's the carbon footprint of art using charcoal on paper? :o

Post

Good question...
We are the KVR collective. Resistance is futile. You will be assimilated.
My MusicCalc is served over https!!

Post

FarleyCZ wrote: Mon Apr 15, 2024 12:01 pm It feels to me that the chance these have been trained on copyrighted music is quite significant.
At least Udio was trained on copyrighted music. https://www.udio.com/songs/8XuumNGLGwGDoF18WJRBmh
That's Steven Wilson's voice.
Music, just like tortilla, is no fun without a bit of "cheese". :clown:
soundcloud.com/vertlain

Post

See, and that's what pi**es me off. If I take Steven Wilson's track and sample it into my own derivative kind of work, the label has every right to file a lawsuit against me. Even if I don't make a dime from that track. But when somebody trains a machine on the same material, that machine spits out derivative works automatically, and they charge a monthly fee for that machine's service... all is perfectly legal.
Last edited by FarleyCZ on Tue Apr 16, 2024 8:04 am, edited 1 time in total.
Evovled into noctucat...
http://www.noctucat.com/

Post

This is not sampling in the traditional way.

If I mimic the voice of Steven Wilson (or whoever) myself, then that's no breach of any copyright. I do that by listening to him (what law is against that?) and comparing with what I can produce, which is training. The neural network (it's not AI!!) does it essentially the same way.

It only becomes illegal if you state that the original voice is reproduced.

Well, IANAL, but this is uncharted territory full of unexpected pitfalls.
We are the KVR collective. Resistance is futile. You will be assimilated.
My MusicCalc is served over https!!

Post

I know there is a difference. But on the other hand, if you mimic their voice, you learned that through the intended (and licensed) use of that material. Listening. Machine learning isn't listening per se. They take the file and they encode it (amongst millions of other files) into a set of neural network weights. I would argue that is actually closer to processing that material than to listening to it.

Why do we have a whole rulebook on how many notes you can mimic from other melodies until it's called copying, but we have nothing in that vein for machine learning?
Evovled into noctucat...
http://www.noctucat.com/

Post

FarleyCZ wrote: Tue Apr 16, 2024 8:10 am Why do we have a whole rulebook on how many notes you can mimic from other melodies until it's called copying [...]
Really... do we? Then show it to me. Because I have never heard of that being set in stone.

Afaik it's totally up to a judge to decide whether it's a copy, and only when it comes to court.
We are the KVR collective. Resistance is futile. You will be assimilated.
My MusicCalc is served over https!!

Post

I keep hearing something about an "8 consecutive notes" rule, but it's true that I fail to find it on Google. So my information is probably incorrect. Anyway, the fact that there are reasons to take a song release to court, yet the same reasons don't apply when it comes to a monthly billed mix'n'match remix machine, baffles me.

From a technical standpoint, I fail to see the resulting set of neural network weights as anything other than a huge, aggregated, highly compressed copy of all the songs in the dataset.

There is actually a term called "overfitting". Theoretically, if you let a model train long enough (and with a large enough number of parameters), it memorizes its training data so well that it starts to spit out the original material 1:1. I'd say the only reason Suno and Udio don't do that is because it would serve them no purpose. As I see it, these models are intentionally slightly underfitted, so they spit out "something resembling, but not quite the original". Just barely enough to go unrecognized by copyright law. In a way, that model can be considered a highly compressed copy with an intentionally crippled output method.
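To show what I mean by overfitting, here's a toy sketch in Python. Big caveat: this is not how Suno or Udio work (their internals aren't public), and it's not even a neural network. It's a made-up character-level n-gram model, and `train`, `generate` and the `order` parameter are names I invented for illustration. With a long context (lots of capacity relative to a tiny "dataset") it regurgitates the training text verbatim; with a short context it only produces something that starts out similar.

```python
from collections import Counter, defaultdict


def train(text, order):
    """Count which character follows each `order`-character context."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model


def generate(model, seed, length):
    """Greedy generation: always append the most frequent next character."""
    order = len(seed)  # the seed length must match the order used in training
    out = seed
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:  # context never seen in training: stop
            break
        out += followers.most_common(1)[0][0]
    return out


# Hypothetical one-sentence "dataset" standing in for a music catalogue.
corpus = "the quick brown fox jumps over the lazy dog"

# High capacity: every 6-character context in the corpus is unique,
# so the model has effectively memorised the whole text.
big = train(corpus, order=6)
memorised = generate(big, corpus[:6], len(corpus))

# Low capacity: a 1-character context cannot pin down the original.
small = train(corpus, order=1)
loose = generate(small, corpus[:1], len(corpus))

print(memorised == corpus)  # the high-capacity model reproduces the source
print(loose == corpus)      # the low-capacity model does not
```

The only thing separating "copy" from "something resembling" here is how much capacity the model has relative to the data. Whether that carries over to the real music generators is exactly the open question.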

Also, we can take it way deeper philosophically. What is the purpose of copyright? It's there to ensure your product is consumed by the means you intend it to be consumed and for the fee you'd like to receive for such consumption. When somebody gains a means to consume it outside of that, copyright is broken. I would argue that generating a lot of Steven Wilson-esque songs can be seen as a way to consume Steven Wilson's art outside of his own terms.

It's like selling counterfeit Adidas sneakers. Suno and Udio made sure the logo isn't sewn on those sneakers, but they still have all the parts in all the places you associate with an Adidas sneaker. Intentionally.
Evovled into noctucat...
http://www.noctucat.com/

Post

OK, maybe it isn't complete silence:

https://en.wikipedia.org/wiki/Anthropic
On October 18, 2023, Anthropic was sued by Concord, Universal, ABKCO, and other music publishers for, per the complaint, "systematic and widespread infringement of their copyrighted song lyrics." They alleged that the company used copyrighted material without permission in the form of song lyrics. The plaintiffs asked for up to $150,000 for each work infringed upon by Anthropic, citing infringement of copyright laws. In the lawsuit, the plaintiffs support their allegations of copyright violations by citing several examples of Anthropic’s Claude model outputting copied lyrics from songs such as Katy Perry’s “Roar” and Gloria Gaynor’s “I Will Survive.” Additionally, the plaintiffs alleged that even given some prompts that did not directly state a song name, the model responded with modified lyrics based on original work.

On January 16, 2024, Anthropic claimed that the music publishers were not unreasonably harmed and that the examples noted by plaintiffs were merely bugs.
Edit:
...also, that argument about the human brain working in a similar manner. Yes. Agreed. It does. But ideally that human brain heard the song after paying royalties for that consumption. Have any of you received a receipt from Udio or Suno? Because if your music is in the dataset (and the chances are not small), your music has been consumed. Yes, by an artificial brain of sorts, but it has been consumed.
Evovled into noctucat...
http://www.noctucat.com/
