Unbelievable Scale of AI’s Pirated-Books Problem




An article from The Atlantic today details just how extensively Meta pirated copyrighted works to train its “AI” LLM. The article also points to similar court filings against OpenAI. Literally millions of books, including RPGs, were illegally used to train these programs.
I think to call it AI's pirated book problem is to make it something it isn't. It's certain companies (even if well known) and not an 'AI' problem, IMO.
 


This is going to be an unpopular opinion:

Training AI on existing works isn't piracy in any reasonable definition of the term. It isn't necessarily ethical, but it isn't piracy. By intentionally using an incongruous term and trying to shoehorn it into your argument, you actually weaken your argument.

More simply: if I can't ask ChatGPT to replicate the PHB, it isn't piracy.
 

Sounds like Library Genesis, or LibGen, needs to be shut down since it's hosting the pirated books, just as other websites have been shut down or threatened with legal action.
 

They’re stealing from our people. From Ari Marmell @mouseferatu.bsky.social:

“There's a tool available now that lets you search the written material that Meta/Zuck stole to train their AI.

Fifteen of my novels. FIFTEEN of them. Plus countless short stories and gaming books.

I won't see a single cent from this multi-billion-dollar corporation's theft of my work. Not one.”

 


Sounds like Library Genesis, or LibGen, needs to be shut down since it's hosting the pirated books, just as other websites have been shut down or threatened with legal action.

So it's interesting. I've been reading a LOT of medical articles of late for reasons.

For many of them, you don't get the full text. Providing that seems to be what LibGen was intended for (or at least what it wants to claim): offering up information that could advance things that are quite critical to people's lives.

Novels, game books, and the like are things people need to get paid for so they can make a living.

I'm pretty sure these things are not equal, but I'm also sure that it doesn't matter to anyone but the people doing the ripping off or getting ripped off.
 

This is going to be an unpopular opinion:

Training AI on existing works isn't piracy in any reasonable definition of the term. It isn't necessarily ethical, but it isn't piracy. By intentionally using an incongruous term and trying to shoehorn it into your argument, you actually weaken your argument.

More simply: if I can't ask ChatGPT to replicate the PHB, it isn't piracy.
I'd argue that piracy is simply the first step in the process. They download the books without paying for them. How is that not piracy?
If you can't ask ChatGPT to replicate the PHB, you can't use it to pirate it yourself.
 
