• Tue. Oct 21st, 2025

    IEAGreen.co.uk

    Helping You Living Greener by Informing You

    Words As Fuel: Authors Take on Salesforce Over Alleged Use of Thousands of Books to Train Its AI

    By Edna Martin

    Oct 21, 2025

    Cloud-software giant Salesforce is suddenly in the eye of a creative storm. Two novelists, Molly Tanzer and Jennifer Gilmore, have filed a sweeping class-action complaint claiming the company used thousands of copyrighted books without consent to teach its in-house language models known as xGen.

    They say their work was pulled wholesale into AI datasets that the firm later used to build and market new products — a move that has sparked both outrage and unease across the literary world.

    The details surfaced through a recent report on Salesforce’s growing legal troubles, which hints that this might be only the beginning.

    The writers argue that Salesforce drew from a massive archive of digitized books — including collections like “Books3” and “The Pile” — to feed its generative models without seeking permission or offering compensation.

    Some tech outlets have pointed out that this isn’t the first time major companies have faced such claims, citing earlier reports of similar lawsuits hitting other AI developers.

    And here’s the kicker: Salesforce’s own CEO, Marc Benioff, once publicly criticised AI companies for “stealing” creative data to train their systems, calling it “unethical and unnecessary.”

    That quote has aged about as well as milk in the sun, given this week’s headlines.

    The case, filed in a San Francisco federal court, could balloon fast if it gains class-action status.

    The authors want the company to disclose every dataset used since 2022 and to compensate any creators whose work was ingested.

    As one legal analyst told me over coffee: “If this goes forward, we might finally see AI firms held accountable for what’s under the hood.”

    That accountability might stretch beyond Salesforce: just months ago, a similar complaint against Anthropic ended in a $1.5 billion settlement, as described in a recent industry briefing about the AI copyright fallout.

    There’s a broader story behind the paperwork. Over the past two years, the industry has leaned heavily on open-source datasets scraped from the web — novels, blog posts, Wikipedia entries, the works.

    The assumption was that it all fell under “fair use.” But that idea is wobbling. When an AI learns from your novel, it’s not quoting you — it’s digesting your creative DNA.

    Legal scholars have begun debating whether that counts as transformation or replication, especially after court rulings that sided with certain AI firms on fair-use grounds. The line is blurring faster than we can redraw it.

    Honestly, it’s a strange time to be a writer. I talk to authors who shrug — “Maybe my book’s been in a dataset for years; what can I do?” Others are furious. I get it.

    Your prose shouldn’t become someone else’s profit stream just because it’s digitized.

    Yet, part of me wonders if we’re watching the painful birth of a new kind of licensing economy — one where stories, style and syntax are the oil fields of AI.

    It’s messy, sure, but so was the music industry before royalties found their rhythm.

    If Salesforce loses, this could reshape how AI companies gather and pay for data. If it wins, we might be told again that scraping culture is innovation.

    Either way, the verdict will echo through every marketing team and copywriting department that relies on automated text.

    Because behind every “smart” machine that writes like a human, there’s a human who wrote first — and maybe it’s time they got their due.
