AI and IP Rights: The Copyright Conundrum (Part 1)
By: Sorush Ghodsi (Head of IP Protection)
The rapid rise of new technologies, particularly Artificial Intelligence (AI), has fundamentally altered the digital landscape, challenging many of the traditional rights held by individuals and corporations alike. AI has revolutionized industries, streamlined complex workflows, and sparked fierce debates in courtrooms around the globe. Among the most hotly contested areas is the realm of Intellectual Property (IP) rights. The delicate balance that IP law has maintained for centuries, designed to reward human innovation and creativity, is now facing unprecedented stress tests.
In this ongoing series of blog posts, we will attempt to briefly review the complex intersection of IP rights and AI systems, unpacking what this rapidly evolving technology means for creators, developers, and the general public. We begin our exploration with arguably the most disrupted domain of them all: Copyright.
The Generative AI Revolution
Probably the most challenged right, and the one that has resulted in the most serious concerns across creative industries, is copyright. Day after day, powerful new tools are deployed that can generate full-length books, sophisticated software code, hyper-realistic photographs, and complex musical compositions, feats that previously would have taken teams of artists and authors years of rigorous labor to produce.
Today, in less than a minute, a user can prompt a Generative AI to produce a satirical comedy script featuring historical politicians, or seamlessly blend traditional classical Indian ragas with the gritty style of 1980s American classic rock. Because these Generative AI systems are trained on massive datasets, they can ingest, replicate, and synthesize existing material and ultimately generate entirely new content on demand. However, this technological magic trick comes with severe legal friction.
The Core of Copyright Law: The Idea-Expression Dichotomy
To understand the conflict, we must first look at the foundational rules. Under the European Union legal framework, specifically the Information Society (InfoSoc) Directive as interpreted by the Court of Justice of the European Union, copyright protection is granted to a work if it constitutes an “author’s own intellectual creation.”
The most important element to notice here is the legal principle known as the Idea-Expression Dichotomy. Copyright law dictates that an idea itself cannot be protected; it is only the author’s unique, tangible expression of that idea that receives legal shielding. For example, the underlying idea of a brilliant detective solving crimes in Victorian London while smoking a pipe is free for anyone to use. However, Sir Arthur Conan Doyle’s specific literary expression of that idea through the beloved character of Sherlock Holmes is what was historically protected.
This protection grants the human author exclusive economic rights: specifically, the rights of reproduction, distribution, and communication to the public. In practice, nobody can copy, sell, or publicly share the work without permission. Furthermore, European legal tradition places a heavy emphasis on non-economic rights, known as moral rights, which are protected under the national laws of the member states rather than harmonized at EU level. These include the right to be recognized as the creator (paternity) and the right to object to any derogatory action that distorts the work or harms the artist’s reputation (integrity). AI technology threatens to disrupt both the economic and moral facets of this traditional framework.
Phase One: The Training Data Dilemma (Ingestion)
In order to correctly assess the copyright challenges posed by AI systems, we need to divide the technology’s lifecycle into two distinct phases.
The first phase is the training or ingestion process. To function effectively, AI models require immense volumes of data, often billions of text passages, images, and audio files scraped directly from the internet. Accessing this data is a legal minefield. While some content is in the public domain or freely licensed for anyone to use, the vast majority of internet content is protected by copyright and is not freely available for commercial exploitation.
To lawfully carry out acts covered by authors’ exclusive economic rights during this training phase, such as reproducing their works, AI providers must secure legal authorization. In theory, this authorization could come through direct negotiations and licensing agreements with the rightsholders. However, given the massive scale of data required, negotiating individually with every rightsholder is practically impossible.
Therefore, AI companies must rely on statutory authorizations known as copyright exceptions. Article 5 of the InfoSoc Directive introduced a list of exceptions, but, apart from the mandatory exception in Article 5(1), member states were largely free to decide whether and how to implement them, leading to a fragmented landscape.
To modernize this framework, the Digital Single Market (DSM) Directive introduced two highly relevant new exceptions specifically tailored for Text and Data Mining (TDM), in Articles 3 and 4. These TDM exceptions provide a vital legal pathway for AI training. However, they come with strict conditions. Article 3 is limited to research organizations and cultural heritage institutions mining for scientific research, while the general exception in Article 4 lets rightsholders opt out by expressly reserving their rights against such data mining, which for content published online calls for machine-readable means (such as metadata). Both also require lawful access to the works. Therefore, if an AI provider complies with these conditions and respects the legal opt-outs, it has the legal authorization to train its AI systems using protected works. If it fails to do so, it opens itself up to massive infringement liabilities.
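To make the idea of a machine-readable opt-out more concrete, here is a minimal sketch in Python of how a crawler might check one such signal before collecting a page. It assumes, purely for illustration, that the rightsholder expresses the reservation through a robots.txt rule; the bot name ExampleAIBot, the sample file, and the function may_crawl are hypothetical, and whether any particular signal amounts to a valid reservation under the DSM Directive is a legal question this sketch does not settle.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve: it blocks one
# (made-up) AI crawler and allows everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the rules from text so the example needs no network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    print(may_crawl(SAMPLE_ROBOTS_TXT, "ExampleAIBot", page))
    print(may_crawl(SAMPLE_ROBOTS_TXT, "OtherBot", page))
```

In this sketch, a crawler identifying itself as ExampleAIBot would skip the page, while other crawlers would not be blocked by this particular file.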
Phase Two: The Output Enigma (Generation)
The second phase of the copyright cycle in AI systems occurs at the output, the exact moment the AI generates a response for the user.
Because Generative AI systems have ingested so much varied data, they can do more than reproduce an author’s exact expression; they can generate an entirely new form of expression based on the underlying ideas they “learned.” This brings us back to the Idea-Expression Dichotomy. If the AI-generated output borrows only the original author’s underlying idea or mimics their artistic style, it is a new expression rather than a copy. In these cases, there is generally no copyright infringement at the output stage.
However, distinguishing between a “new expression” and an infringement is complicated by strict European case law. In the landmark Infopaq case (C-5/08), the Court of Justice of the European Union (CJEU) established a remarkably low threshold for copyright protection, ruling that even a small extract of 11 consecutive words could be considered an author’s own intellectual creation. Similarly, in the Pelham case (C-476/17), the CJEU ruled that sampling even a 2-second continuous audio loop could constitute an infringement of a phonogram producer’s rights, unless the sample is modified in a form unrecognizable to the ear.
These precedents mean that an AI output that is not sufficiently distinct from the source material is highly likely to be considered an infringement. If the output is an exact or highly similar copy of the author’s original expression, the author has a clear-cut case.
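As a rough illustration of how short an overlap can matter, the following Python sketch flags every run of 11 consecutive words that an output shares with a source text. The function name, the sample sentences, and the choice of simple lowercase word matching are assumptions made for illustration. This is a screening heuristic, not a legal test: whether an extract is protected depends on whether it expresses the author’s own intellectual creation, which a word count alone cannot decide.

```python
def shared_word_runs(source: str, output: str, n: int = 11) -> set[tuple[str, ...]]:
    """Return every run of n consecutive words that appears in both texts."""

    def ngrams(text: str) -> set[tuple[str, ...]]:
        # Lowercase and split on whitespace; punctuation handling is
        # deliberately left out to keep the sketch short.
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    return ngrams(source) & ngrams(output)


if __name__ == "__main__":
    # Made-up sample texts, not taken from any real work.
    source = "the quick brown fox jumps over the lazy dog near the quiet river bank"
    output = "yesterday the quick brown fox jumps over the lazy dog near the quiet lake"
    for run in sorted(shared_word_runs(source, output)):
        print(" ".join(run))
```

A match flagged this way would only be a starting point for a closer human review of the passage.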
But if the output is simply another expression of the same idea, the author cannot legally contest the output itself. Their only legal recourse might be to claim their work was illegally used during the training phase. Unfortunately for creators, proving exactly what data an AI was trained on is notoriously difficult due to the “black box” nature of these complex algorithms.
To address this imbalance, Article 53 of the EU AI Act introduces crucial transparency obligations. It requires providers of general-purpose AI models to implement a policy to respect EU copyright law and to publish a sufficiently detailed summary of the content used for training. This regulatory requirement is designed to make it much easier for authors and creators to verify if their work was used and enforce their rights accordingly.
Beyond the Law: Social and Ethical Implications
Ultimately, this issue raises profound concerns that extend far beyond courtroom debates; its implications are deeply social and ethical. As AI-generated content becomes increasingly indistinguishable from human-made art, it threatens to replace commercial artists, writers, and musicians, raising severe issues regarding job availability and economic survival in the creative sector.
Furthermore, the existential concern regarding the interruption of human creativity cannot be overlooked. If machines can instantly synthesize culture, what happens to the human drive to create? As we navigate this new technological frontier, finding a sustainable balance between fostering rapid innovation and protecting human creators will undoubtedly be the defining legal challenge of our time.
Stay tuned for the next post in our series, where we will dive into the complexities of Patent Law and Artificial Intelligence.
Bibliography
Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society [2001] OJ L167/10
Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market [2019] OJ L130/92
Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) [2024] OJ L2024/1689
Case C-5/08 Infopaq International A/S v Danske Dagblades Forening [2009] ECR I-6569
Case C-476/17 Pelham GmbH v Ralf Hütter [2019] ECLI:EU:C:2019:624
Apoorva Verma, ‘The Copyright Problem with Emerging Generative AI’ (2023) 7(2) Journal of Intellectual Property Studies 69
Frank Pasquale and Haochen Sun, ‘Consent and Compensation: Resolving Generative AI’s Copyright Crisis’ (2024) 110 Virginia Law Review Online 207
Dennis Crouch, ‘Using Intellectual Property to Regulate Artificial Intelligence’ (2024) 89 Mo L Rev 781
Ayelet Gordon-Tapiero and Yotam Kaplan, ‘Generative AI Training as Unjust Enrichment’ (2025) 86 Ohio St LJ 285
