AI Model Copyright Ownership: Who Owns the Omelette?
The frontier AI copyright debate has skipped the most fundamental question. Everyone is arguing about whether training is fair use, whether a licensing layer can be retrofitted, and whether the EU’s text-and-data-mining opt-out is workable. Nobody is asking the question that comes before any of that: who owns the trained model?
I have written a long-form academic piece on this for submission to a legal journal. What follows is a version written for the people who actually have to live with the answer, which is basically everyone.
Everybody’s Omelette
Imagine everybody in the world owns an egg. Every literate person has, over a lifetime, produced one egg’s worth of expressive or inventive work, fixed in some form, and available on the internet. Their writing, their photographs, their drawings, their code, their music, their videos.
A cook arrives. Takes every egg. Doesn’t ask, doesn’t pay. Brings the eggs to an enormous kitchen, adds their own egg (the training code, the fine-tuning datasets, the reinforcement-learning materials, the evaluation suites), and scrambles the lot into the largest omelette ever made. The cook then announces that they are the sole owner of the omelette and charges the egg-owners to eat it. A normal person would be right to ask the cook how they came by the eggs:
“Did you buy the eggs, or did you steal them?”
This is, structurally, how frontier AI models were made, and how they currently sit in the economy. The technical vocabulary (vectors, parameters, weights, corpora, embeddings) holds ordinary property-law intuition at a distance, but the fact pattern is one the law has handled for centuries: confusion of property.
NB: What was ingested into the models via training is not abstract. It is all the content of the internet, accessible by anyone with a browser.
The Two Unstated Premises
Copyright attaches to a literary work at the moment of creation. Whoever creates it owns it automatically. If someone else takes that work and does something with it (copies it, or transforms it, like remixing a song), that use does not extinguish the original creator’s ownership of what they made.
This means AI developers are currently operating models built from everybody’s work, trained on virtually all content available on the internet, in a way the law simply does not support. They are charging people for access to models those people co-own, without acknowledging the joint ownership that arises from mixing together all of the “eggs” their owners still hold. The original content is still available on the internet.
The developer-side sole-ownership position quietly carries two premises. Neither is stated explicitly, because neither is defensible once stated.
- Premise X: Ingestion extinguishes the contributing author’s interest beyond the original fixation (this would also extinguish the developer’s contribution by the same logic, leaving the resulting models orphaned, owned by no one who contributed to them).
- Premise Y: The extinguished interest re-allocates ownership in the resulting work to the developer rather than to the state, the public domain, or no one.
Premise X appears nowhere in the Copyright Act 1968 (Cth), the Berne Convention, or any decided case. Running a literary work through a GPU does not destroy ownership any more than passing it through a scanner.
Premise Y is unsupported on its own terms. A party claiming a proprietary interest must identify the principle on which the interest operates. The developer-side has identified none, because the principle the law actually supplies for inseparable mixtures (the confusion doctrine) operates in the opposite direction.
Why Infringement Cannot Reach the Asset
Copyright was designed for something like a photocopier. An original goes in, duplicates come out, and the remedies (injunctions, damages, accounts of profits, delivery up) operate on discrete copies that can be counted, impounded, and destroyed.
A frontier model is not a reproduction machine. It metabolises its inputs. Works fed in exit as weights and parameters, incorporated into every subsequent output and recognisable in almost none of them. There is no copy. The remedial architecture of the Copyright Act cannot be engaged at all. This is a real gap in the doctrine of copyright law, and it is why the ownership question is the first question, not a fallback when infringement remedies fail. Ownership is upstream of infringement.
We need to remember that copyright as a form of ownership was a development of the property rights that preceded it. The law needed to develop a doctrine to address how ownership of the original extended to ownership of the copies. Now the law needs to extend again to cover a situation where the original works are not copied, but instead mixed with others, transformed into something constituted by, but distinct from, the originals it was made from. If someone infringes your copyright you assert your ownership by:
- Seeking an injunction: This can’t be done with the models because they have already been made. A cease and desist means nothing here. Cease and desist what?
- Demanding delivery up of the copies: This can’t be done because the model is not a copy, nor can it be decomposed back into its constitutive parts.
- Accounting for profits: The AI developers who made the models operate at staggering losses. They are funded by venture capital and, on the fundamentals, do not appear to have a realistic path to profitability anytime soon. Control of the models themselves appears to be the motivation, not profit per se. There is no profit to distribute, nor any discernible timeline for when profits might arise.
In short, copyright does not give owners whose work was used without consent or compensation a path to being made whole again, with regard to what was actually built from their works.
The Australian Government’s Implicit Concession
The Australian Government’s Copyright and Artificial Intelligence Reference Group (CAIRG), established in December 2023, is approaching this as a licensing problem. That posture is itself an admission. A licence presupposes the licensor owns the thing being licensed and the licensee does not. If the developers own the model outright, a licence covers nothing they do not already claim title to, and the fee buys them nothing. If the licence is meaningful, it concedes the developers do not own the constitutive substance, which arrives back at collective ownership. The Government is trying to start a car using the wrong key. It won’t fit and it won’t work.
The Three-Part Settlement
The conservative answer, the one the existing law already produces, is a Berne Article 20 special agreement establishing a jointly-owned institutional form on the NATO-contractor model:
- Ownership vests collectively in the contributing authors and the developers, in proportion to contribution. This means literally everyone owns the model.
- Owner access is free, being the use right that follows from co-ownership.
- Upkeep is carried by member states, with developers participating as contractors rather than sole proprietors.
The institutional form is not novel. CERN, ESO, and NATO infrastructure have used common-funded contractor models for decades, in NATO’s case for roughly three-quarters of a century. The novelty is the subject matter.
What This Means for Decision-Makers Now
If you run a business that depends on frontier AI (and most established businesses now do), two implications follow:
- Your licensing terms are built on shaky ground. If you are paying an AI API provider on the assumption that they own what they are licensing, you should understand the basis on which they claim the right to charge you. It boils down to little more than possession of the model and the ability to gate access to it.
- The settlement, when it comes, will need to be multilateral. Plan for the Article 20 reckoning rather than the domestic patch. The Convention cannot deliver national treatment with signatories holding incompatible positions on the same training event.
The radical option is sole developer ownership at the cost of every living author’s Berne-protected rights, on two premises the law does not supply.
The conservative option is collective ownership, which the confusion doctrine already produces.
Those are the two destinations. There is no third option.
The full academic article (approximately 8,700 words) is available on request.
