Licenses are Not Proxies for Openness in AI Models

Earlier this year, the National Telecommunications and Information Administration (NTIA) requested comment on a number of questions related to what it is calling “open foundational models.” This represents the US Government starting to think about what “open” means in the context of AI and machine learning.

The definition of open in the context of AI and machine learning is more complicated than it is in software, and I assume that many people are going to submit many interesting comments as part of the docket.

I also submitted a short comment. It focused on a comparatively narrow issue: whether or not it makes sense to use licenses as an easy way to test for openness in the context of AI models. I argued that it does not, at least not right now.

There are many situations where licenses are used as proxies for “open”. A funder might require that all software be released under an OSI-approved open source software license, or that a journal article be released under a Creative Commons license. In these cases, checking the license is essentially an easy way to confirm that the thing being released really is open.

At a basic level, these systems work because of two things: 1) the thing being licensed is relatively discrete, and 2) the licenses used are mature and widely adopted within the community.
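To make the contrast concrete, when those two conditions hold, the entire “is it open?” test can collapse into a single lookup. Here is a minimal sketch of such a check, assuming a funder that treats an OSI-approved license as its proxy (the SPDX identifiers are real, but the abbreviated approved list and the function name are hypothetical illustrations):

```python
# A minimal sketch, assuming a funder that treats an OSI-approved license as
# its proxy for "open." The SPDX identifiers are real; the abbreviated
# approved list and the function name are hypothetical illustrations.
OSI_APPROVED = {"MIT", "Apache-2.0", "GPL-3.0-only", "BSD-3-Clause", "MPL-2.0"}

def release_is_open(spdx_id: str) -> bool:
    """The funder's entire compliance test: one set-membership check."""
    return spdx_id in OSI_APPROVED

print(release_is_open("Apache-2.0"))   # True
print(release_is_open("Proprietary"))  # False
```

The check requires no judgment at all - which only works because the licensed thing is discrete and the license list is mature and widely trusted.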

Open source hardware acts as a helpful contrast to these other examples. Unlike a software repo or journal article, what constitutes “hardware” can be complex - it might include the hardware itself, digital design files, documentation/instructions, and software. All of these may be packaged differently in different places.

Each of these elements may also have a different relationship with intellectual property protections, especially copyright. We have some mature open hardware licenses, but they are relatively recent and even they embody questions and gray areas related to what they do and do not control when it comes to any specific piece of hardware.

My comment suggests that open ML models are much more like open hardware than open software. The community does not really have a consensus definition of what “open” even means in the context of AI models (freely available weights? code? training data?), let alone how (and if) those elements might need to be licensed.

In light of this, it would be unwise to build a definition of open foundational models that would allow “just use this license” to be an easy way to comply. There might be a day when consensus definitions are established and licenses are mature. Until then, any definition of open should require a more complex analysis than simply looking at a license.

Header image: Physical Examination from the Smithsonian’s National Portrait Gallery

Carlin AI Lawsuit Against 'Impression with Computer'

The brewing dispute over a (purportedly - more on that below) AI-generated George Carlin standup special is starting to feel like another step in the long tradition of rightsholders claiming that normal activity needs their permission when done with computers.

Computers operate by making copies, and copies are controlled by copyright law. As a result, more or less since the dawn of popular computing, rightsholders have attempted to use those copies to extend their control by eliminating the rights of others.

While you can read a physical book without caring what the publisher thinks, publishers insist that reading an ebook with computer needs a license because ereader software loads the file by making copies. Similarly, although record labels can’t control the sale of used records or CDs, they managed to sue the concept of selling used music with computer out of existence.

In this case, although impressions have probably existed since there was more than one human, the Carlin estate appears to be claiming that “impression with computer” needs special permission from them.

The Carlin Video

The subject of this dispute is a video of a computer-generated George Carlin avatar doing an hour of new comedy in the style of Carlin. As framed by Dudesy (the comedy team behind the video), in order to create the new content Carlin was “resurrected by an AI to create more material” (this “resurrection” was necessary because Carlin died in 2008).

While that framing may turn out to be inaccurate (although arguably artistically important to the purpose of the new work), the release of the video kicked off a week of “AI is coming for us” coverage of various flavors, followed by a lawsuit from the Carlin estate.

Is This New?

While the AI packaging clearly drove discussion around the video, if you step back for a minute it really is just an impression of Carlin. This impression uses computers, but I’m not convinced that changes (or should change) the fundamental reality of the activity. Generally speaking, people don’t get veto rights over their impersonators.

Furthermore, if an intriguing article by Kyle Orland in Ars Technica is correct, it may not even be “basically just an impression.” It might just be “an impression.”

Orland’s take has subsequently been confirmed by the Dudesy team to the New York Times (“‘It’s a fictional podcast character created by two human beings, Will Sasso and Chad Kultgen,’ Del wrote in an email. ‘The YouTube video ‘I’m Glad I’m Dead’ was completely written by Chad Kultgen.’”), although the same article reports that the Carlin estate continues to be skeptical of the video’s origin (which makes sense because, somewhat ridiculously, the viability of their entire claim may turn on the distinction).

Orland digs into the Dudesy team and their podcast to provide facially compelling evidence that the content of the special is just a regular old impersonation of Carlin. He even pulls what he describes as an “if I did it” quote from the Dudesy podcast episode that accompanied the video, describing how the jokes “could” have been created without sophisticated AI:

Clearly, Dudesy made this, but anyone could have made it with technology that is readily available to every person on planet Earth right now.

If you wanted to make something like this, this is what you would do: You would start by going and watching all of George Carlin’s specials, listening to all of his albums, watching all of his interviews, any piece of material that George Carlin has ever made. You would ingest that. You would take meticulous notes, probably putting them in a Google spreadsheet so that you can keep track of all the subjects he liked to talk about, what his attitudes about those subjects were, the relevance of them in all of his stand-up specials.

You would then take all of his stand-up specials and do an average word count to see just how long they are. You would then take all that information and write a brand new special hitting that average word count. You would then take that script and upload it into any number of AI voice generators.

You would then get yourself a subscription to Midjourney or ChatGPT to make all the images in that video, and then you would string them together into a long timeline, output that video, put it on YouTube. I’m telling you, anyone could have made this. I could have made this.

When framed this way, the whole thing starts to feel like a fairly vanilla impression. Which is how the video presents itself: it opens with a disclaimer that “what you’re about to hear is not George Carlin,” and goes on to compare itself to an impersonation “like Will Ferrell impersonating George W. Bush.”

Does the fact that the video includes a representation that is visually similar to Carlin change that analysis? It doesn’t in the physical world. As Brandon Butler quipped on Mastodon, “Nobody tell the folks freaking out over the George Carlin special about the Hal Holbrook Twain show.”

But the Carlin avatar in the video isn’t just someone dressed up like George Carlin! It is an animated version of him!

There’s nothing new about animated impressions either. This random wiki lists 317 celebrity caricatures from Looney Tunes and Merrie Melodies cartoons. The 1941 short Hollywood Steps Out alone contains dozens.

All of which is to say, doing an impression of someone is not new and does not require that person’s permission. Should doing the impression with a computer upend that?

The Carlin Estate Lawsuit

The Carlin estate lawsuit includes a lot of rhetoric against the video (“Defendants must be held accountable for adding new, fake content to the canon of work associated with Carlin without his permission (or that of his estate).”) and claims of harm from Carlin’s daughter Kelly (“My dad spent a lifetime perfecting his craft from his very human life, brain, and imagination. No machine will ever replicate his genius.”) that could be applied just as easily to any Carlin impression.

The suit also includes claims of violations of California’s Right of Publicity statutes, as well as copyright infringement.

While I think the discussions around copyright, AI training, and AI output are super interesting, I’m not going to get into them here. For the purposes of this post, the important thing is that the lawsuit contains a copyright claim at all.

Impression with Computer

The copyright angle only exists because a computer is (might be?) involved in creating the new routines. If the new routine was created using Dudesy’s “if I did it” method: “watching all of George Carlin’s specials, listening to all of his albums, watching all of his interviews, any piece of material that George Carlin has ever made,” the Carlin estate would not have any copyright claim to bring, because thinking about things you have read and watched is not an activity that rightsholders traditionally get to control.

But because this impression does (may?) use computers, it becomes another example of a rightsholder trying to turn “they used a computer” into “I get to control this activity.” If you are someone (like me) who has traditionally been wary of these arguments, this seems like an important time to maintain that skepticism. Even in discussions related to AI.

Header image: Samuel L. Clemens (Mark Twain) from the Smithsonian’s National Portrait Gallery

How Explaining Copyright Broke the Spotify Copyright System

This post originally appeared on the Engelberg Center blog.

This is a story of how Spotify’s sophisticated copyright filter prevented us from explaining copyright law.

It is strikingly similar to the story of how a different sophisticated copyright filter (YouTube’s) prevented us from explaining copyright law just a few years ago.

In fact, both incidents relate to recordings of the exact same event - a discussion between expert musicologists about how to analyze songs involved in copyright infringement litigation. Together, these incidents illustrate how automated copyright filters can limit the distribution of non-infringing expression. They also highlight how little effort platforms devote to helping people unjustly caught in these filters.

The Original Event

This story starts with a panel discussion at the Engelberg Center’s Proving IP Symposium in 2019. That panel featured presentations and discussions by Judith Finell and Sandy Wilbur. Ms. Finell and Ms. Wilbur were the musicologist experts for the opposing parties in the high-profile Blurred Lines copyright infringement case. In that case the estate of Marvin Gaye accused Robin Thicke and Pharrell Williams of infringing on Gaye’s song “Got to Give It Up” when they wrote the hit song “Blurred Lines.”

The primary purpose of the panel was to have these two musical experts explain to the largely legal audience how they analyze and explain songs in copyright litigation. The panel opened with each expert giving a presentation about how they approach song analysis. These presentations included short clips of songs, both in their popular recorded version and versions stripped down to focus on specific musical elements.

The YouTube Takedown

After the event, we posted a video of the panel on YouTube and the audio of the panel in our Engelberg Center Live! podcast feed. The podcast is distributed on a number of platforms, including Spotify. Shortly after we posted the video, Universal Music Group (UMG) used YouTube’s Content ID system to take it down. This kicked off a review process that ultimately required personal intervention from YouTube’s legal team to resolve. You can read about what happened here.

The Spotify Takedown

A few months ago, years after we posted the audio to our podcast feed, UMG appears to have used a similar system to remove our episode from Spotify. On September 15, we received an email alerting us that our podcast had been flagged because it included third-party content (recall that this content is clips of the songs the experts were discussing in their infringement analysis).

screenshot from the Spotify alert page with the headline "We found some third-party content in your podcast"

Using the Spotify review tool, we indicated that our use of the song was protected by fair use and did not need permission from the rightsholder.

screenshot from the Spotify alert page with the headline "We found some third-party content in your podcast" and information about challenging the accusation of infringement

We received a confirmation that our review had been submitted and hoped that would be the end of it.

screenshot from the Spotify alert page with the headline "Thank you for submitting this episode"

The Escalation

That was not the end of it. On October 12th, we received an email from Spotify informing us that they were removing our episode because it was using unlicensed music and we had not responded to their inquiry.

screenshot from the Spotify alert email informing us that the episode has been removed from the service

The first part was true - we had not obtained a license to use the music. That is because our use is protected by fair use, so we are not legally required to obtain one. The second part was not true - we had immediately responded to Spotify’s original inquiry. We responded to this new message right away as well, noting that we had answered their initial message and asking if they needed anything additional from us.

Spotify Tries to Step Away

Four days later, Spotify responded by indicating that this was now our problem:

The content will remain taken down from the service until the provider reaches a resolution with the claimant. Both parties should inform us once they reach a resolution. We will make the content live upon the receipt of instructions from both parties and any necessary updates. If they cannot reach a resolution, we reserve the right to act at our discretion. The email address we have for the claimant is [redacted].

This is probably where most users would have given up (if they had not dropped off well before). However, since we are the center at NYU Law that focuses on things like online copyright disputes, we decided to push forward. In order to do that, we needed more information. Specifically, we needed the original notice submitted by UMG.

Why the Nature of the Notice is Relevant

We needed the original notice from UMG because our next step turned on the actual form it took.

Many people are familiar with the broad outlines of the notice and takedown regime that governs online platforms. Takedown actions initiated by rightsholders are sometimes called “DMCA notices” because a law called the Digital Millennium Copyright Act (or DMCA for short) created the process. While most of the rules are oriented towards helping rightsholders take things off the internet, there is a small provision - Section 512(f) - that can impose damages on a rightsholder who misrepresents that the targeted material is infringing (this provision was famously litigated in the “Dancing Baby” case).

In other words, the DMCA includes a provision that can be used to punish rightsholders who send baseless takedown requests.

We feel that the use of the song clips in our podcast is an exceptionally clear example of the type of use protected under fair use. As a result, if UMG ignored the likelihood that our use was protected by fair use when it filed an official DMCA notice against our podcast, we could be in a position to bring a 512(f) claim against them.

However, not all takedown notices are official DMCA notices. Many large platforms have established parallel, private systems that allow rightsholders to remove content without going through the formal DMCA process. These systems rarely punish rightsholders for overclaiming their rights. If UMG did not use an official DMCA notice to take down our content, we could not bring a 512(f) claim against them.

As a result, our options for pushing back on UMG’s claims were very different depending on the specific form of the takedown request. If UMG used an official DMCA notice, we might be able to use a different part of the DMCA to bring a claim against them. If UMG used an informal process created by Spotify, we might not have any options at all. That is why we asked Spotify to send us the original notice.

Spotify Ignores Our Request for Information

On October 12th, Spotify told us that in order to have our podcast episode reinstated we would need to work things out with UMG directly. That same day, we asked for UMG’s actual takedown notice so we could do just that.

We did not hear anything back. So we asked again on October 23rd.

And on October 26th.

And on October 31st.

On November 7th — 26 days after our episode was removed from the service — we asked again. This time, we sent our email to the same infringement-claim-response@ email address we had been attempting to correspond with the entire time, and added legal@. On November 9th, we finally received a response.

Spotify Asks Questions

Spotify’s email stated that our episode was “not yet subject to a legal claim,” and that if we wanted to reinstate our episode we needed to reply with:

  • An explanation of why we had the right to post the content, and
  • A written statement that we had a good faith belief that the episode was removed or disabled as a result of mistake or misidentification

This second element is noteworthy because it matches the language in Section 512(f) mentioned above.

We responded with a detailed explanation of the nature of the episode and the use of the clips, asserting that the material in question is protected by fair use and was removed or disabled as a result of a mistake (describing the removal as a “mistake” is fairly generous to UMG, but we decided to use the options Spotify presented to us).

Our response ended with another request for more information about the nature of the takedown notice itself. That request specifically asked if the notice was a formal notice under the DMCA, and explained that we were asking because we were considering our options under 512(f).

Clarity from Spotify

Spotify quickly replied that the episode would be eligible for reinstatement. In response to our question about the notice, they repeated that “no legal claim has been made by any third-party against your podcast.” “No legal claim” felt a bit vague, so we responded once again with a request for clarification about the nature of the complaint. The next day we finally received a straightforward answer to our question: “The rightsholder did not file a formal DMCA complaint.”

Takeaway

What did we learn from this process?

First, that Spotify has set up an extra-legal system that allows rightsholders to remove podcast episodes. This system does a very bad job of evaluating possible fair uses of songs, which probably means it removes episodes that make legitimate use of third party content. We are not aware of any penalties for rightsholders who target fair uses for removal, and the system does not provide us with a way to pursue penalties ourselves.

Second, as with our YouTube experience, it highlights how challenging it can be for regular users to dispute allegations of infringement by large rightsholders. Spotify lost our original response to the takedown request, and then ignored multiple emails over multiple weeks attempting to resolve the situation. During this time, our episode was not available on their platform. The Engelberg Center had an extraordinarily high level of interest in pursuing this issue, and legal confidence in our position that would have cost an average podcaster tens of thousands of dollars to develop. That cannot be what is required to challenge the removal of a podcast episode.

Third, it highlights the weakness of what may be an automated content matching system. These systems can only determine if an episode includes a clip from a song in their database. They cannot determine if the use requires permission from a rightsholder. If a platform is going to deploy these types of systems at scale, it should have an obligation to support a non-automated process for challenging their assessment when they incorrectly identify a use as infringing.
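To illustrate the gap, here is a minimal sketch of a match-only filter. Everything in it is hypothetical, and a toy exact-hash fingerprint stands in for the far more robust acoustic fingerprints real systems use, but the structural point survives the simplification: the filter’s only possible output is “this episode contains audio from our catalog,” and nothing in its inputs could ever represent commentary, criticism, or any other fact relevant to fair use.

```python
# A minimal sketch of a match-only content filter. All names are
# hypothetical; a toy exact-hash fingerprint stands in for the robust
# acoustic fingerprints real systems use. Note that the output is only
# "match / no match" - there is no input that could capture the context
# of the use, so no way to weigh fair use.
import hashlib

CHUNK = 1024  # bytes of audio per fingerprinted chunk (arbitrary here)

def fingerprint(chunk: bytes) -> str:
    """Stand-in for a perceptual audio fingerprint."""
    return hashlib.sha256(chunk).hexdigest()

def build_index(catalog: dict[str, bytes]) -> dict[str, str]:
    """Map each chunk fingerprint in the rightsholder catalog to its song."""
    return {
        fingerprint(audio[i:i + CHUNK]): song
        for song, audio in catalog.items()
        for i in range(0, len(audio), CHUNK)
    }

def flag_episode(episode: bytes, index: dict[str, str]) -> set[str]:
    """Return every catalog song the episode appears to contain.
    Nothing here can ask *why* the clip is present."""
    return {
        index[fp]
        for i in range(0, len(episode), CHUNK)
        if (fp := fingerprint(episode[i:i + CHUNK])) in index
    }
```

A human reviewing the same episode could hear that the clips are being dissected by experts rather than substituted for the songs themselves; the filter, by design, cannot.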

We do appreciate that the episode has finally been restored. You can listen to it yourself here, along with audio from all of the Engelberg Center’s events on our Engelberg Center Live! feed, wherever you get your podcasts (including, at least as of this writing, on Spotify). That feed also includes a special season on the unionization of Kickstarter, and on the Knowing Machines project’s exploration of the datasets used to train AI models.

This post originally appeared on the OSHWA blog.

Earlier this month OSHWA, along with Public Knowledge, the Digital Right to Repair Coalition, Software Freedom Conservancy, iFixit, and scholars of property and technology law, filed a brief in the US Court of Appeals supporting the principle that owning something means that you get to decide how to use it. While that principle has been part of US (and, before there was a US, British) law for centuries, recent attempts to protect copyright have worked to undermine it.

We filed the brief in a case that EFF has brought on behalf of Dr. Matthew Green and Dr. bunnie Huang (someone who is well known to the open source hardware community) challenging the constitutionality of parts of the US law that prevent access to digital works. This issue is important to the open source hardware community because owning hardware is a critical part of building and sharing hardware.

The Issue

The case focuses on Section 1201 of the Digital Millennium Copyright Act (DMCA). The DMCA is probably best known for its Section 512 notice and takedown regime for works protected by copyright online (that’s the “DMCA” in a “DMCA Notice” or “DMCA Takedown” that removes videos from YouTube). Section 1201 is a different part of the law that creates legal protections for digital locks that limit access to copyright-protected works.

Basically, Section 1201 is a special law that makes it illegal to break DRM. And as long as DRM prevents you from using your toaster how you see fit, you don’t really own it.

These protections were originally designed to protect digital media – think the encryption of DVDs. However, since code is protected by copyright, and just about everything has code embedded in it, the 1201 protections undermine ownership rights in a huge range of things.

The brief illustrates how 1201-protected DRM undermines traditional rules of ownership in a number of different ways:

  • The right to repair: DRM blocks third-party parts or fixes, monopolizing the repair market or forcing consumers to throw away near-working devices.
  • The right to exclude: DRM spies on consumers and opens insecure backdoors on their computers, allowing malicious software to enter from anywhere.
  • The right to use: DRM prevents consumers from using their devices as they wish. A coffee machine’s DRM may prohibit the brewing of other companies’ coffee pods, for example.
  • The right to possess: Device manufacturers have leveraged DRM to dispossess consumers of their purchases, without legal justification.

The Challenge

This case is challenging Section 1201 on First Amendment grounds. As written, the law imposes content-based restrictions on speech. Tools for circumventing DRM can advise users on how and why to protect their property rights. Prohibiting them means that the law gives legal benefits to anti-ownership DRM software while criminalizing pro-ownership DRM-circumvention software.

Additionally, whatever one thinks about using DRM to protect digital media, the current law is not well tailored to achieve that goal. Today, DRM has been added to all sorts of devices that are very far from “digital media” in any reasonable sense. As the brief notes:

Devices like refrigerators have [DRM] not to stop rampant refrigerator copyright piracy, but so manufacturers can maintain market dominance, block competition, and force wasteful consumerism that boosts those manufacturers’ bottom lines.

These uses of DRM are protected by the current law but have nothing to do with protecting digital media.

What’s Next

This brief is part of an appeal in the U.S. Court of Appeals for the District of Columbia Circuit. It will be argued in the coming months. EFF’s page on the case is here.

We want to end this post with a huge thank you to Professor Charles Duan, the author of our brief. Professor Duan does a great job of bringing clarity to this important issue facing the open source hardware community. Plus, you always know any brief written by him will include citations reaching back centuries. This brief shows that case law reaching back to 1604 is still relevant to questions about ownership today!

Powerful ToS Hurt Companies and Lawyers, Not Just Users

I recently found myself reading Mark Lemley’s paper The Benefit of the Bargain while also helping a friend put together the Terms of Service (ToS) for their new startup. Lemley’s paper essentially argues that modern ToS - documents written by services to be one-sided and imposed on users as a take-it-or-leave-it offer - should no longer be enforced as contracts because they have lost the important fairness elements of what makes contracts contracts.

This argument, which I found fairly compelling, mostly focuses on the harm that the modern ToS regime does to users. ToS allow companies to impose a wide range of conditions on users that are beyond the scope of what users would ever reasonably agree to if they were offered a meaningful choice. That is in addition to the unreasonable expectation that everyday people read the millions of words’ worth of contracts they agree to in any given week.

Since I was reading this article while helping to draft ToS for a new service, I was also drawn to something the article did not mention: the ways in which these unilaterally imposed ToS hurt the other entities connected to them. Specifically, the lawyers who draft them and the companies that offer them.[1]

The Drafting Lawyers

I want to start with the most sympathetic characters in this drama: the lawyers hired to write ToS that are heavily skewed in favor of their client. Spare a thought!

It is possible to imagine such a lawyer being pulled between two competing forces.

On one hand, they know various types of clauses in these agreements are Bad Policy, or at the least unfair to users. Such a lawyer might agree with Lemley that the world would be better with a default set of fair rules, and with a presumption that those rules could only change if users made a meaningful choice. In their heart of hearts, they might want to draft ToS that they thought more fairly balanced the interests of the company and the company’s users.

On the other hand, that same lawyer is bound by some form of a duty to vigorously represent their client. These unbalanced terms are clearly in the company’s (at least short term) interest. Furthermore, they are essentially industry standard. As a result, this lawyer might worry that it could be a form of malpractice to fail to include the unbalanced terms in the ToS.

When faced with this tension, the lawyer might try to explain to their client that there are long term benefits to maintaining a balanced agreement with users, and therefore to leave out the most one-sided clauses. However, and there is no small irony here, it could be hard to give the client enough information so that they could meaningfully opt out of the tilted ToS arms race.

It is hard to understand the cost of giving up the short-, medium-, and long-term advantages provided by unbalanced ToS in service of a larger principle of social fairness. There are some startup founders who are interested in the discussion and have the bandwidth to actually process it. There are many more who will never prioritize the discussion enough to meaningfully consent to giving away the advantage.

Thus, the lawyer may face two options: 1) vigorously represent their client’s interest and support the bad equilibrium by writing an industry-standard, unbalanced ToS, or 2) get out of the ToS-writing business for anyone without the time and inclination to wade through the larger policy arguments.

Those feel like bad options!

The Company

This state of affairs can also harm the company, and not just in the “forcing your customers into unbalanced agreements is bad karma” kind of way.

As Lemley’s paper points out, in the offline world most businesses operate without any sort of formal written contracts at all (“you didn’t sign a contract governing the purchase of an apple from the grocery store.”). The ability to append a ToS to every digital transaction has helped to create an expectation that companies will do exactly that.

In order to do so, the companies need to take the time to write those ToS in the first place. Suddenly, instead of focusing on building and shipping their widgets, companies spend time with lawyers making sure that their ToS include all of the advantages they could possibly claim.

That’s probably a waste of time for just about everyone involved. This is made even more of a waste of time because, when faced with this new obligation, most small companies don’t hire lawyers (which would be one type of waste of resources). Instead, they tap someone without a legal background to semi-arbitrarily assemble their ToS from random corners of the internet (a slightly different type of waste of resources). And I suspect that person will increasingly outsource that task to generative AI (a third type of waste of resources).

These companies don’t really understand what unbalanced terms they are imposing on their users, what advantages they receive from them, and probably would not miss them if they were not there. They are just checking a box they don’t fully understand because it has ended up on the “things startups do” list.

I think all of these behaviors argue in favor of Lemley’s ultimate suggestion that we make a policy choice to move towards a default set of balanced rules and away from unbalanced ToS. Until then, the current system is so broken that it might make you feel bad for the lawyers and companies supposedly benefitting from it.

[1] I’ve been to enough “your paper should actually be my paper” peer reviews that I want to be clear that nothing in this post is intended to suggest that the Lemley paper is incomplete without including these points, or even that they did not occur to him. Word counts, and time, are limited in this life. Something that is interesting to me does not need to be interesting to everyone else.

hero image: a portion of Lawyers in dispute from the Met’s open access collection.