The Smithsonian Goes Open Access

It doesn’t get a lot bigger than this. On February 25th the Smithsonian went in big on open access. With the push of a button, 2.8 million 2D images and 3D files (3D files!) became available without copyright restriction under a CC0 public domain dedication. Perhaps just as importantly, those images came with 173 years of metadata created by the Smithsonian staff. How big a deal is this? The site saw 4 million image requests within the first six hours of going live. People want access to their cultural heritage.

While this is all very exciting, I wanted to take a moment to dive a bit deeper into what I see in the licensing portion of this announcement. While there are many important parts of this announcement - like the API to actually access it, and fully downloadable data that is already being turned into interesting visualizations - the licensing decisions are worth considering as well. The Smithsonian has helped to set a new standard for how open access can work at big institutions, although there are still a few things that could use some improvement.

I also want to reflect on how this moment is the result of many years of effort and advocacy by a wide range of people. Some relevant moments in that process are Carl Malamud’s 2007 “What Would Luther Burbank Do?” effort (original and archive) (one rule of thumb about big moments in openness is that Carl was usually there years earlier laying the groundwork), Michael Edson’s work on the Smithsonian Commons (the best links I have are here and here, although I’m happy to update if anyone has something better), the Cooper Hewitt’s decision to release its metadata under CC0 (followed by the 3D scan of the entire building and their font (that I used quite recently) to boot), and the Smithsonian’s own study on the impact of open access on galleries, libraries, museums, and archives (not surprisingly, written by Effie Kapsalis, who would go on to spearhead this open access move by the Smithsonian). The Smithsonian’s decision to start making 3D models of its collection available online (led by Vince Rossi) also helped lay the groundwork for the inclusion of 3D in this release. While these efforts are worth mentioning for many reasons, one is as a reminder that advocacy takes a long time and is made up of many smaller steps. Big things don’t just happen.

Make it Easy for Good Actors to be Good

Some people will see an announcement like this and immediately think of all of the bad things that could be done with these objects. While I do not dispute that bad things are possible, letting (the relatively small number of) bad actors guide thinking about open access policies does a disservice to (the relatively large number of) good actors. Copyright restrictions or terms of service are unlikely to stop bad actors from doing bad things with cultural artifacts. However, they create significant barriers to good actors doing good things with them. Access regimes should be designed to empower good actors, not to try and slow down every possible fringe bad actor. That seems to be largely how the Smithsonian approached this effort.

CC0 By Default

CC0 is a public domain dedication that clarifies that the Smithsonian is not making any claim of ownership over the digital files it is releasing. The cultural objects included in this release are all in the public domain, so the use of CC0 is not intended to address copyrights attaching to the objects themselves. Instead, CC0 is a way for the Smithsonian to indicate that it does not have any additional right in the digital file as distinct from the object it represents.

This is important in both the 2D and 3D context. In the US it is fairly clear that a digital copy of a 2D work does not get its own copyright. That is also true for 3D scans in the US. The EU is taking steps in that direction as well. The legal status of 2D images of 3D objects is a bit more ambiguous, as is that of 3D models (created in CAD instead of by scanning the object) of cultural artifacts. There is also a lingering possibility that some jurisdictions could take the law in a completely different direction.

While the weight of legal and logical authority suggests to me that the vast majority of digitizations of public domain objects do not get their own copyright protection, CC0 waives away that ambiguity and comes down clearly on the side of openness. In addition to being right on the law, I believe that this decision is right on the theory. Creating an accurate reproduction of a work in the public domain should not give you a right over it.

In the 3D context I really appreciate that the Smithsonian is applying CC0 to scans and reproductions. See, for example, the scan of the Apollo 11 Hatch:

That file is pretty unambiguously in the public domain and released under CC0. The copyright status of the CAD model of the same hatch is slightly less clear. Nonetheless, the Smithsonian decided to clear up any ambiguity by using CC0.

Not Everything is Open

2.8 million files is a lot of files, but it is far from everything in the Smithsonian’s collection. As this slide from the Smithsonian’s 3D Digitization Team makes clear, there are still many objects left to digitize:

Slide showing that 1 million objects are currently on display while 154 million objects are hidden

Some objects have not been digitized yet because they simply have not made it to the front of the queue yet. Others have been digitized but have not been included as part of the full open access program. In many cases, that is fine too.

One example is this scan of the “Project Egress” Apollo 11 hatch reproduction.

This is a scan of a replica of the Apollo 11 hatch created by Adam Savage as part of the 50th anniversary of Apollo 11. Unlike the original hatch, there is at least an argument to be made that the reproduction is protected by copyright. If the underlying object is protected by copyright the Smithsonian may not have the legal ability to release the files under CC0. So it didn’t.

It is OK to Keep Some Things Out of the Open Access Program

The more interesting example is that of the Sculpin Hat.

Sculpin Hat

The hat was a ceremonial object of the Tlingit clan of Sitka, Alaska. It was purchased in 1884, which means from a copyright standpoint it is in the public domain. The Smithsonian scanned the damaged hat in order to create a restored replica for the clan in 2019. That means that they have the scan. And, while the scan is up on the 3D portal for viewing, it is not released under a CC0 license or even downloadable.

Why not? Because there is more to an open access program than copyright considerations. As the digitization team notes, there are cultural reasons why an object might not be included:

Slide showing that some objects will not be open because of cultural or other reasons

These are complex questions without easy answers, and it is quite reasonable to want to engage in good faith dialogs about them with all of the stakeholders before releasing the digital file without restriction. The Traditional Knowledge labels project is another interesting attempt to begin to engage with these questions.

If Works are Kept Out of the Open Access Program, The Smithsonian Needs to Explain the Rules

While the Smithsonian’s instinct to hold some files back is a reasonable one, it needs to do a much better job of explaining those decisions to the public.

The Sculpin Hat has a notice that ‘Usage Conditions Apply’

image of usage conditions

The same notice applies, somewhat unexpectedly, on the 3D scans of the gloves worn by Neil Armstrong on the Apollo 11 mission:

image of usage conditions

There are at least two problems with this state of affairs. First, the Smithsonian’s use conditions allow for “non-commercial, educational, and personal uses”. However, the files are not actually available for download on the portal. That means even uses within the Smithsonian’s rules are not possible yet.

Second, the popup notice makes it exceedingly unclear how the Smithsonian is imposing these conditions on users. Are these restrictions based in copyright law? If so, and there is no copyright in either the scanned object or the scan file, does that mean that these restrictions are not legally enforceable?

Alternatively, the restrictions may be based in the Smithsonian’s Terms of Use. Assuming the Smithsonian structured the download in a way that required users to agree to those Terms, those Terms could be considered a contract between the Smithsonian and the downloader that governs the use of the files. Basically, the Smithsonian could say that as a condition of accessing the files a downloader has to agree to their terms - that would allow the Smithsonian to impose rules without relying on copyright law. However, as currently written, the Terms of Use also seem to frame the Smithsonian’s control over the files as a copyright issue, not an access issue. The usage conditions section of the terms reads in part:

All other Content is subject to usage conditions due to copyright and/or other restrictions and may only be used for personal, educational, and other non-commercial uses consistent with the principles of fair use under Section 108 of the U.S. Copyright Act. All rights not expressly granted herein by the Smithsonian are reserved…

It is fine for the Smithsonian to reserve rights that exist. But framing the use restriction in the context of copyrights that do not exist is exceedingly confusing, if not legally invalid.

As discussed earlier, the Smithsonian may have valid reasons to want to limit access to some digital files. That being said, it also has an obligation to create and describe those limitations in a legally coherent way.


As I said at the outset, this is an exciting time for open access. The Smithsonian’s decision to release a large number of objects and to include 3D objects should help set the standard for open access going forward. While this effort - like all open access efforts - is a work in progress (I can’t help but notice that the Presidential Portraits collection is missing at least one portrait that we know exists, and I know of a few more works that people want to get in the 3D scan queue), it is largely being done with intentionality and thoughtfulness.

While I know that there were many, many people involved in this effort at the Smithsonian, I want to say a special thank you to Effie Kapsalis and Vince Rossi for the crazy amount of work and persistence they put into making this happen. I’m also heartened that my Engelberg Center colleague Neal Stimler was involved in making all of this happen. When an institution as big as the Smithsonian does something like this it makes a huge splash, but that does not mean getting it to happen is easy.

And one last thing - if you want to start imagining what you can do with all of this new culture at your fingertips, there’s a whole page of examples of things that talented artists have done so far.* You could even start with this book.

*I could write a whole other blog post about how important it is to go beyond releasing objects in an open access program and actually model use of those objects by recruiting creators. And maybe I will. But not today. This post is already way too long.

Feature image: Copying in the Louvre by Alfred Henry Maurer

How Explaining Copyright Broke the YouTube Copyright System

This post originally appeared on the Engelberg Center blog.

This is a story about how the most sophisticated copyright filter in the world prevented us from explaining copyright law. It doesn’t involve TikTok dance moves or nuanced 90s remixes featuring AOC. No, it involves a debate at a law school conference over how and when one song can infringe the copyright of another and how exactly one proves in a courtroom if the accused song is “substantially similar” enough to be deemed illegal. In the end, because it was blocked by one of the music companies that owns the song, it also became a textbook study in how fair use still suffers online and what it takes to push back when a video is flagged. A copyright riddle wrapped up in an algorithmic enigma, symbolic of the many current content moderation dilemmas faced by online platforms today.

If you want to watch the video it is available here. If you prefer to listen to it you can subscribe to the Engelberg Center Live Events podcast here. And if you are curious about how new European laws about copyright filtering may impact this sort of situation in the future both inside and outside of Europe, you might be interested in our upcoming conference examining online copyright liability, A New Global Copyright Order?, on April 20, 2020. You can find out more information about that conference here.

The Video

The video in question was a recording of the “Proving Similarity” panel, which was part of the Engelberg Center’s Proving IP symposium in May of 2019. The panel, which was moderated by Professor Joseph Fishman, featured presentations and discussions by Judith Finell and Sandy Wilbur. Ms. Finell and Ms. Wilbur were the musicologist experts for the opposing parties in the high profile Blurred Lines copyright infringement case. In that case the estate of Marvin Gaye accused Robin Thicke and Pharrell Williams of infringing on Gaye’s song “Got to Give it Up” when they wrote the hit song “Blurred Lines.”

The primary purpose of the panel was to have these two musical experts explain to the largely legal audience how they analyze and explain songs in copyright litigation. The panel opened with each expert giving a presentation about how they approach song analysis. These presentations included short clips of songs, both in their popular recorded version and versions stripped down to focus on specific musical elements.

The Takedown

screenshot from the YouTube copyright summary and status page listing the songs included in the video

The video used clips of the songs in question to illustrate specific points about how they were analyzed in the context of copyright infringement litigation. As such, we were confident that our use of the songs was covered by fair use and disputed the claims using YouTube’s internal system.

Shortly thereafter we received notice that the rightsholder was rejecting our dispute on multiple songs.

screenshot from email alerting us that UMG has decided that the claim is still valid

The Decision and the Question

Still confident that our uses were covered by fair use, we researched the YouTube counternotification process. We discovered that if we continued to challenge the accusation of infringement and lost, our video would be subject to copyright strikes. If the account was subject to multiple strikes its ability to live stream could be restricted or the account could be terminated. While our colleagues in the communications department were highly supportive of our efforts, they were concerned that one misstep could wipe NYU Law’s entire YouTube presence off the internet.

In deciding how we could continue to press our case, one question was unclear. Our single video was subject to multiple copyright infringement claims. If we failed to prevail, would that mean that the account was subject to one copyright strike because all of the claims were against a single video, or multiple strikes tied to each claim against the single video? As there were four remaining claims against our video, and three claims could result in the termination of the account, the distinction was highly relevant to us.

screenshot from the appeal dispute page

Unfortunately, we still do not know the answer to that question. This page seems to come the closest to having an answer, but it does not provide one to our specific question. We tried using the ‘Was this helpful?’ link at the bottom to get additional information, but YouTube did not respond.

The Resolution

This would have been a dead end for most users. Unable to understand how the already opaque dispute resolution process might impact the status of their account, they would have to decide if it was worth gambling their entire YouTube account on the chance that some combination of YouTube and the rightsholder would recognize their fair use claim.

Since we are the center at NYU Law focused on technology and innovation, it was not a dead end for us. We reached out to YouTube through private channels to try to get clarity around the copyright strike rules. While we never got that clarity, some weeks later we were informed that the claims against our video had been removed.

The Takeaway

What lessons can be learned from this process?

First, it highlights how challenging it can be for users with strong counter-arguments to dispute an allegation of infringement by large rightsholders. The Engelberg Center is home to some of the top technology and intellectual property scholars in the world, as well as people who have actually operated the notice and takedown processes for large online platforms. We had legal confidence in our position that would cost an average user tens of thousands of dollars (if not more) to obtain. Even all of those advantages were not enough to allow us to effectively resolve this dispute. Instead, we had to also rely on our personal networks to trigger a process - one that is still unclear - that resulted in the accusations being removed. This is not a reasonable expectation to place on average users.

Second, it highlights the imperfect nature of automated content screening and the importance of process when automation goes wrong. A system that assumes any match to an existing work is infringement needs a robust process to deal with the situations where that is not the case. Our original counterclaim included a clear explanation of the nature of the video and the reasons for using the clips. It is hard to imagine someone with any familiarity with copyright law watching the video, reviewing our claim, and then summarily rejecting it. Nonetheless, that is what happened. No matter how much automation allows you to scale, the system will still require informed and fair human review at some point.

Third, it highlights the costs of things going wrong. The YouTube copyright enforcement system is likely the most expensive and sophisticated copyright enforcement system ever created. If even this system has these types of flaws, it is likely that the systems set up by smaller sites will be even less perfect.

Nonetheless, we are happy that the video has been restored. You can watch it - along with all of the other videos from Proving IP - on the NYU Law YouTube channel. You can also listen to the audio from it and all of the Engelberg Center’s events by subscribing to our live events podcast.

Finally, Europe has recently passed legislation designed to oblige more websites to implement automated copyright filters. Our event A New Global Copyright Order? on April 20 will examine how that legislation will impact Europe and the worldwide conversation around copyright law. We hope to see you there.

Easy Public Domain Picture Frame with the Cleveland Museum of Art Open Access API

In celebration of Public Domain Day 2020 I decided to try to turn the old monitor in my office into a picture frame to display a rotating collection of public domain works. The Cleveland Museum of Art (CMA) launched a robust Open Access program in 2019, so I decided to use their API to power it. This blog post explains all of the steps in creating the project so you can make one too.

This is a fairly lightweight project, so all you need to make it happen is:

  1. A monitor
  2. A raspberry pi (or any other computer)
  3. Some code

Most of this post is about the code. The theory behind this project is that there is a website that regularly pulls a new image from the CMA’s API and displays it along with some information like the work’s title and creator. The raspberry pi boots into a fullscreen browser displaying that page. The screen also needs to automatically turn off at night because it is a waste to keep the monitor on all night when there is no one around to see it.

The entire project is a double celebration of openness. In addition to the works being displayed, the only reason I could even begin to build it is that the open nature of the internet’s architecture allows me to peek at better-designed sites to learn from them. Open educational resources like the Coding Train have taught me just enough javascript to be able to put something like this together.

The Site

I decided (guessed?) that the easiest way to make all of this work was to create a website that displayed the rotating set of pictures. I’m bad at javascript, so this gave me a chance to learn a little bit more about it.

The self-contained site is available in this repo. If you don’t care about how it works, you can just access a live version of it here.

index.html

This file is minimal and straightforward - it is essentially just a container with pointers to a stylesheet and the script. The one thing to note is that the script is inside of a container div:

        <div class='container'>

         <script src='script.js'></script>

        </div>

This allows me to overlay the text descriptions on top of the image.

script.js

This file is the heart of the action. I will walk through each section to explain what it does. All of the console.log lines are just for my own troubleshooting and can basically be ignored.

//function to generate a random number
function getRndInteger(min, max) {
  return Math.floor(Math.random() * (max - min) ) + min;
}

This initial function is used to generate a random number. The random number is needed in two places: first to pick the image from the collection, and second to determine how long the image will stay up before the page refreshes.
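One detail worth checking is where the bounds fall. Because Math.random() never returns 1, the helper can return min but never reaches max. The sketch below repeats the function and samples it a few thousand times to make that visible; the range of 5 to 8 and the loop count are arbitrary choices for illustration.

```javascript
// Same helper as in the post: returns an integer n with min <= n < max.
function getRndInteger(min, max) {
  return Math.floor(Math.random() * (max - min)) + min;
}

// Sample the helper many times and record the smallest and largest values seen.
let lowest = Infinity;
let highest = -Infinity;
for (let i = 0; i < 10000; i++) {
  const n = getRndInteger(5, 8);
  lowest = Math.min(lowest, n);
  highest = Math.max(highest, n);
}

// The largest value seen never reaches the max argument of 8.
console.log(lowest, highest);
```

If you want the upper bound to be included, pass max + 1 as the second argument.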

//uses the function to pick a random offset into the collection
var offset = getRndInteger(0, 31277);
//inserts that random number into the request url, returning a json file
var target_json_url = "https://openaccess-api.clevelandart.org/api/artworks/?limit=10&indent=1&cc0=1&has_image=1&skip=" + offset;

This block of code is used to access the entry via the CMA’s API. I believe that there are 31,277 entries in the CMA’s open access catalog that have an image. The first line picks a random number between 0 and 31,276. The second line passes that number to the API’s skip parameter, which tells it how many entries to skip before returning results, so a skip of 0 returns the first entry and a skip of 31,276 returns the last.

The limit=10&indent=1 elements in the URL are probably unnecessary. The cc0=1&has_image=1 elements are important - they limit results to ones that have a CC0 license and have an image associated with the entry. Those are the open access entries that I care about.
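If you would rather not build the query string by hand, URLSearchParams can assemble it. This is only a sketch: buildArtworkUrl is a hypothetical helper that is not part of the original script, and setting limit to 1 is an assumption based on the script only ever reading the first record in the response.

```javascript
// Hypothetical helper (not in the original script): builds the request URL
// from the query parameters discussed above.
function buildArtworkUrl(skip) {
  const params = new URLSearchParams({
    cc0: '1',           // only records released under CC0
    has_image: '1',     // only records that have an image
    limit: '1',         // assumption: one record is enough, since only data[0] is read
    skip: String(skip), // how many records to skip before the first one returned
  });
  return 'https://openaccess-api.clevelandart.org/api/artworks/?' + params.toString();
}

console.log(buildArtworkUrl(42));
// https://openaccess-api.clevelandart.org/api/artworks/?cc0=1&has_image=1&limit=1&skip=42
```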

//create new request object instance
let request = new XMLHttpRequest();
//opens the file
request.open('GET', target_json_url);
request.responseType = 'json'
request.send();

This block of code creates a request object, points it at the API URL, and tells it to expect JSON back. Basically it creates and fills the container for the JSON file that corresponds to the object that we randomly selected above.
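The same request can also be written with fetch, which current browsers support. This is a sketch of an alternative, not what the script in this post does. loadArtworks and its fetchImpl parameter are names invented for the sketch; the parameter exists so the function can be tried with a stub instead of a live network call, and the stub's data is a placeholder.

```javascript
// Hypothetical alternative to the XMLHttpRequest block: request the URL and
// parse the body as JSON. fetchImpl defaults to the global fetch but can be
// swapped for a stub.
async function loadArtworks(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error('Request failed with status ' + response.status);
  }
  return response.json();
}

// Usage with a stub standing in for the API (placeholder data, no network).
const stubFetch = async () => ({
  ok: true,
  status: 200,
  json: async () => ({ data: [{ title: 'Placeholder title' }] }),
});

const pending = loadArtworks('https://example.invalid/artworks', stubFetch);
pending.then((json) => console.log(json.data[0].title));
```

In the real page you would call loadArtworks(target_json_url) and do the work that currently lives in request.onload inside the then callback.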

request.onload = function() {
    const response_json = request.response;
    //gets the image URL + tombstone of a random image from the collection and turns it into an array assigned to a variable
    var found_image_info = grabImageInfo(response_json);

    var picked_image_URL = found_image_info[0];
    var picked_image_tombstone = found_image_info[1];
    var picked_image_title = found_image_info[2];
    var picked_image_author = found_image_info[3];
    var picked_image_date = found_image_info[4];

    //creates the image to be  posted
    var img = document.createElement("img");
    img.src = picked_image_URL;

    img.alt = picked_image_tombstone;

    //creates the text
    var tomb_text = document.createTextNode(picked_image_tombstone)

    //creates the linebreak
    var linebreak = document.createElement('br');

    let item = document.createElement('div');
    item.classList.add('item');
    item.innerHTML = `<div class="container"><img class="beach-image"  src="${picked_image_URL}" alt="beach image"/><div class="textStyle">${picked_image_title}<br>${picked_image_author}<br>${picked_image_date}</div></div>`;
    document.body.appendChild(item);

    //set up the refresh
    //time is in ms
    //this sets the range
    var refresh_interval = getRndInteger(5000, 20000)
    console.log("refresh rate = " + refresh_interval);
    //this uses the range to reset the page
    setTimeout(function(){
        location = ''
    },refresh_interval)
}

This block is where most of the work happens, so I’ll break it down in smaller pieces. The reason it is all tucked into a request.onload function is that the code in this block should not run until the data from the API has finished loading in the background.

    const response_json = request.response;
    //gets the image URL + tombstone of a random image from the collection and turns it into an array assigned to a variable
    var found_image_info = grabImageInfo(response_json);

This first section assigns the contents of the JSON file to a variable and then sends the JSON file to the grabImageInfo function described below. That function pulls all of the data I care about out of the JSON file and puts it in an array that can be accessed with bracket notation (see next block).

    var picked_image_URL = found_image_info[0];
    var picked_image_tombstone = found_image_info[1];
    var picked_image_title = found_image_info[2];
    var picked_image_author = found_image_info[3];
    var picked_image_date = found_image_info[4];

This section assigns a variable to each element in the found_image_info array.

    //creates the image to be  posted
    var img = document.createElement("img");
    img.src = picked_image_URL;

    img.alt = picked_image_tombstone;

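This section creates an image element. The source is the URL that comes from the JSON file and the alt text is the tombstone text from the JSON file. Note that this element is never appended to the page. The image that actually appears comes from the innerHTML template in the next section.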

    let item = document.createElement('div');
    item.classList.add('item');
    item.innerHTML = `<div class="container"><img class="beach-image"  src="${picked_image_URL}" alt="beach image"/><div class="textStyle">${picked_image_title}<br>${picked_image_author}<br>${picked_image_date}</div></div>`;
    document.body.appendChild(item);

This section creates the HTML to be added to the page that index.html loads. The item.innerHTML section creates an HTML payload with the image and the title, author, and date overlaid on top of it. If you want to change what is displayed over the image this is where you should start messing around.
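Because the title, author, and date come straight from the API and are dropped into innerHTML, any <, &, or quote character in a record would be read as markup. One option is to escape the text first. This is a sketch and not part of the original script: escapeHTML and buildOverlay are hypothetical names, the class names are copied from the template, and using the title as alt text is my own substitution.

```javascript
// Hypothetical helper: replace the characters that have special meaning in HTML.
function escapeHTML(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical helper: same structure as the template in the script,
// with every value escaped and the title reused as the alt text.
function buildOverlay(url, title, author, date) {
  return `<div class="container"><img class="beach-image" src="${escapeHTML(url)}" alt="${escapeHTML(title)}"/>` +
    `<div class="textStyle">${escapeHTML(title)}<br>${escapeHTML(author)}<br>${escapeHTML(date)}</div></div>`;
}

console.log(escapeHTML('Cats & <Dogs>')); // Cats &amp; &lt;Dogs&gt;
```

With this in place, the assignment in the script would become item.innerHTML = buildOverlay(picked_image_URL, picked_image_title, picked_image_author, picked_image_date).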

    //set up the refresh
    //time is in ms
    //this sets the range
    var refresh_interval = getRndInteger(5000, 20000)
    console.log("refresh rate = " + refresh_interval);
    //this uses the range to reset the page
    setTimeout(function(){
        location = ''
    },refresh_interval)

This is the section that sets up the page refresh. The arguments you pass to the getRndInteger function determine the bounds of the refresh rate. Remember that the numbers are in ms. I decided to make this slightly random instead of a fixed number to add a bit of variability to the display.

function grabImageInfo(jsonObj) {

    //pulls the elements of each piece and assigns it to a variable
    var data_url = jsonObj['data'][0]['images']['web']['url']
    var data_tombstone = jsonObj['data'][0]['tombstone']
    console.log(data_tombstone)
    var data_title = jsonObj['data'][0]['title']
    //the author info sometimes doesn't exist. When the creators list is empty, reading ['creators'][0]['description'] throws a TypeError, so the lookup is wrapped in a try/catch that falls back to an empty string.
    try {
         var data_author = jsonObj['data'][0]['creators'][0]['description']
     }
     catch (e) {
         data_author = ''

     }
    var data_creation_date = jsonObj['data'][0]['creation_date']

    console.log("url = " +data_url)

    //creates an array with the URL, tombstone, title, author, and creation date of the random object picked
    var function_image_data = [data_url, data_tombstone, data_title, data_author, data_creation_date]
    //returns that array
    return function_image_data;
}

This is the function to extract data from the JSON file. It pulls each relevant element and then adds it to an array. Each of the lookups, like var data_url = jsonObj['data'][0]['images']['web']['url'], is essentially the same, with the difference being where in the JSON file it looks for the relevant data.

try {
     var data_author = jsonObj['data'][0]['creators'][0]['description']
 }
 catch (e) {
     data_author = ''

 }

The author variable works slightly differently. Sometimes the author data does not exist in the records. This structure allows the script to handle errors without crashing.
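A more compact way to get the same fallback is optional chaining, which is available in current browsers. This is a sketch of an alternative rather than what the script does; getAuthor is a hypothetical name and the two sample records are placeholders shaped like the fields the script reads.

```javascript
// Hypothetical alternative to the try/catch: optional chaining (?.) yields
// undefined as soon as a step is missing, and ?? supplies the fallback.
function getAuthor(jsonObj) {
  return jsonObj?.data?.[0]?.creators?.[0]?.description ?? '';
}

// Placeholder records, not real API output.
const withCreator = { data: [{ creators: [{ description: 'Example Artist' }] }] };
const withoutCreator = { data: [{ creators: [] }] };

console.log(getAuthor(withCreator));    // Example Artist
console.log(getAuthor(withoutCreator)); // prints an empty line
```

Unlike the try/catch, this also returns an empty string when a creator entry exists but has no description, a case where the original lookup would return undefined without throwing.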

var function_image_data = [data_url, data_tombstone, data_title, data_author, data_creation_date]
//returns that array
return function_image_data;

Finally, each element of the data is put into an array and returned out of the function. The order of how the data is added to the array is arbitrary, but it is consistent so if you move something around here make sure to change how you pull them out at the top of the script.
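One way to avoid depending on the order is to return an object with named fields instead of an array. This is a sketch of that variation, not the code the site runs: grabImageInfoAsObject is a hypothetical name, the field paths are the ones used in the script, and the sample record is a placeholder.

```javascript
// Hypothetical variant of grabImageInfo that returns named fields instead of
// positions in an array.
function grabImageInfoAsObject(jsonObj) {
  const record = jsonObj.data[0];
  return {
    url: record.images.web.url,
    tombstone: record.tombstone,
    title: record.title,
    author: record.creators?.[0]?.description ?? '', // empty string when there is no creator
    date: record.creation_date,
  };
}

// Placeholder record for illustration only; not real API output.
const sample = {
  data: [{
    images: { web: { url: 'https://example.invalid/a.jpg' } },
    tombstone: 'Placeholder tombstone',
    title: 'Placeholder title',
    creators: [],
    creation_date: 'c. 1900',
  }],
};

// Fields are pulled out by name, so their order no longer matters.
const { title, author, date } = grabImageInfoAsObject(sample);
console.log(title, author, date);
```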

style.css

This is also a fairly straightforward css file. The .textStyle section is what you use to style the text. I also believe that the .container section needs to be set to position: relative in order for the overlay to work.

The most interesting part of the file is probably the @font-face section. That loads the custom font. The font is the fantastic font that the Cooper Hewitt made available as part of their open access project a few years ago. I always like using the font for open access-related projects. The fonts live in the /data folder. They are applied to all of the text in the * section.

The Pi

Once you have everything up and running you can access it from any browser. You can try it here, press F11, and just let it happen in full screen.

If you want to run it constantly as a picture frame it makes sense to devote a computer to the task. A Raspberry Pi is a perfect candidate because it is inexpensive and draws a relatively small amount of electricity.

You could set things up so the pi hosts the file locally and then just opens it. I decided not to do that, mostly because that would involve automatically starting a local server on the pi, which was one more thing to set up. Since the service needs to be online to hit the API anyway, I thought it would be easier to just set up the page on my own domain. I have no idea if that is actually easier.

There are two and a half things you need to do in order to set the pi to automatically boot into displaying the site in fullscreen mode as a full time appliance.

Start in Fullscreen Mode

You can start Chromium in fullscreen mode from the command line. That means you can add the line to the pi’s autostart file. Assuming your username is just ‘pi’ (the default when you start raspbian), open a terminal window and type:

nano /home/pi/.config/lxsession/LXDE-pi/autostart

This will allow you to edit the autostart file directly. Add this line to the file (which is probably otherwise blank):

@chromium-browser --start-fullscreen michaelweinberg.org/cma_pd

You can change the final URL to whatever you like. If you are hosting your own version of this page, that is where to make the switch.

You may find that your fullscreen display still gets a scroll bar on one side. If that’s the case, the half thing you need to do is open chromium and type chrome://flags in the address bar. Once you are looking at the flags, search for overlay scrollbars and enable it. That will hide the scroll bars.

Turn off the Screen

The final thing you might want to do is turn off the screen of the display at night. In order to do this you need to make two entries in cron. Here is a nice intro to cron. Cron is a linux utility that allows you to schedule commands.

The commands you end up scheduling may vary based on your particular setup. This is a helpful tutorial laying out options to make this happen. The ones that worked for me were the vcgencmd ones.

In order to schedule those I opened a terminal window and typed crontab -e. I then added two lines. This line turned off the display: vcgencmd display_power 0 and this line turned it back on: vcgencmd display_power 1. Use crontab to schedule these at appropriate times.
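
For illustration, the two entries could look something like the sketch below. The times (off at 10 p.m., back on at 7 a.m.) are only an assumption, so pick whatever suits your room. The five fields before each command are minute, hour, day of month, month, and day of week.

```crontab
# Assumed schedule: turn the display off at 22:00 every day
0 22 * * * vcgencmd display_power 0
# Assumed schedule: turn the display back on at 07:00 every day
0 7 * * * vcgencmd display_power 1
```

If cron cannot find vcgencmd, running which vcgencmd in a terminal will show the full path to use in place of the bare command name.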


That’s that. This will let you set up a rotating set of public domain images on any display you might have access to. Good luck with your own version.

List image: The Biglin Brothers Turning the Stake, Thomas Eakins, 1873

This post originally appeared in Slate and was co-authored with Gabriel Nicholas

In the tech policy world, antitrust is on everyone’s minds, and breaking up Big Tech is on everyone’s lips. For those looking for another way to fix tech’s competition problem, one idea keeps popping up. Mark Zuckerberg named it as one of his “Four Ideas to Regulate the Internet.” Rep. David Cicilline, a Democrat from Rhode Island and chairman of the House Judiciary Committee’s antitrust subcommittee, said it could “give power back to Americans.” It’s already enshrined as a right in the European Union as part of the General Data Protection Regulation, and in California’s new Consumer Privacy Act as well.

The idea is data portability: the concept that users should be able to download their information from one platform and upload it to another. That way, the theory goes, people can more easily try new products, and startups can jump-start their products with existing user data. The family group chat can move off of WhatsApp without leaving behind years of data. Members of the anarcho-socialist Facebook group can bring their conversations and their Marxist memes with them. A whole new world can flourish off of years of built-up data. It’s competition without the regulatory and technological headache of breaking up companies.

But data portability might not be the regulatory golden goose the private and public sectors hope it is. It’s not even a new idea: Facebook has allowed users to export their data through a “Download Your Information” tool since 2010. Google Takeout has been around since 2011. Most major tech companies introduced some form of data portability in 2018 to comply with GDPR. Yet no major competitors have been built from these offerings. We sought to find out why.

To do this, we focused our research on Facebook’s Download Your Information tool, which allows users to download all of the information they have ever entered into Facebook. We showed the actual data Facebook makes available in this tool to the people we would expect to use it to build new competitors—engineers, product managers, and founders. Consistently, they did not feel that they could use it to create new, innovative products.

Just by looking at the sheer volume of data Facebook makes available, it’s hard to believe this is true. The Download Your Information export includes dozens of the user’s files, containing every event attended, comment posted, page liked, and ad interacted with. It also is a stark reminder of just how many features Facebook has (a fully fledged payments platform! Something called “Town Hall”!) and how many have been retired (remember pokes?). When Katie Day Good got her data from Facebook, the PDF ran to 4,612 pages.

But the people we interviewed—the ones who might actually make use of all this information—noted some serious shortcomings in the data. A user can download a comment made on a status, but not the original status or its author (at least in a way useful for developers). A user can get the start time and name of an event attended, but not the location or any fellow attendees. Users can get the time they friended people, but little else about their social graphs. Time and time again, Facebook data was insufficient to re-create almost any of the platform’s features.

From a privacy perspective, these shortcomings make sense. Facebook draws a hard line around what it considers one user’s data versus another’s in order to ensure that no one has access to information not their own. Sometimes, though, the hard line makes the data less useful to competitors. Information falls in the gaps, leaving conversations unable to be reconstructed, even if both sides upload their data. Facebook mused extensively on the privacy trade-offs involved in data portability in a white paper published in September. It concluded, more or less, that there need to be more conversations on this subject. (Mark Zuckerberg himself has given a similar line about data portability since as early as 2010.)

Conversations aside, there is some low-hanging fruit to make current data portability options more useful for competitors and easier for users. Almost no platforms we looked at gave any sense of what downloaded data might actually look like, and without this kind of documentation, developers would have a hard time incorporating this data into any real products. The process of actually downloading data could also be improved. Currently, many platforms hide their data exports deep in menus, limit how frequently users can download their data, and take a long time to make the data accessible. Spotify, for example, can take up to 30 days to create its data export.

One-user-at-a-time data portability might also be the wrong approach. On social platforms, users want to be where their friends are, and portability pioneers may find themselves on barren networks. But alternative forms of data portability might address this problem and work better for competition. For example, platforms could allow users to move their data in coordinated groups. The family WhatsApp could agree to move to Vibe all at once, or the anarcho-socialist Facebook group could put it to a vote. Similarly, open and continuous integration may be more effective than one-time data transfers. There is room for the kind of experimentation and innovation Silicon Valley is famous for.

Even with all of these improvements, data portability is in danger of being a waste of time. It has all the trappings of a radical, win-win way to increase competition on the internet, but when put into practice, it has so far fallen short. It might work for nonsocial applications, like music streaming or fitness apps, but as of now it acts as a distraction from proposals for more systemic integration, including those put forward as part of the Senate’s recent ACCESS Act. Data portability is just one narrow tool to improve competition in the tech sector—and it’s an Allen wrench, not a Swiss Army knife.

The Bust of Nefertiti is Free (With One Strange Caveat)

Nefertiti Scan

Image: Philip Pikart CC BY-SA 3.0 Unported

update: a better edited (thanks Torie!) version of this post ran in Slate two days after I posted it here. In the interest of simplicity (maybe?), I have appended the full Slate version below.

Today, after a three-year legal battle, artist Cosmo Wenman released high-quality scans of the Bust of Nefertiti currently residing in the Staatliche Museen in Berlin. This is the culmination of an extraordinary FOIA effort by Cosmo and he is rightly being commended for pushing the files into the public. You can download the files yourself here and I encourage you to do so.

Unfortunately, the files come with a strange and unexpected caveat - a license carved directly into the base of the scanned bust that purports to restrict their commercial use.

Nefertiti License

Image: Cosmo Wenman CC BY-NC-SA 3.0. Why can Cosmo license this image? I would argue because he added the blue lines on the side to try and suggest digitization, which is a creative act that is at least arguably protectable.

Is that restriction even enforceable? Is the museum that created the scan just trying to bluff its way into controlling the scan of the bust? I’m writing about it, so you can guess that the answers are probably “no” and “yes,” respectively. But let’s go a bit deeper.

Background

The Bust of Nefertiti was not a random target for this effort. In 2016 a pair of artists claimed to have surreptitiously scanned the bust and released the files online. This drew attention in part because of the restrictions that the Staatliche Museen generally places on photography and other reproduction of the Bust. Shortly after the announcement many experts (including Cosmo) questioned the veracity of the story.

This skepticism was grounded in a belief that the scan itself was of a higher quality than would have been possible with the technology described by the artists. In fact, the file was of such high quality that it was likely created by the Staatliche Museen itself.

Believing this to be the case, Cosmo initiated the equivalent of a FOIA request to gain access to the Museum’s scan (the Staatliche Museen is a state-owned museum). This turned into a rather epic process that ultimately produced the files released today. One of the conditions placed by the Staatliche Museen on the released file was that it was released under a Creative Commons Non-Commercial license. On its face, this would prevent anyone from using the scan for commercial purposes.

Is the Non-Commercial Restriction Enforceable?

Creative Commons licenses are copyright licenses. That means that if you violate the terms of the license, you may be liable for copyright infringement. It also means that if the file being licensed is not protected by copyright, nothing happens if you violate the license. If there is no copyright protecting the scan, a user does not need permission from a ‘rightsholder’ to use it because that rightsholder does not exist.

As I wrote at the time of the original story, there is no reason to think that an accurate scan of a physical object in the public domain is protected by copyright in the United States (there is more about this idea in this whitepaper). Without an underlying copyright in the scan, the Staatliche Museen has no legal ability to impose restrictions on how people use it through a copyright license.

While the copyright status of 3D scans is currently more complex in the EU, Article 14 of the recently passed Copyright Directive is explicitly designed to clarify that digital versions of public domain works cannot be protected by copyright. Once implemented, that rule would mean that the Staatliche Museen does not have the ability to use a copyright license to prevent commercial uses of the scan in the EU.

I have written previously about the role that licenses can play to signal intent to users even if they are not enforceable. In this case, it appears that the Staatliche Museen is attempting to signal to users that it would prefer that they not use the scan for commercial purposes.

While that is a fine preference to express in theory, I worry about it in this specific context. There are plenty of other ways for the Staatliche Museen to express this preference. When a large, well-lawyered institution carves legally meaningless lawyer language into the bottom of the scan of a 3,000-year-old bust to suggest that some uses are illegitimate, it is getting dangerously close to committing copyfraud. The Staatliche Museen could easily write a blog post making its preferences clear without pretending to have a legal right to enforce those preferences. In light of that, this feels less like an intent to signal preferences than an attempt to scare away legitimate uses with legal language.

Bonus: Moral Rights

If you have made it this far into the post, I’ll throw one more fun twist on the pile. The Staatliche Museen has added quasi-legal language to the bust scan itself by carving text into the bottom. The file itself is digital, so it is fairly trivial to erase that language (by filling in the words, cutting off the bottom, or some other means). Could the Staatliche Museen claim that removing the attribution language violates some other right?

The most obvious place to look for a harm that the Staatliche Museen could claim is probably the concept of moral rights. Moral rights are sometimes referred to as part of the catchall of ‘related rights.’ These rights often include things like a right of attribution and a right of integrity. In the United States these rights are codified (in a very limited way) in 17 U.S.C. §106A (and are therefore often referred to as ‘106A rights’, or VARA rights after the Visual Artists Rights Act that created the section).

Could removing the attribution language violate the Staatliche Museen’s moral rights? I would argue not. While removing attribution or intentionally modifying the work to remove the fake license might create problems if the Staatliche Museen was the ‘creator of the work’ for copyright purposes, that is not the case here. The Staatliche Museen did not create any work that is recognized under US (and soon EU) copyright law. That means that there is nothing for the moral rights to attach to. That being said, I am far from an expert on moral rights (doubly so outside of the US). I’ll link to any better analysis that I see in the coming days.

Update 11/16/19: Marcus Cyron brought to my attention that, for reasons related to the technical structure of the Berlin museums, the name I was using for the museum in this piece was incorrect. I have therefore changed all of the references to the “Neues Museum” to instead refer to the “Staatliche Museen”. That change aside, the substance of the post remains the same.


The Nefertiti Bust Meets the 21st Century

When a German museum lost its fight over 3D-printing files of the 3,000-year-old artwork, it made a strange decision.

It seemed like the perfect digital heist. The Nefertiti bust, created in 1345 B.C., is the most famous work in the collection of Berlin’s Neues Museum. The museum has long prohibited visitors from taking any kinds of photographs of its biggest attraction. Nonetheless, in 2016 two trenchcoat-wearing artists managed to smuggle an entire 3D scanning rig into the room with the bust and produce a perfect digital replica, which they then shared with the world.

At least, that was their story. Shortly after their big reveal, a number of experts began to raise questions. After examining the digital file, they concluded that the quality of the scan was simply too high to have been produced by the camera-under-a-trenchcoat operation described by the artists. In fact, they concluded, the scan could only have been produced by someone with prolonged access to the Nefertiti bust itself. In other words, this wasn’t a heist. This was a leak.

One of the first experts to begin to question the story of the Nefertiti scan was the artist Cosmo Wenman. Once Wenman realized that the scan must have come from the museum itself, he set about getting his own copy and making it public. He initiated the German equivalent of a FOIA request. (The Neues Museum is state-owned.) His request kicked off a three-year legal odyssey.

The museum never quite clarified its relation to the scans. But earlier this week, Wenman released the files he received from the museum online for anyone to download. The 3D digital version is a perfect replica of the original 3,000-year-old bust, with one exception. The Neues Museum etched a copyright license into the bottom of the bust itself, claiming the authority to restrict how people might use the file. The museum was trying to pretend that it owned a copyright in the scan of a 3,000-year-old sculpture created 3,000 miles away.

The Neues Museum chose to use a Creative Commons Attribution, NonCommercial, Share-Alike license. If the museum actually owned a copyright here, the license would give you permission to use the file under three conditions: that you gave the museum attribution, did not use it for commercial purposes, and allowed other people to make use of your version. Failing to comply with those requirements would mean that you would be infringing on the museum’s copyright.

But those rules only matter if the institution imposing them actually has an enforceable copyright. If the file being licensed is not protected by copyright, nothing happens if you violate the license. If there is not a copyright protecting the scan, then you don’t need permission from a “rights holder” to use it. Because that rights holder does not exist. It would be like me standing in front of the Washington Monument and charging tourists a license fee to take its picture.

As I wrote at the time of the original story, there is no reason to think that an accurate scan of a physical object in the public domain is protected by copyright in the United States. (More about this idea in this white paper.) Without an underlying copyright in the scan, the Neues Museum has no legal ability to impose restrictions on how people use it through a copyright license.

While the copyright status of 3D scans of public domain works is currently more complex in the EU, Article 14 of the recently passed Copyright Directive is explicitly designed to clarify that digital versions of public domain works cannot be protected by copyright. Once implemented, that rule would mean that the Neues Museum does not have the ability to use a copyright license to prevent commercial uses of the scan in the EU.

Now, licenses can signal intent to users even if they are not enforceable. In this case, it appears that the Neues Museum is attempting to signal that it would prefer people not use the scan for commercial purposes. While that is a fine preference to express in theory, I worry about it in this specific context. There are plenty of other ways for the Neues Museum to express this preference. When a large, well-lawyered institution carves legally meaningless lawyer language into the bottom of the scan of a 3,000-year-old bust to suggest that some uses are illegitimate, it is getting dangerously close to committing copyfraud (that is, falsely claiming that you have copyright control over a work that is in fact in the public domain). The Neues Museum could easily write a blog post making its preferences clear without pretending to have a legal right to enforce those preferences. In light of that, this feels less like an intent to signal preferences than an attempt to scare away legitimate uses with legal language.

The scary language has real-world consequences. These 3D scans could be used by people who want to 3D-print a replica for a classroom, integrate the 3D model into an art piece, or allow people to hold the piece in a virtual reality world. While some of these users may have lawyers to help them understand what the museum’s claims really mean, the majority will see the legal language as a giant “keep out” sign and simply move on to something else.

The most important part is that adding these restrictions runs counter to the entire mission of museums. Museums do not hold our shared cultural heritage so that they can become gatekeepers. They hold our shared cultural heritage as stewards in order to make sure we have access to our collective history. Etching scary legal words in the bottom of a work in your collection in the hopes of scaring people away from engaging with it is the opposite of that.