How We Made the Open Hardware Summit All Virtual in Less Than a Week

Side by side virtual fashion comparison

Virtual conferences call for virtual fashion.

This post originally appeared on the OSHWA blog and Engelberg Center blog in slightly different but substantially identical versions.

First, thank you again to everyone - speakers, participants, and sponsors - for a fantastic 10th anniversary Open Hardware Summit. We knew the 10th anniversary Summit would be one for the ages, although we didn’t quite expect the reason to be that it became the first virtual Summit.

Thanks to its timing, the 10th anniversary Summit ended up being many people’s first virtual summit of the Covid-19 era (the organizers included). Unfortunately, it is unlikely to be the last. In the hope of helping event organizers struggling with the same challenges, this blog post outlines the decisions we made and the steps we took to make it happen.

Quick Context

The Open Hardware Summit is an annual gathering of the open source hardware community held by the Open Source Hardware Association (OSHWA). This year the Engelberg Center partnered with OSHWA to host the event in New York City. The event usually brings together hundreds of community members and speakers from around the world. It was scheduled for March 13, 2020.

While the situation had been evolving for some time, as recently as March 5th (8 days before the Summit) we thought that holding a reduced in-person version of the event was the right decision. By March 8th (5 days before the Summit) that was no longer tenable, and we announced that the Summit was going all virtual. That was the right decision, but what does going all virtual mean?

Priorities

We had two major priorities for the virtual Summit:

  1. Online streaming video of all of the speakers and panels.
  2. A community space for discussions and coming together.

Video

The live stream of the Summit had to be both accessible to our viewers and easy to join for our speakers and panelists. After considering some options and consulting with experts in our community (huge thank you to Phil Torrone at Adafruit for the guidance), we concluded that a combination of YouTube and StreamYard would be the best option.

YouTube worked for our community because it is easily accessible on a wide range of platforms in most of the world. That meant that just about everyone would be able to see the Summit from wherever they were.

StreamYard made it easy to manage the backend. Speakers could join a virtual green room before their talk, and our technical testing the day before the Summit made it clear that sharing their slide presentations would be easy as well. One member of the Summit team could easily add people (and their screens) to the live feed and remove them again, along with stills and slides for introductions, sponsors, and everything else.

Community Space

We also looked at a number of options for online discussions. We decided that a Discord server would be the best option for the open source hardware community. Discord allowed us to open the space to anyone who wanted to join, while at the same time giving us moderation control over the discussion. Many community members were already comfortable with Discord, which was also a bonus.

We also decided to use Discord for a version of Q&A for the speakers. One option would have been to try to integrate video questions from the audience into the live stream. That would have been technically possible with StreamYard (probably…), but it seemed like an unnecessary logistical complication for the organizers. As an alternative we decided to set up separate Discord channels for each of the speakers. That allowed the speakers to end their talk and move to their Discord channel for further discussions.

One unexpected and welcome development was that the Discord server grew into a larger community hub, with channels devoted to solutions to Covid-19, community announcements, and even hacking the conference badge. We may decide to maintain the server well beyond the Summit as a community space.

It Mostly Worked

We scheduled brief runthroughs with all of the speakers the day before the Summit. Everyone had a chance to get comfortable with the process and work out any last-minute problems. On the day of the Summit we embedded the livestream in the Summit site, along with a link to the Discord server for discussion. There were a few audio glitches where speakers had to briefly drop out, but all things considered it went pretty smoothly.

Once the Summit was over, the entire livestream was automatically posted to OSHWA’s YouTube channel. Within a day or two we had broken out all of the individual talks into a video playlist and pulled the audio from our panel discussion into a standalone podcast episode.

To the extent that things worked, one of the big reasons was the nature of the OSHWA community. Besides being generally great and supportive (no small thing), the open source hardware community already sees itself as a community and is already comfortable with connecting via online tools. That made it easy for them to enthusiastically watch the live stream and jump into the online discussion. Not all types of events have this starting point, which may suggest that they are not great candidates for this type of virtual structure.

If you are reading this because you are working on your own virtual event, good luck! We are happy to answer questions if you have them. Email us at info@oshwa.org. StreamYard also has a referral program, so if you want a $10 credit, drop us a line at that same address.

The Smithsonian Goes Open Access

It doesn’t get a lot bigger than this. On February 25th the Smithsonian went big on open access. With the push of a button, 2.8 million 2D images and 3D files (3D files!) became available without copyright restriction under a CC0 public domain dedication. Perhaps just as importantly, those images came with 173 years of metadata created by the Smithsonian staff. How big a deal is this? The site saw 4 million image requests within the first six hours of going live. People want access to their cultural heritage.

While this is all very exciting, I wanted to take a moment to dive a bit deeper into the licensing portion of the announcement. There are many other important parts - like the API to actually access the collection, and fully downloadable data that is already being turned into interesting visualizations - but the licensing decisions are worth considering as well. The Smithsonian has helped to set a new standard for how open access can work at big institutions, although there are still a few things that could use some improvement.

I also want to reflect on how this moment is the result of many years of effort and advocacy by a wide range of people. Some relevant moments in that process are Carl Malamud’s 2007 “What Would Luther Burbank Do?” effort (original and archive) (one rule of thumb about big moments in openness is that Carl was usually there years earlier laying the groundwork), Michael Edson’s work on the Smithsonian Commons (the best links I have are here and here, although I’m happy to update if anyone has something better), the Cooper Hewitt’s decision to release its metadata under CC0 (followed by the 3D scan of the entire building and their font (that I used quite recently) to boot), and the Smithsonian’s own study on the impact of open access on galleries, libraries, museums, and archives (not surprisingly, written by Effie Kapsalis, who would go on to spearhead this open access move by the Smithsonian). The Smithsonian’s decision to start making 3D models of its collection available online (led by Vince Rossi) also helped lay the groundwork for the inclusion of 3D in this release. While these efforts are worth mentioning for many reasons, one is as a reminder that advocacy takes a long time and is made up of many smaller steps. Big things don’t just happen.

Make it Easy for Good Actors to be Good

Some people will see an announcement like this and immediately think of all of the bad things that could be done with these objects. While I do not dispute that bad things are possible, letting (the relatively small number of) bad actors guide thinking about open access policies does a disservice to (the relatively large number of) good actors. Copyright restrictions or terms of service are unlikely to stop bad actors from doing bad things with cultural artifacts. However, they create significant barriers to good actors doing good things with them. Access regimes should be designed to empower good actors, not to try and slow down every possible fringe bad actor. That seems to be largely how the Smithsonian approached this effort.

CC0 By Default

CC0 is a public domain dedication that clarifies that the Smithsonian is not making any claim of ownership over the digital files it is releasing. The cultural objects included in this release are all in the public domain, so the use of CC0 is not intended to address copyrights attaching to the objects themselves. Instead, CC0 is a way for the Smithsonian to indicate that it does not have any additional right in the digital file as distinct from the object it represents.

This is important in both the 2D and 3D context. In the US it is fairly clear that a digital copy of a 2D work does not get its own copyright. That is also true for 3D scans in the US. The EU is taking steps in that direction as well. The legal status of 2D images of 3D objects is a bit more ambiguous, as is that of 3D models (created in CAD instead of by scanning the object) of cultural artifacts. There is also a lingering possibility that some jurisdictions could take the law in a completely different direction.

While the weight of legal and logical authority suggests to me that the vast majority of digitizations of public domain objects do not get their own copyright protection, CC0 waives away that ambiguity and comes down clearly on the side of openness. In addition to being right on the law, I believe that this decision is right on the theory. Creating an accurate reproduction of a work in the public domain should not give you a right over it.

In the 3D context I really appreciate that the Smithsonian is applying CC0 to scans and reproductions. See, for example, the scan of the Apollo 11 Hatch:

That file is pretty unambiguously in the public domain and released under CC0. The copyright status of the CAD model of the same hatch is slightly less clear. Nonetheless, the Smithsonian decided to clear up any ambiguity by using CC0.

Not Everything is Open

2.8 million files is a lot of files, but it is far from everything in the Smithsonian’s collection. As this slide from the Smithsonian’s 3D Digitization Team makes clear, there are still many objects left to digitize:

Slide showing that 1 million objects are currently on display while 154 million objects are  hidden

Some objects have not been digitized simply because they have not made it to the front of the queue yet. Others have been digitized but have not been included as part of the full open access program. In many cases, that is fine too.

One example is this scan of the “Project Egress” Apollo 11 hatch reproduction.

This is a scan of a replica of the Apollo 11 hatch created by Adam Savage as part of the 50th anniversary of Apollo 11. Unlike the original hatch, there is at least an argument to be made that the reproduction is protected by copyright. If the underlying object is protected by copyright the Smithsonian may not have the legal ability to release the files under CC0. So it didn’t.

It is OK to Keep Some Things Out of the Open Access Program

The more interesting example is that of the Sculpin Hat.

Sculpin Hat

The hat was a ceremonial object of the Tlingit clan of Sitka, Alaska. It was purchased in 1884, which means from a copyright standpoint it is in the public domain. The Smithsonian scanned the damaged hat in order to create a restored replica for the clan in 2019. That means that they have the scan. And, while the scan is up on the 3D portal for viewing, it is not released under a CC0 license or even downloadable.

Why not? Because there is more to an open access program than copyright considerations. As the digitization team notes, there are cultural reasons why an object might not be included:

Slide showing that some objects will not be open because of cultural or other reasons

These are complex questions without easy answers, and it is quite reasonable to want to engage in good faith dialogs about them with all of the stakeholders before releasing the digital file without restriction. The Traditional Knowledge labels project is another interesting attempt to begin to engage with these questions.

If Works are Kept Out of the Open Access Program, The Smithsonian Needs to Explain the Rules

While the Smithsonian’s instinct to hold some files back is a reasonable one, it needs to do a much better job of explaining those restrictions to the public.

The Sculpin Hat has a notice that ‘Usage Conditions Apply’:

image of usage conditions

The same notice appears, somewhat unexpectedly, on the 3D scans of the gloves worn by Neil Armstrong on the Apollo 11 mission:

image of usage conditions

There are at least two problems with this state of affairs. First, the Smithsonian’s use conditions allow for “non-commercial, educational, and personal uses”. However, the files are not actually available for download on the portal. That means even uses within the Smithsonian’s rules are not possible yet.

Second, the popup notice makes it exceedingly unclear how the Smithsonian is imposing these conditions on users. Are these restrictions based in copyright law? If so, and there is no copyright in either the scanned object or the scan file, does that mean that these restrictions are not legally enforceable?

Alternatively, the restrictions may be based in the Smithsonian’s Terms of Use. Assuming the Smithsonian structured the download in a way that required users to agree to those Terms, those Terms could be considered a contract between the Smithsonian and the downloader that governs the use of the files. Basically, the Smithsonian could say that as a condition of accessing the files a downloader has to agree to their terms - that would allow the Smithsonian to impose rules without relying on copyright law. However, as currently written, the Terms of Use also seem to frame the Smithsonian’s control over the files as a copyright issue, not an access issue. The usage conditions section of the terms reads in part:

All other Content is subject to usage conditions due to copyright and/or other restrictions and may only be used for personal, educational, and other non-commercial uses consistent with the principles of fair use under Section 108 of the U.S. Copyright Act. All rights not expressly granted herein by the Smithsonian are reserved…

It is fine for the Smithsonian to reserve rights that exist. But framing the use restriction in the context of copyrights that do not exist is exceedingly confusing, if not legally invalid.

As discussed earlier, the Smithsonian may have valid reasons to want to limit access to some digital files. That being said, it also has an obligation to create and describe those limitations in a legally coherent way.


As I said at the outset, this is an exciting time for open access. The Smithsonian’s decision to release a large number of objects and to include 3D objects should help set the standard for open access going forward. While this effort - like all open access efforts - is a work in progress (I can’t help but notice that the Presidential Portraits collection is missing at least one portrait that we know exists, and I know of a few more works that people want to get in the 3D scan queue), it is largely being done with intentionality and thoughtfulness.

While I know that there were many, many people involved in this effort at the Smithsonian, I want to say a special thank you to Effie Kapsalis and Vince Rossi for the crazy amount of work and persistence they put into making this happen. I’m also heartened that my Engelberg Center colleague Neal Stimler was involved in making all of this happen. When an institution as big as the Smithsonian does something like this it makes a huge splash, but that does not mean getting it to happen is easy.

And one last thing - if you want to start imagining what you can do with all of this new culture at your fingertips, there’s a whole page of examples of things that talented artists have done so far.* You could even start with this book.

*I could write a whole other blog post about how important it is to go beyond releasing objects in an open access program and actually model use of those objects by recruiting creators. And maybe I will. But not today. This post is already way too long.

Feature image: Copying in the Louvre by Alfred Henry Maurer

How Explaining Copyright Broke the YouTube Copyright System

This post originally appeared on the Engelberg Center blog.

This is a story about how the most sophisticated copyright filter in the world prevented us from explaining copyright law. It doesn’t involve TikTok dance moves or nuanced 90s remixes featuring AOC. No, it involves a debate at a law school conference over how and when one song can infringe the copyright of another, and how exactly one proves in a courtroom whether the accused song is “substantially similar” enough to be deemed illegal. In the end, because it was blocked by one of the music companies that owns the song, it also became a textbook study in how fair use still suffers online and what it takes to push back when a video is flagged. A copyright riddle wrapped up in an algorithmic enigma, symbolic of the many content moderation dilemmas faced by online platforms today.

If you want to watch the video it is available here. If you prefer to listen to it, you can subscribe to the Engelberg Center Live Events podcast here. And if you are curious about how new European laws about copyright filtering may impact this sort of situation in the future, both inside and outside of Europe, you might be interested in our upcoming conference examining online copyright liability, A New Global Copyright Order?, on April 20, 2020. You can find out more information about that conference here.

The Video

The video in question was a recording of the “Proving Similarity” panel, which was part of the Engelberg Center’s Proving IP symposium in May of 2019. The panel, which was moderated by Professor Joseph Fishman, featured presentations and discussions by Judith Finell and Sandy Wilbur. Ms. Finell and Ms. Wilbur were the musicologist experts for the opposing parties in the high profile Blurred Lines copyright infringement case. In that case the estate of Marvin Gaye accused Robin Thicke and Pharrell Williams of infringing on Gaye’s song “Got to Give it Up” when they wrote the hit song “Blurred Lines.”

The primary purpose of the panel was to have these two musical experts explain to the largely legal audience how they analyze and explain songs in copyright litigation. The panel opened with each expert giving a presentation about how they approach song analysis. These presentations included short clips of songs, both in their popular recorded version and versions stripped down to focus on specific musical elements.

The Takedown

screenshot from the YouTube copyright summary and status page listing the songs included in the video

The video used clips of the songs in question to illustrate specific points about how they were analyzed in the context of copyright infringement litigation. As such, we were confident that our use of the songs was covered by fair use, and we disputed the claims using YouTube’s internal system.

Shortly thereafter we received notice that the rightsholder was rejecting our dispute on multiple songs.

screenshot from email alerting us that UMG has decided that the claim is still valid

The Decision and the Question

Still confident that our uses were covered by fair use, we researched the YouTube counternotification process. We discovered that if we continued to challenge the accusation of infringement and lost, our video would be subject to copyright strikes. If the account was subject to multiple strikes its ability to live stream could be restricted or the account could be terminated. While our colleagues in the communications department were highly supportive of our efforts, they were concerned that one misstep could wipe NYU Law’s entire YouTube presence off the internet.

In deciding how to continue pressing our case, one question remained unclear. Our single video was subject to multiple copyright infringement claims. If we failed to prevail, would the account be subject to one copyright strike, because all of the claims were against a single video, or to multiple strikes, one for each claim against that single video? As there were four remaining claims against our video, and three strikes could result in the termination of the account, the distinction was highly relevant to us.

screenshot from the appeal dispute page

Unfortunately, we still do not know the answer to that question. This page seems to come closest to providing an answer, but it does not address our specific question. We tried using the ‘Was this helpful?’ link at the bottom to get additional information, but YouTube did not respond.

The Resolution

This would have been a dead end for most users. Unable to understand how the already opaque dispute resolution process might impact the status of their account, they would have to decide if it was worth gambling their entire YouTube account on the chance that some combination of YouTube and the rightsholder would recognize their fair use claim.

Since we are the center at NYU Law focused on technology and innovation, it was not a dead end for us. We reached out to YouTube through private channels to try to get clarity around the copyright strike rules. While we never got that clarity, some weeks later we were informed that the claims against our video had been removed.

The Takeaway

What lessons can be learned from this process?

First, it highlights how challenging it can be for users with strong counter-arguments to dispute an allegation of infringement by large rightsholders. The Engelberg Center is home to some of the top technology and intellectual property scholars in the world, as well as people who have actually operated the notice and takedown processes for large online platforms. We had a degree of legal confidence in our position that would cost an average user tens of thousands of dollars (if not more) to obtain. Even all of those advantages were not enough to allow us to effectively resolve this dispute. Instead, we also had to rely on our personal networks to trigger a process - one that is still unclear - that resulted in the accusations being removed. This is not a reasonable expectation to place on average users.

Second, it highlights the imperfect nature of automated content screening and the importance of process when automation goes wrong. A system that assumes any match to an existing work is infringement needs a robust process to deal with the situations where that is not the case. Our original counterclaim included a clear explanation of the nature of the video and the reasons for using the clips. It is hard to imagine someone with any familiarity with copyright law watching the video, reviewing our claim, and then summarily rejecting it. Nonetheless, that is what happened. No matter how much automation allows you to scale, the system will still require informed and fair human review at some point.

Third, it highlights the costs of things going wrong. The YouTube copyright enforcement system is likely the most expensive and sophisticated copyright enforcement system ever created. If even this system has these types of flaws, it is likely that the systems set up by smaller sites will be even less perfect.

Nonetheless, we are happy that the video has been restored. You can watch it - along with all of the other videos from Proving IP - on the NYU Law YouTube channel. You can also listen to the audio from it and all of the Engelberg Center’s events by subscribing to our live events podcast.

Finally, Europe has recently passed legislation designed to oblige more websites to implement automated copyright filters. Our event A New Global Copyright Order? on April 20 will examine how that legislation will impact Europe and the worldwide conversation around copyright law. We hope to see you there.

Easy Public Domain Picture Frame with the Cleveland Museum of Art Open Access API

In celebration of Public Domain Day 2020 I decided to try to turn the old monitor in my office into a picture frame to display a rotating collection of public domain works. The Cleveland Museum of Art (CMA) launched a robust Open Access program in 2019, so I decided to use their API to power it. This blog post explains all of the steps in creating the project so you can make one too.

This is a fairly lightweight project, so all you need to make it happen is:

  1. A monitor
  2. A Raspberry Pi (or any other computer)
  3. Some code

Most of this post is about the code. The theory behind this project is that there is a website that regularly pulls a new image from the CMA’s API and displays it along with some information like the work’s title and creator. The Raspberry Pi boots into a fullscreen browser displaying that page. The screen also needs to automatically turn off at night because it is a waste to keep the monitor on all night when there is no one around to see it.

The entire project is a double celebration of openness. In addition to the works being displayed, the only reason I could even begin to build it is that the open nature of the internet’s architecture allows me to peek at better-designed sites to learn from them. Open educational resources like the Coding Train have taught me just enough javascript to be able to put something like this together.

The Site

I decided (guessed?) that the easiest way to make all of this work was to create a website that displayed the rotating set of pictures. I’m bad at javascript, so this gave me a chance to learn a little bit more about it.

The self-contained site is available in this repo. If you don’t care about how it works, you can just access a live version of it here.

index.html

This file is minimal and straightforward - it is essentially just a container with pointers to a stylesheet and the script. The one thing to note is that the script is inside of a container div:

        <div class='container'>

         <script src='script.js'></script>

        </div>

This allows me to overlay the text descriptions on top of the image.

script.js

This file is the heart of the action. I will walk through each section to explain what it does. All of the console.log lines are just for my own troubleshooting and can basically be ignored.

//function to generate a random number
function getRndInteger(min, max) {
  return Math.floor(Math.random() * (max - min) ) + min;
}

This initial function is used to generate a random number. The random number is needed in two places: first to pick the image from the collection, and second to determine how long the image will stay up before the page refreshes.

//uses the function to pick a random image in the collection
var offset = getRndInteger(1, 31278);
//inserts that random number into the request url, returning a json file
var target_json_url = "https://openaccess-api.clevelandart.org/api/artworks/?limit=10&indent=1&cc0=1&has_image=1&skip=" + offset;

This block of code is used to access the entry via the CMA’s API. I believe that there are 31,277 entries in the CMA’s open access catalog that have an image. The first line picks a random number between 1 and 31,277. The second line uses the API’s syntax to jump to the work that corresponds to that number.

The limit=10&indent=1 elements in the URL are probably unnecessary. The cc0=1&has_image=1 elements are important - they limit results to ones that have a CC0 license and have an image associated with the entry. Those are the open access entries that I care about.

//create new request object instance
let request = new XMLHttpRequest();
//opens the file
request.open('GET', target_json_url);
request.responseType = 'json'
request.send();

This block of code creates an object to hold the JSON file that the API returns at that URL and then loads the JSON into it. Basically, it creates and fills the container for the JSON file that corresponds to the object that we randomly selected above.

request.onload = function() {
    const response_json = request.response;
    //gets the image URL + tombstone of a random image from the collection and turns it into an array assigned to a variable
    var found_image_info = grabImageInfo(response_json);

    var picked_image_URL = found_image_info[0];
    var picked_image_tombstone = found_image_info[1];
    var picked_image_title = found_image_info[2];
    var picked_image_author = found_image_info[3];
    var picked_image_date = found_image_info[4];

    //creates the image to be posted
    var img = document.createElement("img");
    img.src = picked_image_URL;

    img.alt = picked_image_tombstone;

    //creates the text
    var tomb_text = document.createTextNode(picked_image_tombstone)

    //creates the linebreak
    var linebreak = document.createElement('br');

    let item = document.createElement('div');
    item.classList.add('item');
    item.innerHTML = `<div class="container"><img class="beach-image"  src="${picked_image_URL}" alt="beach image"/><div class="textStyle">${picked_image_title}<br>${picked_image_author}<br>${picked_image_date}</div></div>`;
    document.body.appendChild(item);

    //set up the refresh
    //time is in ms
    //this sets the range
    var refresh_interval = getRndInteger(5000, 20000)
    console.log("refresh rate = " + refresh_interval);
    //this uses the range to reset the page
    setTimeout(function(){
        location = ''
    },refresh_interval)
}

This block is where most of the work happens, so I’ll break it down in smaller pieces. The reason it is all tucked into a request.onload function is that the code in this block waits to run until the data from the API has successfully loaded in the background.

    const response_json = request.response;
    //gets the image URL + tombstone of a random image from the collection and turns it into an array assigned to a variable
    var found_image_info = grabImageInfo(response_json);

This first section assigns the contents of the JSON file to a variable and then sends the JSON file to the grabImageInfo function described below. That function pulls all of the data I care about out of the JSON file and puts it in an array that can be accessed with bracket notation (see next block).

    var picked_image_URL = found_image_info[0];
    var picked_image_tombstone = found_image_info[1];
    var picked_image_title = found_image_info[2];
    var picked_image_author = found_image_info[3];
    var picked_image_date = found_image_info[4];

This section assigns a variable to each element in the found_image_info array.

    //creates the image to be posted
    var img = document.createElement("img");
    img.src = picked_image_URL;

    img.alt = picked_image_tombstone;

This section creates an image element. The source is the URL that comes from the JSON file and the alt text is the tombstone text from the JSON file.

    let item = document.createElement('div');
    item.classList.add('item');
    item.innerHTML = `<div class="container"><img class="beach-image"  src="${picked_image_URL}" alt="beach image"/><div class="textStyle">${picked_image_title}<br>${picked_image_author}<br>${picked_image_date}</div></div>`;
    document.body.appendChild(item);

This section creates the HTML to be added to the index.html file. The item.innerHTML section creates an HTML payload with the image and the title, author, and date overlaid on top of it. If you want to change what is displayed over the image, this is where you should start messing around.

    //set up the refresh
    //time is in ms
    //this sets the range
    var refresh_interval = getRndInteger(5000, 20000)
    console.log("refresh rate = " + refresh_interval);
    //this uses the range to reset the page
    setTimeout(function(){
        location = ''
    },refresh_interval)

This is the section that sets up the page refresh. The arguments you pass to the getRndInteger function determine the bounds of the refresh rate. Remember that the numbers are in ms. I decided to make this slightly random instead of a fixed number to add a bit of variability to the display.

function grabImageInfo(jsonObj) {

    //pulls the elements of each piece and assigns it to a variable
    var data_url = jsonObj['data'][0]['images']['web']['url']
    var data_tombstone = jsonObj['data'][0]['tombstone']
    console.log(data_tombstone)
    var data_title = jsonObj['data'][0]['title']
    //the author info sometimes doesn't exist in the record, so declare the variable first and wrap the lookup in a try/catch so a missing field doesn't crash the script
    var data_author;
    try {
         data_author = jsonObj['data'][0]['creators'][0]['description']
     }
     catch (e) {
         data_author = ''

     }
    var data_creation_date = jsonObj['data'][0]['creation_date']

    console.log("url = " +data_url)

    //creates an array with the URL, tombstone, title, author, and creation date of the random object picked
    var function_image_data = [data_url, data_tombstone, data_title, data_author, data_creation_date]
    //returns that array
    return function_image_data;
}

This is the function to extract data from the JSON file. It pulls each relevant element and then adds it to an array. Each of the var data_url = jsonObj['data'][0]['images']['web']['url'] lines is essentially the same, the only difference being where in the JSON file it looks for the relevant data.

try {
     data_author = jsonObj['data'][0]['creators'][0]['description']
 }
 catch (e) {
     data_author = ''

 }

The author variable works slightly differently. Sometimes the author data does not exist in the records. This structure allows the script to handle errors without crashing.

var function_image_data = [data_url, data_tombstone, data_title, data_author, data_creation_date]
//returns that array
return function_image_data;

Finally, each element of the data is put into an array and returned out of the function. The order in which the data is added to the array is arbitrary, but it is consistent, so if you move something around here, make sure to change how you pull the elements out at the top of the script.

style.css

This is also a fairly straightforward CSS file. The .textStyle section is what you use to style the overlay text. I also believe that the .container section needs to be set to position: relative in order for the overlay to work.

The most interesting part of the file is probably the @font-face section, which loads the custom font. The font is the fantastic typeface that the Cooper Hewitt made available as part of its open access project a few years ago. I always like using it for open access-related projects. The font files live in the /data folder. They are applied to all of the text in the * section.
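For reference, here is a minimal sketch of what a stylesheet along those lines might look like. The selectors match the ones used in script.js above, but the specific property values and the font file name are assumptions - adjust them to your own layout and to whatever font files you actually put in /data.

/* load the Cooper Hewitt font from the /data folder (file name is an assumption) */
@font-face {
  font-family: 'Cooper Hewitt';
  src: url('data/CooperHewitt-Book.otf') format('opentype');
}

/* apply the font to all text on the page */
* {
  font-family: 'Cooper Hewitt', sans-serif;
}

/* relative positioning so the absolutely positioned text can sit on top of the image */
.container {
  position: relative;
}

/* scale the image to fit the screen */
.beach-image {
  width: 100%;
  height: 100vh;
  object-fit: contain;
}

/* the title/author/date overlay in the corner of the image */
.textStyle {
  position: absolute;
  bottom: 1em;
  left: 1em;
  color: white;
}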

The Pi

Once you have everything up and running you can access it from any browser. You can try it here, press F11, and just let it happen in full screen.

If you want to run it constantly as a picture frame it makes sense to devote a computer to the task. A Raspberry Pi is a perfect candidate because it is inexpensive and draws a relatively small amount of electricity.

You could set things up so the pi hosts the site locally and then just opens it. I decided not to do that, mostly because it would involve automatically starting a local server on the pi, which was one more thing to set up. Since the pi needs to be online to hit the API anyway, I thought it would be easier to just set up the page on my own domain. I have no idea if that is actually easier.
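If you would rather go the local route, a minimal sketch of it (assuming you cloned the repo to a hypothetical /home/pi/cma_pd) is to serve the folder with Python's built-in web server and point the browser at localhost instead of a remote URL:

# serve the site locally from the pi (the path is an assumption - use wherever you cloned the repo)
cd /home/pi/cma_pd
python3 -m http.server 8000

With that running, http://localhost:8000 becomes the URL to use in the autostart line described below.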

There are two and a half things you need to do in order to set the pi to automatically boot into displaying the site in fullscreen mode as a full time appliance.

Start in Fullscreen Mode

You can start Chromium in fullscreen mode from the command line. That means you can add the line to the pi’s autostart file. Assuming your username is just ‘pi’ (the default when you start raspbian), open a terminal window and type:

nano /home/pi/.config/lxsession/LXDE-pi/autostart

This will allow you to edit the autostart file directly. Add this line to the file (which is probably otherwise blank):

@chromium-browser --start-fullscreen michaelweinberg.org/cma_pd

You can change the final URL to whatever you like. If you are hosting your own version of this page, that is where to make the switch.

You may find that your fullscreen display still gets a scroll bar on one side. If that’s the case, the half thing you need to do is open Chromium and type chrome://flags in the address bar. Once you are looking at the flags, search for overlay scrollbars and enable that flag. That will hide the scroll bars.

Turn off the Screen

The final thing you might want to do is turn off the screen of the display at night. In order to do this you need to make two entries in cron, a Linux utility that allows you to schedule commands. Here is a nice intro to cron.

The commands you end up scheduling may vary based on your particular setup. This is a helpful tutorial laying out options to make this happen. The ones that worked for me were the vcgencmd ones.

In order to schedule those I opened a terminal window and typed crontab -e. I then added two lines: one to turn off the display (vcgencmd display_power 0) and one to turn it back on (vcgencmd display_power 1), each scheduled for an appropriate time.
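As an example, crontab entries along these lines would blank the screen overnight - the specific times are just placeholders, so pick whatever schedule fits your space:

# turn the display off at 11pm every night
0 23 * * * vcgencmd display_power 0
# turn the display back on at 7am every morning
0 7 * * * vcgencmd display_power 1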


That’s that. This will let you set up a rotating set of public domain images on any display you might have access to. Good luck with your own version.

List image: The Biglin Brothers Turning the Stake, Thomas Eakins, 1873

This post originally appeared in Slate and was co-authored with Gabriel Nicholas.

In the tech policy world, antitrust is on everyone’s minds, and breaking up Big Tech is on everyone’s lips. For those looking for another way to fix tech’s competition problem, one idea keeps popping up. Mark Zuckerberg named it as one of his “Four Ideas to Regulate the Internet.” Rep. David Cicilline, a Democrat from Rhode Island and chairman of the House Judiciary Committee’s antitrust subcommittee, said it could “give power back to Americans.” It’s already enshrined as a right in the European Union as part of the General Data Protection Regulation, and in California’s new Consumer Privacy Act as well.

The idea is data portability: the concept that users should be able to download their information from one platform and upload it to another. That way, the theory goes, people can more easily try new products, and startups can jump-start their products with existing user data. The family group chat can move off of WhatsApp without leaving behind years of data. Members of the anarcho-socialist Facebook group can bring their conversations and Marxist memes with them. A whole new world can flourish off of years of built-up data. It’s competition without the regulatory and technological headache of breaking up companies.

But data portability might not be the regulatory golden goose the private and public sectors hope it is. It’s not even a new idea: Facebook has allowed users to export their data through a “Download Your Information” tool since 2010. Google Takeout has been around since 2011. Most major tech companies introduced some form of data portability in 2018 to comply with GDPR. Yet no major competitors have been built from these offerings. We sought to find out why.

To do this, we focused our research on Facebook’s Download Your Information tool, which allows users to download all of the information they have ever entered into Facebook. We showed the actual data Facebook makes available in this tool to the people we would expect to use it to build new competitors—engineers, product managers, and founders. Consistently, they did not feel that they could use it to create new, innovative products.

Just by looking at the sheer volume of data Facebook makes available, it’s hard to believe this is true. The Download Your Information export includes dozens of the user’s files, containing every event attended, comment posted, page liked, and ad interacted with. It also is a stark reminder of just how many features Facebook has (a fully fledged payments platform! Something called “Town Hall”!) and how many have been retired (remember pokes?). When Katie Day Good got her data from Facebook, the PDF ran to 4,612 pages.

But the people we interviewed—the ones who might actually make use of all this information—noted some serious shortcomings in the data. A user can download a comment made on a status, but not the original status or its author (at least in a way useful for developers). A user can get the start time and name of an event attended, but not the location or any fellow attendees. Users can get the time they friended people, but little else about their social graphs. Time and time again, Facebook data was insufficient to re-create almost any of the platform’s features.

From a privacy perspective, these shortcomings make sense. Facebook draws a hard line around what it considers one user’s data versus another’s in order to ensure that no one has access to information not their own. Sometimes, though, the hard line makes the data less useful to competitors. Information falls in the gaps, leaving conversations unable to be reconstructed, even if both sides upload their data. Facebook mused extensively on the privacy trade-offs involved in data portability in a white paper published in September. It concluded, more or less, that there need to be more conversations on this subject. (Mark Zuckerberg himself has given a similar line about data portability since as early as 2010.)

Conversations aside, there is some low-hanging fruit to make current data portability options more useful for competitors and easier for users. Almost no platforms we looked at gave any sense of what downloaded data might actually look like, and without this kind of documentation, developers would have a hard time incorporating this data into any real products. The process of actually downloading data could also be improved. Currently, many platforms hide their data exports deep in menus, limit how frequently users can download their data, and take a long time to make the data accessible. Spotify, for example, can take up to 30 days to create its data export.

One-user-at-a-time data portability might also be the wrong approach. On social platforms, users want to be where their friends are, and portability pioneers may find themselves on barren networks. But alternative forms of data portability might address this problem and work better for competition. For example, platforms could allow users to move their data in coordinated groups. The family WhatsApp could agree to move to Vibe all at once, or the anarcho-socialist Facebook group could put it to a vote. Similarly, open and continuous integration may be more effective than one-time data transfers. There is room for the kind of experimentation and innovation Silicon Valley is famous for.

Even with all of these improvements, data portability is in danger of being a waste of time. It has all the trappings of a radical, win-win way to increase competition on the internet, but when put into practice, it has so far fallen short. It might work for nonsocial applications, like music streaming or fitness apps, but as of now it acts as a distraction from proposals for more systemic integration, including those put forward as part of the Senate’s recent ACCESS Act. Data portability is just one narrow tool to improve competition in the tech sector—and it’s an Allen wrench, not a Swiss Army knife.