I’m Not Sure That (If?) GitHub Copilot is a Problem

Last week a new GitHub Copilot investigation website created by Matthew Butterick brought the conversation about GitHub’s Copilot project back to the front of mind for many people, myself included. Copilot, a tool trained on public code and designed to auto-suggest code to programmers, has been greeted with excitement, curiosity, skepticism, and concern since it was announced.

The GitHub Copilot investigation site’s arguments build on previous work by Butterick, as well as thoughtful analysis by Bradley M. Kuhn at the Software Freedom Conservancy. I find the arguments contained in these pieces convincing in some places and not as convincing in others, so I’m writing this post in the hope that it helps me begin to sort it all out.

At this point, Copilot strikes me as a tool that replaces googling for stack overflow answers. That seems like something that could be useful. It also seems plausible that training such a tool on public software repositories (including open source repositories) could be allowed under US copyright law. That may change if or when Copilot evolves, which makes this discussion a fruitful one to be having right now.

Both Butterick and Kuhn combine legal and social/cultural arguments in their pieces. This blog post starts with the social/cultural arguments because they are more interesting right now, and may impact the legal analysis as facts evolve in the future. Butterick and Kuhn make related arguments, so I’ll do my best to be clear which specific version of a point I’m engaging with at any given time. As will probably become clear, I generally find Kuhn’s approach and framing more insightful (which isn’t to say that Butterick’s lacks insight!).

What is Copilot, Really?

A large part of this discussion seems to turn on the best way to think about and analogize what Copilot is doing (the actual Copilot page does a pretty good job of illustrating how one might use it).

Butterick seems to think that the correct way to think about Copilot is as a search engine that points users to a specific part of a specific (often open source) software package. In his words, it is “a convenient alternative interface to a large corpus of open-source code”. He worries that this “selfish interface to open-source software” is built around “just give me what I want!” (emphasis his).

The selfish approach may deliver users to what they think they want, but in doing so hides the community that exists around the software and removes critical information that the code is licensed under an open source license that comes with obligations. If I understand the argument correctly, over time this act of hiding the community will drain open source software of its vitality. That makes Copilot a threat to open source software as a sustainable concept.

But…

The concern about hiding open source software’s community resonates with me. At the same time, Butterick’s starting point strikes me as off, at least in terms of how I search for answers to coding questions.

This is probably a good place to pause and note that I am a Very Bad coder who, nonetheless, does create some code that tends to be openly licensed and is just about always built on other open source code. However, I have nowhere near the skills required to make a meaningful technical contribution to someone else’s code.

Today, my “convenient alternative interface” to finding answers when I need to solve coding problems is google. When I run into a coding problem, I either describe what I am trying to do or just paste the error message I’m getting into google. If I’m lucky, google will then point me to stack overflow, or a blog post, or documentation pages, or something similar. I don’t think that I have ever answered a coding question by ending up in a specific portion of open source code in a public repo. If I did, it seems unlikely that code - even if it had great comments - would get me where I was going on its own, because I would not have the context required to quickly understand that it answered my question.

This distinction between “take me to part of open source code” (Butterick’s view) and “help me do this one thing” (my view) is important because when I look at the Copilot website, it feels like Copilot is currently marketed as a potentially useful stack overflow commenter, not someone with an encyclopedic knowledge of where that problem was solved in other open source code. Butterick experimented with Copilot in June and described the output as “This is the code I would expect from a talented 12-year-old who learned about JavaScript yesterday and prime numbers today.” That’s right at my level!

If you ask Copilot a question like “how can I parse this list and return a different kind of list?,” in most cases (but, as Butterick points out, not all!) it seems to respond with an answer synthesized from many different public code repositories instead of just pointing to a single “best answer” repo. That makes Copilot more of a stack overflow explorer than a public code explorer, albeit one that is itself trained by exploring public code. That feels like it reduces the type of harm that Butterick describes.
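To make that concrete, here is the kind of snippet I might hope to get back for a question like that. To be clear, this is my own invented example - the function name, the input format, and the behavior are all hypothetical, not actual Copilot output:

```python
# Hypothetical example of a synthesized "parse this list and return a
# different kind of list" answer; written by me, not produced by Copilot.

def parse_prices(raw_prices):
    """Turn strings like "apple: 1.50" into (name, price) tuples,
    skipping entries without a usable price."""
    parsed = []
    for entry in raw_prices:
        name, _, price = entry.partition(":")
        try:
            parsed.append((name.strip(), float(price)))
        except ValueError:
            continue  # no numeric price after the colon; skip the entry
    return parsed

print(parse_prices(["apple: 1.50", "pear: 2.25", "bad entry"]))
# → [('apple', 1.5), ('pear', 2.25)]
```

The point is less the code itself than its shape: a plausible, generic answer assembled from common patterns, rather than a pointer into any one repository.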

Use at Your Own Risk

Butterick and Kuhn also raise concerns about the fact that Copilot does not make any guarantees about the quality of code it suggests. Although this is a reasonable concern to have, it does not strike me as particularly unique to Copilot. Expecting Copilot to provide license-cleared and working code every time is benchmarking it against an unrealistic status quo.

While useful, the code snippets I find in stack overflow/blog post/whatever are rarely properly licensed and are always “use at your own risk” (to the extent that they even work). Butterick and Kuhn’s concerns in this area feel equally applicable to most of my stack overflow/blog post answers. Copilot’s documentation is fairly explicit about the value of the code it suggests (“We recommend you take the same precautions when using code generated by GitHub Copilot that you would when using any code you didn’t write yourself.”), for whatever that is worth.

Will Copilot Create One Less Reason to Interact Directly with Open Source Code?

In Butterick’s view, another downside of this “just give me what I want” service is that it reduces the number of situations where someone might knowingly interact with open source code directly. How often do most users interact directly with open source code? As noted above, I interact with a lot of other people’s open source software as an extremely grateful user and importer of libraries, but not as a contributor. So Copilot would shift my direct deep interaction with open source code from zero to zero.

Am I an outlier? Nadia Asparouhova (née Eghbal)’s excellent book Working in Public provides insight into open source software grounded in user behavior on GitHub. In it, she tracks how most users of open source software are not part of the software’s active developer community:

“This distribution - where one or a few developers do most of the work, followed by a long tail of casual contributors, and many more passive users - is now the norm, not the exception, in open source.”

She also suggests that there may be too much community around some open source software projects, which is interesting to consider in light of Butterick’s concern about community depletion:

”The problem facing maintainers today is not how to get more contributors but how to manage a high volume of frequent, low-touch interactions. These developers aren’t building communities; they’re directing air traffic.”

That suggests that I am not necessarily an outlier. But maybe users like me don’t really matter in the grand scheme of open source software development. If Butterick is correct about Copilot’s impact on more active open source software developers, that could be a big problem.

Furthermore, even if users like me are representative today, and Copilot is not currently good enough to pull people away from interacting with open source code, might it be in the future?

“Maybe?” feels like the only reasonable answer to that question. As Kuhn points out, “AI is usually slow-moving, and produces incremental change far more often than it produces radical change.” Kuhn rightly argues that slow-moving change is not a reason to ignore a possible future threat. At the same time, it does present the possibility that a much better Copilot might itself be operating in an environment that has been subject to other radical changes. These changes might enhance or reduce that future Copilot’s negative impacts.

Where does that leave us? The kind of casual interaction with open source code that Butterick is concerned about may happen less than one might expect. At the same time, today’s Copilot does not feel like a replacement for someone who wants to take a deeper dive into a specific piece of open source software. A different version of Copilot might, but it is hard to anticipate what else might be different in a world where that version existed. Today’s version of Copilot does not feel like it quite manifests the threat described by Butterick.

Copilot is Trained on Open Source, Not Trained on Open Source

For some reason, I went into this research thinking that Copilot had explicitly been trained on open source software. That’s not quite right. Copilot was trained on public GitHub repositories. Those include many repositories of open source software. They also include many repositories of code that is just public, with no license, or a non-open license, or something else. So Copilot was trained on open source software in the sense that its training data includes a great deal of open source software. It was not trained on open source software in the sense that its training data only consists of open source software, or that its developers specifically sought out open source software as training data.

This distinction also happens to highlight an evolving trend in the open source world, where creators conflate public code with openly licensed code. As Asparouhova notes:

”But the GitHub generation of open source developers doesn’t see it that way, because they prioritize convenience over freedom (unlike free software advocates) or openness (unlike early open source advocates). Members of this generation aren’t aware of, nor do they really care about, the distinction between free and open source software. Neither are they fired up about evangelizing the idea of open source itself. They just publish their code on GitHub because, as with any other form of online content today, sharing is the default.”

As a lawyer who works with open source, I think the distinction between “openly/freely licensed” and “public” matters a lot. However, it may not be particularly important to people using publicly available software (regardless of the license) to get deeper into coding. While this may be a problem that is exacerbated by Copilot, I don’t know that Copilot fundamentally alters the underlying dynamics that feed it.

As noted at the top, and attested to by the body of this post so far, this post starts with the cultural and social critiques of Copilot because that is a richer area for exploration at this stage in the game. Nonetheless, the critiques are - quite reasonably - grounded in legal concerns.

Fair Use

The legal concerns are mostly about copyright and fair use. Normally, in order to make copies of software, you need permission from the creator. Open source software licenses grant those permissions in return for complying with specific obligations, like crediting the original creator.

However, if the copy being made of the software is protected by fair use, the copier does not need permission from the creator and can ignore any obligations in a license. In this case, GitHub is not complying with any open source licensing requirements because it believes that its copies are protected by fair use. Since it does not need permission, it does not need to comply with license requirements (although sometimes there are good reasons to comply with the social intent of licenses even if they are not legally binding…). It has said as much, although it (and its parent company Microsoft) has declined to elaborate further.

I read Butterick as implying that GitHub and Microsoft’s silence on the details of its fair use claim means that the claim itself is weak: “Why couldn’t Microsoft produce any legal authority for its position? Because [Kuhn and the Software Freedom Conservancy] is correct: there isn’t any.”

I don’t think that characterization is fair. Even if they believe that their claim is strong, GitHub cannot assume that it is so strong as to avoid litigation over the issue (see, e.g., the existence of the GitHub Copilot investigation website itself). They have every reason to avoid pre-litigating the fair use issue via blog post and press release, keeping their powder dry until real litigation.

Kuhn has a more nuanced (and correct, as far as I’m concerned) take on how to interpret the questions: “In fact, these areas are so substantially novel that almost every issue has no definitive answers”. While it is totally reasonable to push back on any claims that the law around this question is settled in GitHub’s favor (Kuhn, again: “We should simply ignore GitHub’s risible claim that the ‘fair use question’ on machine learning is settled.”), that is very different from suggesting that it is settled against GitHub.

How will this all shake out? It’s hard to say. Google scanned all the books in order to create search and analytics tools, claiming that their copies were protected by fair use. The Authors Guild sued, and Google ultimately won in the Second Circuit. Is scanning books to create search and analytics tools the same as scanning code to create AI-powered autocomplete? In some ways yes? In other ways no?

Google also won a case before the Supreme Court where they relied on fair use to copy API calls. But TVEyes lost a case where it attempted to rely on fair use in recording all television broadcasts in order to make it easy to find and provide clips. And the Supreme Court is currently considering a case involving Warhol paintings of Prince that could change fair use in unexpected ways. As Kuhn noted, we’re in a place of novel questions with no definitive answers.

What About the ToS?

As Franklin Graves pointed out, it’s also possible that GitHub’s Terms of Service allow it to use anything in any repo to build Copilot without worrying about additional copyright permissions. If that’s the case, they won’t even need to get to the fair use part of the argument. Of course, there are probably good reasons that GitHub is not working hard to publicize the fact that their ToS might give them lots of room when it comes to making use of user uploads to the site.

Where Does That Leave Things?

To start with, I think it is responsible for advocates to get out ahead of things like this. As Kuhn points out:

”As such, we should not overestimate the likelihood that these new systems will both accelerate proprietary software development, while we simultaneously fail to prevent copylefted software from enabling that activity. The former may not come to pass, so we should not unduly fret about the latter, lest we misdirect resources. In short, AI is usually slow-moving, and produces incremental change far more often than it produces radical change. The problem is thus not imminent nor the damage irreversible. However, we must respond deliberately with all due celerity — and begin that work immediately.”

At the same time, I’m not convinced that Copilot is a problem. Is it possible that a future version of Copilot would starve open source software of its community, or allow people to effectively rebuild open source code outside of the scope of the original license? It is, but it seems like that version of Copilot would be meaningfully different from the current version in ways that feel hard to anticipate. Today’s Copilot feels more like a fast lane to possibly-useful stack overflow answers than an index that can provide unattributed snippets of all open source software.

As it is, the acute threat Copilot presents to open source software today feels relatively modest. And the benefits could be real. There are uses of today’s Copilot that could make it easier for more people to get into coding - even open source coding. Sometimes the answer of a talented 12-year-old is exactly what you need to get over the hump.

Of course, GitHub can be right about fair use AND Copilot can be useful AND it would still be quite reasonable to conclude that you want to pull your code from GitHub. That’s true even if, as Butterick points out, GitHub being right about fair use means that code anywhere on the internet could be included in future versions of Copilot.

I’m glad that the Software Freedom Conservancy is getting out ahead of this and taking the time to be thoughtful about what it means. I’m also curious to see if Butterick ends up challenging things in a way that directly tests the fair use questions.

Finally, this entire discussion may also end up being a good example of why copyright is not the best tool to use against concerns about ML dataset building. Looking to copyright for solutions has the potential to stretch copyright law in strange directions, cause unexpected side effects, and misaddress the thing you really care about. That is something I am always wary of, and a prior that informs my analysis here. Of course, Amanda Levandowski makes precisely the opposite argument in her article Resisting Face Surveillance with Copyright Law.

Image: Ancient Rome from the Met’s open access collection.

Lincoln Hand Shifter Knob

gif of Lincoln hand shifter as installed

In the interest of celebrating the weirdness of open data, I want to share a quick project that exists because of open data: Abraham Lincoln’s left hand as the shifter knob of a 1995 Mazda truck.

The whole thing was pretty straightforward. In fact, the hardest part was probably finding the right shifter knob adapter for the truck. All that was required was:

  • Download the Lincoln hand scans from the Smithsonian open access site.

  • Use Tinkercad to put a hole in the back of the hand.

  • 3D print the hand and use epoxy to attach the adapter.

hand and adapter; hand and adapter together

spinning gif of combined version

  • Install it.

image of Lincoln hand shifter as installed

Printables, Honda, Platforms, and Nastygrams

Last week the 3D printing platform Printables removed an unknown number of models from their platform. This action was apparently in response to a letter Printables received from Honda claiming that user models infringed on various rights. Based on the discussion of the action in the Printables forum, it appears that at least some of Honda’s claims may have been related to the use of Honda’s trademarks in either model geometry or model descriptions.

Many people have criticized Honda’s decision to send this letter in the first place. For example, while I have some quibbles with the legal details in this Hackaday post, its criticism of Honda for failing to meet its community where it is seems directionally correct to me. Others, including me, also directed some criticism at Printables itself for what appears to be, from the outside (always an unreliable evaluation viewpoint), a fairly noncritical acquiescence to Honda’s claims. (In my defense, describing the letter as “a huge legal document” imposing a “very tight deadline” in explaining why the takedown happened does not exactly suggest a carefully considered review.)

In any event, I’ve written about these types of (potentially) overly broad takedown claims before, and about the structural incentives that can punish platforms for viewing them critically.

Instead of just complaining about everyone’s behavior, in this post I want to be productive. The post will try and walk through how I would think about processing and responding to this kind of letter. Since, like everything on this site, this post is not legal advice, I’m going to sidestep the legal details and focus on more operationally-oriented steps (if you are curious, the posts linked to above provide some legal context). Those legal details will matter when trying to actually implement anything like this approach on a specific platform (especially across jurisdictions). However, they are not necessary to follow the general flow.

Step 1: Take a breath, read, and sort

It is important to remember that no one just happens to send a long, scary-looking letter on fancy letterhead that includes a short deadline for response. These letters - sometimes referred to by lawyers as ‘nastygrams’ - are designed to intimidate and encourage compliance.

That does not mean that you should ignore them! But it does mean that you should keep that in mind when you are reading them. That’s why the first thing to do when receiving a nastygram is to take a deep breath and remember that the letter is, at least partially, designed to intimidate you.

The second thing to do is to actually read the letter and map out what it says. Specifically, what rights is the sender actually claiming, and how are they connecting those rights to specific models (either individually or as a class)? Lawyers can sometimes try and bluff their way through these details, so read the letter critically. The details will matter later on.

Once you have read the letter, try and sort the claims and models into specific buckets. Does the letter claim that some models infringe on copyrights while others infringe on trademarks? Are the objections to the models themselves, or to the language describing them? Something else entirely?

Step 2: Act on the easy stuff

If the letter includes all of the elements of a true DMCA takedown, claims that specific models infringe on copyright, and lists the models, it should be easy to deal with those models with an existing DMCA process. No reason to wait. If the letter includes trademark claims, try and make some triage decisions. Not all uses of a trademark infringe on the mark! If you are lucky and have thought about this stuff in advance (see below), act on any models that are easy calls. Do so knowing it can be ok to take more time considering the models that feel closer to the line.

Step 3: Reach out and ask for clarification

Once you have your head around what the letter is really asking of you and made some easy decisions, it may be time to reach out to the party that sent it. Reaching out can show the sender that you exist and are a good faith actor. Tell them what you have done, and ask for clarifications to help you evaluate whatever is left.

There are a few reasons to reach out even if you have not immediately and fully complied with the letter’s request. With regard to the party that sent you the letter, it is likely that they send these kinds of letters to all sorts of sketchy, bad-faith actors and never receive any sort of response. Responding signals to them that there is a real person at the platform who is paying attention, taking their concerns seriously, and acting in good faith. Depending on how you structure your questions, it can also be a way to signal that you won’t be intimidated by broad gestures at unspecified ‘rights’ that are not tied to specific claims.

The second audience for your response is a court. If things go totally sideways, your dispute may end up in front of a court. At that point, the judge or jury will need to decide if you are the horrible pirate den that you are accused of being, or a responsible, responsive community of creative people trying to balance many competing rights. Building a record of constructive engagement can be helpful in making the case that you fall into the second category.

In formulating your response, it can also be helpful to have done some thinking in advance about what you might want to push back on and why. Are you a platform that is content to let large rightsholders define the rules, even if large rightsholders want to create rules that give them much more power than they are legally entitled to? Or are you trying to create a space where people can engage with the world in a way that recognizes that rights exist and have limitations? This can be a harder decision than it might appear. Not every platform sees itself as working with intentionality to create space for its users. That’s why it is helpful to consider it outside of a crisis context. Understanding your own framework will help you calibrate your response.

Step 4: Be as transparent as possible

Whatever you end up doing, take steps to explain to targeted users and the community exactly what is going on. There will be limits to your transparency - to protect users, the platform, and even the party that sent you the letter in the first place. However, to the extent possible, explain what rights are alleged to be infringed upon, how you evaluate those claims, and what steps all parties can take to avoid problems in the future.

None of this will eliminate conflicts between external rightsholders, users, and a platform. However, if done right, it introduces a degree of accountability into the process for everyone involved. If nothing else, that helps to make sure that the balance struck by the rules governing a platform is reasonably related to the balance struck by the law.

Header Image: The Board of Censors Moves Out from the Met’s open access collection.

Are NFTs Compatible with OpenGLAM?

NFTs are hot. OpenGLAM is hot. People have THOUGHTS and OPINIONS when NFTs and OpenGLAM get near each other.

Before spending any time discussing if NFTs and OpenGLAM should go together, it is worth considering if NFTs and OpenGLAM even can go together. Are they even conceptually compatible?

This question of compatibility between NFTs and OpenGLAM is a marginally interesting question on its face. It is an actually interesting question when used to illuminate an ongoing debate about the relationship between open collections and GLAM finances. This post is my attempt to do the second thing.

Here’s the short version: GLAM institutions should realize that NFT discussions are really discussions about leveraging the power of an institutional brand in an environment where the objects in the institution’s collection are ubiquitous. Regardless of how (or if) they decide to engage with NFTs themselves, those institutions can capitalize on NFT-spurred discussions to think more broadly about ways to move revenue generation models away from scarcity and towards ubiquity. That’s true even if the NFT discussions are only happening internally.

NFT Preamble I: Problems

This post begins, as any post about NFTs must at this point in the conversation, by recognizing the many problems that exist with NFTs today. These problems start with the truly horrifying environmental impact of creating and transacting NFTs (documented early, and I still think best, by Everest Pipkin here). One can debate whether this is strictly speaking an NFT problem or a blockchain (or proof-of-work blockchain) problem, but NFTs are on a blockchain, so for the purposes of this post that distinction doesn’t really matter.

The problems continue with the fact that the overwhelming majority of people involved in NFTs do not seem to fully realize what rights NFTs actually convey. That includes people who think that buying an NFT gives them some sort of ownership over the thing represented by the NFT, people who think that selling an NFT of something you don’t own infringes on the rights of the thing’s owner, and people who think that buying an NFT of a thing gives them control over that thing beyond what they would have if they actually purchased the physical thing.

An additional set of problems emerge because the NFT market is frothy and unregulated. That makes it home to pretty much every flavor of financial scam yet conceived by humanity (we are in the ‘fraud is so widespread that content farms write articles about NFT scams’ phase of NFT evolution). Among other things, that means that pretty much any headline valuation of an NFT should be read with a degree of skepticism, and any new NFT scheme should be evaluated for potential scamminess.

Finally, the NFT space provides a place for people to wave away or just imagine their own set of legal and cultural rules. That happens with varying degrees of effectiveness and can actually end up being kind of useful in some cases, while being terrifying in others.

These problems emerge from the current incarnation of NFTs. NFTs may someday become more environmentally responsible, better understood by people involved in them, less scammy, and more tethered to reality. But that’s true of a lot of things. There is no law of gravity that demands that be so. And, as of now, the “claim to legitimacy” to “evidence of legitimacy” ratio for NFTs is still way out of whack.

To be clear, none of this means that the NFT space is completely devoid of interesting projects. It just means that they are fairly hard to find, and the ones that I have found don’t really justify a multi-billion-dollar phenomenon.

NFT Preamble II: What They Are

There are a million explainers about NFTs on the internet. If you are reading this you already know the gist, so I’ll focus on the part that is important for this discussion. To the extent that you need more background, this article by Kal Raustiala and Chris Sprigman is great. This article by Foteini Valeonti et al. is also helpful, and provides additional context for the GLAM space specifically.

For the purposes of this post, the most important thing about NFTs is that they do not claim any sort of ownership in the underlying thing represented by the NFT, or assert a transfer of ownership of the underlying thing between the seller of the NFT and the buyer. That’s how Brian Frye, someone with a very nuanced understanding of the legal and conceptual aspects of NFTs, felt very comfortable selling an NFT of the Brooklyn Bridge, a piece of infrastructure that he certainly does not own. That’s also why it is not copyright infringement to sell an NFT of someone else’s art.

Instead, all an NFT claims is that someone wrote down, in a blockchain ledger, that the original buyer was associated with a thing. While NFTs can work conceptually with any random person and any random ledger, in practice the economics of NFTs require that someone care enough about the combination of the original recorder and the ledger to pay for it.
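A deliberately simplified sketch of that ledger entry might look like the following. Every field name and value here is my invention for illustration - this does not follow ERC-721 or any real NFT standard:

```python
# Conceptual-only sketch of what an NFT "is": a ledger entry associating
# a buyer with a pointer to a thing. All names and values are invented.

nft_record = {
    "token_id": 42,
    "recorded_by": "seller-wallet-address",  # who wrote the entry down
    "owner": "buyer-wallet-address",         # who the ledger says holds it
    "points_to": "https://example.com/brooklyn-bridge.jpg",
}

# Note what is missing: nothing in the record conveys copyright in, or
# ownership of, the thing that "points_to" references. The entry records
# only the association itself.
```

That absence is the whole point of the compatibility argument later in this post: the entry can be scarce even when the thing it points to is freely available to everyone.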

OpenGLAM Preamble I: The Issue

OpenGLAM (that’s Galleries, Libraries, Archives, and Museums) is a community of people, institutions, and practice working to make our common culture available to anyone, so they can use it anywhere, in a manner of their choosing. In practice this means focusing on digitizing public domain works and releasing those files under a CC0 public domain dedication.

Many of the institutions involved with, or considering getting involved in, the OpenGLAM movement had or have legacy rights and licensing departments. In a pre-OpenGLAM context, these departments sold licenses for various types of reproductions of works in their collections. That includes licenses for works that were themselves in the public domain. If you want or wanted a picture of a specific painting in your fancy art book or academic article, these are the people you pay to use the picture.

This kind of licensing model is built on exclusive control. The institution could charge for a license to use a picture of a 13th-century painting because they were the only ones who had full access to it and were the only ones with control over a high-quality image of it. If everyone had access to the painting, or if the high-quality image of the painting was freely available for anyone to use, there would not be a reason to pay the museum to use it. (There actually are still reasons that you might want to pay the museum to use it, but that’s an issue for another post (or a bit later on in this post)).

This raises a question for institutions who are considering transitioning to an open model: what will going open mean for the revenue they traditionally drew from licensing?

Before connecting that question with NFTs - we’re getting there, I promise - I want to add three caveats to it:

Caveat 1: Most institutions draw relatively modest revenue from their licensing programs. When you factor in the costs of actually running these licensing departments, many of them may actually lose money.

Caveat 2: There is plenty of evidence that open strategies can be used to increase revenues.

Caveat 3: It is always reasonable to ask how institutions should think about balancing their need to pay for things with their mission (and social compact) to make their collections available to the public, and if licensing revenue should even be considered legitimate in the first place. Suffice it to say that this is a question upon which reasonable minds may disagree. More importantly for the purposes of this article, regardless of its conceptual legitimacy, it is a concern that institutions regularly express.

Enter NFTs

NFTs do have at least one interesting feature: they combine exclusivity with ubiquity. As discussed earlier, the NFT isn’t the thing it represents. The NFT is just someone writing down your name next to the name of the thing in a ledger. That means that what you are buying (being written down in the ledger) is not in inherent conflict with the thing your name is being written down next to being ubiquitous and freely available to everyone. The thing being available to everyone does not change the fact that your name was the one written down in the ledger.

The result is that, unlike selling licenses to use images, there is not an inherent conflict between selling an NFT and making the underlying work connected to the NFT freely available to anyone. OpenGLAM and NFTs are compatible, at least conceptually. Widespread, open access to the underlying thing does not directly impact the economic viability of the NFT.

Instead, in many ways, most of the value of the NFT comes from leveraging the reputation of the institution that is writing the buyer’s name down in the ledger next to the name of the work. An NFT of the Mona Lisa offered by the Louvre (that’s the Louvre writing your name down next to the Mona Lisa in the ledger) would probably be worth more than an NFT of the exact same Mona Lisa offered by me (that’s me writing your name down next to the Mona Lisa in the ledger).

Why the distinction? The NFT represents officially sanctioned affiliation with the work. That is different from ownership of the work or control over the work. Nonetheless, that officially sanctioned affiliation might be worth something to someone.

NFTs as Official Merch

Viewed this way, the NFT model is strikingly similar to an ‘official licensing’ strategy for open works. Under both models, everyone has full access to and use of the open access work. Also under both models, some people can pay a fee to the institution for a more official affiliation.

For the NFT, that affiliation takes the form of a record in a ledger that documents that it was the institution that wrote your name down next to the name of the work. For the official licensing model, that might take the form of the right to use the institution’s trademarks with your product that incorporates works from the open collection, some sort of certification that the use or reproduction of the work is in line with curatorial standards, or something else.

Both of these models sell an institutional endorsement of the user’s affiliation with the work without limiting anyone else’s access to the work (although only the NFT comes with a frothy market and the ability to easily transfer that affiliation).

Is This a Good Idea?

Just because the idea of NFTs is compatible with the idea of OpenGLAM does not mean that OpenGLAM should embrace NFTs. It only means that OpenGLAM could embrace NFTs if it was so moved.

The start of the answer to the “is this a good idea?” question lies in NFT Preamble I above. There are major risks and costs that come with getting involved with NFTs as they currently exist. These include environmental costs, reputational costs, and other costs in between.

These costs are real and high. Even if the sale of some NFTs could finance the opening of an entire collection, the cost of doing so under current conditions could be very hard to defend.

Another part of this answer lies in the fact that institutions need to do real thinking about how and why they use their brand to validate affiliations. That is true in the NFT context, and in their activities more broadly.

At a minimum, without doing anything else, NFTs may act as a mechanism that allows institutions to think more creatively about what it means to capture value from their open collections. They make it easier for people within those institutions to think more broadly about their brand association value, especially in connection with their open collections. If that happens it will be useful. Although maybe not useful enough to justify everything else.

Header image: Scene at a Fair: A Magician from the Met Open Access collection

Commercialization-as-a-Service - A Missing Layer of Open Hardware?

Today the Engelberg Center released a report from the Open Hardware Distribution & Documentation Working Group that explores what is really needed to create a distributed manufacturing network for open hardware. You can find the official launch post here and the report itself here. I recommend checking it out.

This post focuses on one part of the report: the idea of “Commercialization-as-a-Service.” For me, this was the most intriguing thing to come out of the year’s worth of discussion within the Working Group (and it was a discussion full of intriguing things!).

Open source hardware constantly works by analogy to open source software, so let’s start there. With open source software, the manufacturing and distribution layer is fairly straightforward and so lightweight that it is easy to miss. Publishing code online effectively manufactures and distributes it worldwide.

Admittedly, in reality the “it just happens” nature of open source software distribution breaks down pretty quickly. Nonetheless, “pretty quickly” is not the same as immediately. The ease of code distribution online means that a piece of open source software can get at least moderate use and build a decent sized community without being supported by significant commitments to building out infrastructure.

That is not really the case with open source hardware. While it is true that you can easily post design files for hardware online, design files for hardware are not hardware. In order to get beyond a relatively small core of people who will assemble and build their own version of the hardware based on instructions, developers of open source hardware are likely to confront a need to begin packaging physical objects and distributing them to others. These objects might take the form of a kit that others assemble, fully constructed hardware, or something else entirely.

Regardless of what form it takes, this manufacturing and distribution layer becomes an independent obligation for the team much earlier in the hardware lifecycle than it would in the software lifecycle. Furthermore, this layer is probably further from the work of designing the hardware than the equivalent responsibility is from writing software. That means that there is no guarantee that the team that originally came together to design and build the hardware has someone (or a group of someones) who are also excited about spinning up a manufacturing and distribution network. That is especially likely to be true in the context of research lab-developed hardware (which is the focus of the paper).

Where does that leave things? Manufacturing and distribution of open hardware becomes an important stand-alone task early in its development cycle. At the same time, the team working to create that hardware may not be excited about engaging in that task.

Enter Commercialization-as-a-Service

One way to try and solve this problem is to get more people excited about building out manufacturing and distribution networks. An alternative approach would be to make open hardware projects less reliant on building their own manufacturing and distribution networks in order to succeed.

This second option may be most realistic in situations where open hardware has been developed in scientific research lab contexts supported by grant funding. In those situations, a grantor has already paid for the development of the open source hardware in the context of the specific project. Currently, in most situations that hardware either remains within the developing lab or is effectively abandoned.

Creating commercialization-as-a-service could make it much more likely that open hardware developed in one lab becomes available to the research community more broadly. That service layer could develop a standing network of international manufacturing and distribution partners, as well as expertise in productization and marketing of hardware. It would work as a pump, pulling proven hardware out of labs and pushing it to other researchers worldwide. In doing so, it would reduce the current ‘unicorn’ condition of academic open hardware success, whereby a successful open source hardware project needs a team that happens to combine the engineering skills to develop the hardware and the logistics skills to manufacture and distribute it.

Building this layer feels like the accelerant role that foundations and other grantors are well positioned to play. Assembling the expertise required to productize, manufacture, and distribute hardware can be done once and applied to a range of hardware. Doing so could greatly increase the impact and adoption of the hardware that funders are already funding. Conversely, without building this layer, these funders are making it much more likely that effective hardware remains tucked into individual labs, hidden from the rest of the research community.

Of course, this is just one of the things that’s in the new report. You can read the entire thing here.

Header image: New Inventions of Modern Times, The Invention of the Clockwork, plate 5, from the Met’s Open Access collection.