Licensing Deals Between AI Companies and Large Publishers are Probably Bad

Licensing deals between AI companies and large publishers may be bad for pretty much everyone, especially everyone who does not directly receive a check from them.

Is There A Coherent Theory of Attributing AI Training Data?

It feels like any time I have a conversation about attributing data used to train AI models, the completely understandable impulse to want attribution starts to break when confronted with some practical implementation questions.

Make Government-Funded Hardware Open Source by Default

Earlier this year the Federation of American Scientists (FAS), Center for Open Science, and the Wilson Center held an open science policy sprint to source and develop actionable policy ideas aimed at improving scientific transparency, equity, and innovation. Some heroic editing from the FAS team (especially Jordan Dworkin and Grace Wickerson) helped transform “uh, if the government pays for hardware it should be open source” into the actual proposal below. You can see the original version in situ here.

Licenses are Not Proxies for Openness in AI Models

Earlier this year, the National Telecommunications and Information Administration (NTIA) requested comment on a number of questions related to what it is calling “open foundational models.” This represents the US Government starting to think about what “open” means in the context of AI and machine learning.

Clearing Rights for a 'Non-Infringing' Collection of AI Training Media is Hard

In response to a number of copyright lawsuits about AI training datasets, we are starting to see efforts to build ‘non-infringing’ collections of media for training AI. While I continue to believe that most AI training is covered by fair use in the US and therefore inherently ‘non-infringing’, I think these efforts to build ‘safe’ or ‘clean’ datasets (or whatever other word one might use) are quite interesting. One reason is that they help illustrate why building such a dataset at scale is so challenging.
