
What happens when public knowledge is created on private infrastructure?


Over the past year, much of the recognition for machine learning (ML) has gone to researchers working in or alongside large technology firms, while recent advances in artificial intelligence (AI) have been financed by and built on corporate infrastructure.

In 2024, the Nobel Foundation awarded the physics prize to John Hopfield and Geoffrey Hinton for contributions that enabled learning with artificial neural networks, and the chemistry prize to Demis Hassabis and John Jumper for protein structure prediction (alongside David Baker’s computational design). Mr. Hassabis and Mr. Jumper were employed at Google DeepMind at the time of the award; Mr. Hinton had spent a decade at Google before departing in 2023. These affiliations don’t erase the laureates’ academic histories, but they do indicate where prize-level research is now being performed.

This change rests on material conditions as well as ideas. State-of-the-art models depend on large computing clusters, curated data, and engineering teams. Google’s programme to develop tensor processing units (TPUs) for its data centres shows how fixed capital can become a scientific input rather than only an information-technology cost. Microsoft’s multiyear financing of OpenAI and the Azure supercomputers built for it reflect the same political economy from a different angle.


Case for public access

Any research with public provenance should return to the public domain. In this case, public money has supported early theoretical work, academic posts, fellowships, shared datasets, publishing infrastructure, and often the researchers themselves. In parallel, the points at which the value becomes excludable lie increasingly downstream: access to computing resources (compute, for short), rights to data and code, the ability to deploy models at scale, and decisions to release or withhold model weights. This helps explain why recent Nobel laureates have been situated in corporate laboratories and why frontier systems are predominantly trained on private clouds.

In the 20th century, firms such as Bell Labs and IBM hosted prize-winning basic research. However, much of that knowledge then moved through reproducible publications and open benchmarks. Today, reproducing the work of Mr. Jumper, for example, can require large compute budgets and specialised operations expertise. As a result, the concern isn’t only that corporations receive prizes but that the path from a public insight to a working system runs through infrastructure and contracts controlled by a few firms.

The involvement of public funds should thus create concrete obligations at the points where technology becomes enclosed for private control. If an academic laboratory accepts a public grant, the deliverables should include the artefacts that make the work usable: the training code, evaluation suites, and model weights, released under open licences. If a public agency buys cloud credits or commissions model development, procurement should require that the benchmarks and improvements flow back to the commons rather than become locked in with a vendor.

Remove bottlenecks

The argument isn’t that corporate laboratories can’t do fundamental science; they clearly can. The claim is that public policy should reduce the structural advantages of private control. Consider the release of Google DeepMind’s AlphaFold 2: with its code and public access to predictions, researchers beyond the originating lab could run the system on (reasonably) standard hardware, retrieve large numbers of precomputed structures, and integrate the predictions into routine workflows. All of this was supported by public institutions willing to host and maintain the resources.

Where the corporate stack is indispensable, such as when training frontier models (with billions or trillions of parameters), claims about ‘responsible release’ often, ironically, translate into a closed release. A more consistent position would be to link risk management to a structured model of openness — one that includes staged releases, access to weights, open penetration-testing tools, and a clear separation between safety rationales and business models — rather than to allow private entities blanket secrecy in the name of safety.

The same logic applies to compute: if computing resources become a scientific bottleneck, they should be treated as a public utility. National and regional compute commons should allocate resources free or at cost to academic groups, nonprofits, and small firms, and condition access on open deliverables and safety practices. The ultimate goal is to restore the ability of public institutions to reproduce, test, and extend leading ML work without having to seek corporate permission. Without such a commons, publicly funded ideas will continue to be turned into working systems on private clouds and returned to the public as expensive information products.

Indeed, while it’s tempting to treat the laureates’ corporate affiliations and the funding pipelines as separate issues, one symbolic and the other structural, they’re connected by compute. The fact that the Nobel laureates worked at Google DeepMind reflects where teams combining ML scientists, domain experts, data, and compute now operate. Likewise, the fact that the most visible systems of the past two years were trained on Microsoft Azure under a financing agreement explains who could attempt such training. Both facts reflect the same underlying concentration of resources.


Beyond industry vs academia

Public agencies should respond directly: tie funding to openness in grants and procurement, and require detailed funding disclosures and compute-cost accounting in research papers. Where full openness would create unacceptable risks, agencies can use equity or royalties to fund compute and data commons that support the wider ecosystem. The credibility of corporate laboratories, for their part, should rest on measurable contributions to the commons.

Journalists and the public should also move beyond an ‘industry versus academia’ framing.

The relevant questions are who sets the research agenda, who controls the infrastructure, who can reproduce results, and who benefits from deploying the resulting AI models. Interpreting the 2024 Nobel Prizes as industry victories alone would miss the point: the knowledge base is cumulative and relies on public inputs, while the capacity to operationalise that knowledge is concentrated. Articulating this pattern allows us to recognise scientific merit while demanding reforms that ensure public inputs produce public returns — in code, data, weights, benchmarks, and access to compute.

To be sure, the central conclusion isn’t resentment about corporate salaries; it’s a response to the fact that breakthroughs increasingly occur at the intersection of public knowledge and private infrastructure. The policy programme should be to reunite the layers where public inputs and private control diverge — artefacts, datasets, and compute — and to bake this expectation into the contracts and norms that govern research.

Under these conditions, future awards can be celebrated with corresponding public benefit, because the outputs that make the science usable will be returned to the public.
