Attribution Intelligence

The early 1970s were frightening years in American pop culture: bell bottoms were in vogue, vinyl jumpsuits were encasing the populace, and sci-fi thriller The Andromeda Strain had just premiered in theaters. The blockbuster hit, based on Michael Crichton’s first best-seller, explores what happens when biosafety goes wrong as a team of scientists scrambles to find and contain a deadly pathogen. Strip away the film’s overdramatic technobabble and tinny soundtrack and you’ll find a prescient message about the overlap of biosafety and scientific ambition: When a technology’s accessibility outpaces regulations, danger ensues.

As access to genetic material becomes increasingly decentralized, is it only a matter of time before a rogue citizen scientist sequences a real-life Andromeda Strain?

The possibility of an individual using synthetic biology to create a lethal superbug is a largely unfounded fear, according to biohacker and genetic engineer David Ishee. Ishee heads R&D at The ODIN, a company that sells DIY gene editing kits like agrobacteria experiments that teach users to engineer a color change in plant leaves. He believes that genetic engineering could become as accessible as other home tech, like coding or building robots out of a garage. Jokingly, Ishee refers to himself as a genetic toymaker, but biohackers ultimately serve as synbio educators: lowering the barrier to entry for genetic engineering helps people gauge its realistic outputs.

Ishee’s experience engineering genes eases his nerves about the likelihood of a lone scientist unleashing a genetically engineered superbug. “It’s really hard to make things worse than nature already does,” he says. “There’s a wild amount of optimism in thinking somebody’s gonna cook up a bioweapon or something in their bathroom.” Even for the most skilled scientist, there’s only so much malevolent engineering that a single person can do with finite funds, time, and access to laboratory supplies.

Powerful governments and corporations, on the other hand, can funnel fortunes into genetic engineering research. And biosafety regulations haven’t always caught bad science in time.

In January 2014, employees at the Centers for Disease Control and Prevention’s Influenza Division accidentally cross-contaminated two bird flu strains, one of which was highly pathogenic. Luckily, the virus’s recipient (a government poultry research lab) was a high-containment lab equipped to handle contagious pathogens. No researcher was exposed to the contaminated sample, but all were unaware they had handled such virulent material until months later; the incident was not reported to CDC management until July 2014.

In 2022, researchers at Boston University (controversially) combined the backbone of the original SARS-CoV-2 strain with the gene for the Omicron variant’s spike protein, a segment of genetic material believed to drive Omicron’s high transmissibility. While they aimed to compare the recombinant virus’s properties with those of the naturally occurring Omicron variant, the outcome, according to a preprint, was a significantly more lethal virus that killed 80% of mice in the study. Experts warned that such experiments with contagious pathogens could cause an outbreak in the rare event of an unintentional lab leak.

Genetic engineering’s scale will only grow more unwieldy as the technology becomes more widespread and accessible. But if synbio democratizes to the point where anyone can design a sequence, could the field benefit from anyone being able to trace the origin of a sequence?

“All technologies concentrate power in the hands of those who have them. So, you want to develop technologies to monitor how people are using genetic engineering,” says Josh Dunn, Head of Design at Ginkgo Bioworks. “That means attribution.” Genetic engineering attribution could reveal who engineered specific sequences, whether those sequences code for bioweapons or breakthrough therapies.

Machine learning, a subset of artificial intelligence that makes predictions from large datasets, has emerged as synthetic biology’s best bet for calling out credit and culprit. Experts believe that machine learning could reward positive innovators, identify the source of accidents, and hold malicious designers accountable, all while being flexible and adaptable enough to keep pace with synthetic biology’s increasing scale.

While machine learning is a new synbio application, biosafety is not. The symbiosis between biosafety and synthetic biology began with the technology’s initially rocky public reception.

Brave New City

In 1971, the same year that The Andromeda Strain hit theaters, real synthetic biologists reached a milestone: the first gene splicing experiment. Biochemist Paul Berg and his colleagues at Stanford were the first to combine DNA from two different species to create a new organism with recombinant DNA (rDNA).

At the time, his breakthrough summoned celebration and trepidation. Some scientists worried that the chimera’s viral DNA, known to be carcinogenic in rodents, could escape containment and cause cancer in humans.

With biosafety at top of mind, Berg invited 140 “A-list” biologists to a historic 1975 conference in Asilomar, California to discuss how the scientific community should regulate rDNA experimentation. The scientists coalesced comfortably, but their comradery couldn’t quell the overwhelming sense that they were about to change biology forever. In a 1975 Rolling Stone article, one biologist equated his Asilomar experience to something biblical: “Here we are, sitting in a chapel, next to the ocean, huddled around a forbidden tree, trying to create some new Commandments—and there’s no goddamn Moses in sight.”

The scientists ultimately agreed on a tiered risk assessment system to contain novel recombinant organisms; rDNA experiments that risked creating pathogenic or biohazardous organisms were to be conducted in high-containment labs to minimize the risk of public health emergencies. But they could only self-regulate the research for so long before rDNA became a matter of public policy.

In 1976, biology professor Mark Ptashne caught the attention of zoning officers in Cambridge, Massachusetts, when he requested funding for a high-containment rDNA research laboratory on Harvard’s campus. The building request escalated into a series of increasingly heated debates among Harvard biologists over whether such potentially risky research should even happen in such a populated and transient city.

Harvard’s clamoring worried Al Vellucci, Cambridge’s populist mayor, who initially interpreted the burgeoning technology as “Frankenstein’s dream.” Vellucci’s sensationalization proved excellent advertising for his special hearing on rDNA biosafety at City Hall that June. The hearing attracted a diverse crowd: local residents, rDNA proponents, and opponents lined up for a whiff of the latest homegrown scientific tea. Onlookers packed Cambridge City Hall as if it were a sports arena, carrying signs with slogans like “No Recombination Without Representation.”

Vellucci set the stage for a balanced discussion accessible to anyone—not just scientists. “Whether this research takes place here or elsewhere, whether it produces good or evil, all of us stand to be affected by the outcome,” he said. “The debate must take place in the public forum, with you, the public, taking a major role.”

The public’s biggest advocates were the opposition scientists. While they weren’t opposed to rDNA research as a whole, they were leery of allowing new technology to propagate without complementary biosafety regulations. They claimed it was every scientist’s duty to prize responsible citizenship over scientific ambition. Even the most fiery opposition member, MIT biology professor Jonathan King, who compared rDNA research to the Manhattan Project, believed Cambridge citizens should serve as science’s final court of appeal. “The people here pay the taxes and they bear the risk and they’re supposed to reap the benefits,” King said. “Well, let them decide.”

That’s exactly what they did. Cambridge instituted a city-wide rDNA research moratorium while a citizens’ review board convened for over 100 cumulative hours, hearing scientific testimony, touring labs, and debating rDNA’s societal risks and merits.

Seven months later, the review board approved rDNA research within Cambridge city limits and even added their own safeguards beyond NIH guidelines. This was the country’s first municipal biosafety ordinance. While the Cambridge City Council inked the ordinance, the people ultimately decided rDNA’s fate.

The biosafety measure incentivized pharmaceutical giant Biogen to establish its R&D headquarters in Cambridge in 1982. The two-page ordinance clearly stated rDNA experimentation rules, and the public backing made it unlikely for Biogen’s work to trigger bad press. Other companies followed Biogen’s example, eventually transforming the Greater Boston area into the world’s top life sciences cluster, all built on responsible biosafety in genetic engineering.

AI Attribution

Nearly fifty years after the historic biosafety ordinance, Cambridge remains a hotbed for new synbio evolutions. Nowadays, Cambridge residents aren’t just spectators at biosafety hearings—they’re using AI to compete in biosafety contests.

In July 2020, over 1200 data scientists and machine learning enthusiasts participated in the Genetic Engineering Attribution Challenge, a virtual competition organized by research non-profit altLabs and scientists at Cambridge’s MIT Media Lab. The competition challenged contestants to trace engineered DNA sequences back to their lab-of-origin using machine learning algorithms. Designers of the most accurate algorithms would leave with up to $30,000 in prize money.

The competitors shattered every record. Over 13% of all GEAC teams attributed sequences more accurately than the reigning attribution accuracy champion. GEAC judges evaluated how often an algorithm placed the lab-of-origin in its top 10 predictions, a metric known as “top-10 accuracy.” A short list of potential subjects is actually more valuable to forensic investigators than a single top pick, according to William Bradshaw, GEAC director and researcher with MIT Media Lab. “You don’t want a model that spits out one name unless it’s very, very accurate,” he says. “We’re not really talking about having this model be the sole determinant of who we think did this.” In all, 75 of the 299 GEAC teams designed algorithms with higher top-10 accuracies than any previously recorded machine learning model for genetic engineering attribution.
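For readers who want to see what “top-10 accuracy” means in practice, here is a minimal sketch in Python. It is not the GEAC scoring code; the function name `top_k_accuracy`, the toy scores, and the lab indices are all made up for illustration. The idea is simply to count how often the true lab-of-origin appears among a model’s k highest-scoring guesses.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    true_labs: Sequence[int],
    k: int = 10,
) -> float:
    """Fraction of sequences whose true lab is among the k highest-scoring labs.

    scores[i][j] is a (hypothetical) model's score that sequence i came from lab j.
    true_labs[i] is the index of the lab that actually made sequence i.
    """
    if len(scores) != len(true_labs) or len(true_labs) == 0:
        raise ValueError("scores and true_labs must be non-empty and equal in length")

    hits = 0
    for row, truth in zip(scores, true_labs):
        # Rank lab indices from highest score to lowest.
        ranked = sorted(range(len(row)), key=lambda lab: row[lab], reverse=True)
        # A hit means the true lab made the shortlist of k candidates.
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labs)


# Toy data (illustrative only): 3 sequences, 4 candidate labs.
toy_scores = [
    [0.10, 0.60, 0.20, 0.10],  # true lab is 1, the model's first choice
    [0.40, 0.30, 0.20, 0.10],  # true lab is 2, the model's third choice
    [0.25, 0.25, 0.40, 0.10],  # true lab is 3, the model's last choice
]
toy_truth = [1, 2, 3]

top1 = top_k_accuracy(toy_scores, toy_truth, k=1)
top3 = top_k_accuracy(toy_scores, toy_truth, k=3)
print(f"top-1 accuracy: {top1:.2f}, top-3 accuracy: {top3:.2f}")
```

On this toy data the shortlist of three catches a case that the single top pick misses, which is the trade-off Bradshaw describes: a longer list is more likely to contain the right lab, at the cost of leaving investigators more names to rule out.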

Dunn explains that attributive algorithms like the ones from GEAC are most useful when implemented proactively, not just during times of crisis. Preventative biosafety plans can stop outbreaks from happening in the first place. In 2021, Ginkgo partnered with the CDC to implement a COVID-19 wastewater monitoring system in four US airports. The program detected two newly evolved Omicron strains and traced them to two Newark-bound flights from the UK and South Africa. By strategically sampling a handful of hubs, rather than sampling every country’s wastewater, Ginkgo attributed the novel strains and contained an outbreak with relative ease.

“By sampling at key places, you can be noninvasive,” said Dunn. “You can respect privacy. You can contain all sorts of threats, whether they’re natural or engineered, really, really early.”

Mistaken Identity

The GEAC’s proof-of-concept suggests that machine learning could be used for a variety of biosafety roles, one of which is protecting designers’ intellectual property rights. While that’s much less exciting than tracing a bioweapon, it’s just as important to upholding scientific credibility.

Ishee speculates that increasingly accessible genetic databases may lead to cases of copyright infringement. “If there’s a novel sequence and there’s no law against just taking it and profiting off it … it might be difficult to prove that you even created that sequence,” Ishee commented. “I think the biggest use for this would be in more corporate and academic settings.”

However, attribution isn’t necessarily a priority for research scientists with track records of responsible genetic engineering.

“I’ve managed to work in biology for two plus decades and have never once thought about attribution,” says John Doench, a lab director at the Broad Institute of Harvard and MIT. Doench is a functional geneticist who studies the human genome, a monstrously large dataset. He believes the field should focus more on predictive machine learning, which could find patterns between genes and their functions, and leave attributive machine learning to the forensic investigators.

Doench likens attributive machine learning to a more familiar concept. “For a little fragment of time, police knew how to use fingerprints and crooks didn’t. But once crooks knew about fingerprint evidence, they put on gloves.” GEAC used publicly available bacterial plasmid datasets, which lend themselves well to a proof of concept but don’t reflect real-world conditions, where a malicious designer might deliberately hide their tracks.

Gigi Kwik Gronvall, Senior Scholar at the Johns Hopkins Center for Health Security, agrees with Doench’s fingerprint analogy. Forensic investigators use more than just technological tools when implicating someone in a crime. They also consider human factors like motive, bias, and social systems.

“You could have the best technology in the world,” Gronvall says, “but if people don’t trust it, if we can’t reproduce it, it’s going to come down to the people involved.”

Human in the Loop

Regulated communities should have a voice in the policies governing them, says Diane DiEuliis, a distinguished research fellow at National Defense University in Washington, D.C., who studies the impacts of emerging biotechnologies on biodefense, biothreat preparedness, and public health. Biosafety policymakers must appease not one, but two regulated communities: the public and scientists themselves.

“If you don’t think of the science first, you could create policies that actually harm scientific progress,” she says.

The genetic engineering biosafety conversation that initially included only Asilomar scientists must now include policymakers to ensure synthetic biology’s continued innovation. “I don’t think any scientist would tell you we truly, completely understand any process,” said Dunn. “But the question really is: do we understand it enough to be comfortable with the risks? If there are recognized risks, do we feel like they are rare, non-impactful, or correctable enough that we can tolerate that?”

Machine learning models tend to be a black box. Highly accurate models, like the ones designed by GEAC contestants, often lack the explainability that investigators need to contextualize biosafety evidence. Rather than pick sides in an impossible battle of humans versus machines, experts suggest humans incorporate themselves into AI’s decision-making process.

“Would you be comfortable getting in an airplane flown only by an autopilot? Probably not,” says DiEuliis. “Most people want to err on the side of human-in-the-loop.”

For democratized technologies like synbio, scientists and citizens can choose biosafety’s human-in-the-loop. Biosafety’s human component could ensure method reliability, like a human who monitors automated lab equipment. Even if the machine isn’t inherently dangerous, the human monitor can attribute or stop accidents like mid-experiment mechanical breakdowns.

Genetic engineering attribution’s human-in-the-loop could be the people who write the machine learning models, drawn from a broad, unbiased pool. The 2020 GEAC results showed that there may not be a need to rely on the shadowy silos of industry or academia for accurate algorithms; the competition’s fourth most accurate algorithm was developed by a duo that included Yao Xie, a kindergarten teacher with no formal computer science education.

“You don’t need a PhD to make progress in this space,” Bradshaw says. “These competitions are exciting because you get as much collective wisdom as possible, from an equity perspective.”

It’s reassuring to remember that the public has long had a say in genetic engineering biosafety, ever since the 1976 Cambridge review board approved rDNA experimentation. Even if that collective voice is fed into a dataset and funneled through a robot, it can be preserved, so long as synthetic biology and machine learning communities prize public safety as highly as scientific discovery.

As synthetic biology barrels forward, its biosafety counterparts must keep up.
