Life Science

Weathering the storm of genetic data

The startup SolveBio wants to be a one-stop shop for scientists trying to cope with data overload

December 29, 2016
Genetic sequencing technologies are creating a huge repository of data — but organizing that information to create successful therapies remains a challenge. [Image credit: OpenStax | CC BY 4.0]

Mark Kaganovich kept running into the same problem as he worked toward his graduate degree in cancer genomics at Stanford University. He wanted to spend his time determining genetic factors contributing to cancer. Instead, he ended up worrying about the quality of the genetic data he was analyzing.   

“What we noticed was that such a big part of the technical component was constantly having to download, parse and normalize all this data,” Kaganovich says. “It just seemed like there was a better way.”

Living in Silicon Valley, Kaganovich was inspired by the fast-paced, collaborative evolution of the internet. He resolved to create a tech company to organize the massive torrent of genetic information available to biomedical researchers. In 2013, he collaborated with David Caplan, David Gross, and Paul George to found SolveBio.

SolveBio aggregates many genetic datasets into one location that researchers can efficiently analyze to better diagnose and treat diseases. Kaganovich likens the product to the famous “Bloomberg terminals” (named for billionaire entrepreneur and former New York City Mayor Michael Bloomberg) used by financial analysts to query all available market data. In this case, though, the currency is genetic data, which researchers can then mine in search of answers to crucial questions like, how frequent is a particular genetic mutation? How many people does it harm? And ultimately, how can we develop a drug to fight the resulting disease?

The company operates out of Manhattan’s Tribeca neighborhood, in an airy sixth-floor suite with light wood floors. Desks for the company’s 10 employees are clustered in the center, as they work to centralize the disparate strands of genetic data.

“Biology is a collection of observations … If you don’t have all that information collected and available instantly and trustworthy then you have to start from square one,” says SolveBio Chief Technology Officer David Caplan. “What we decided to do is go back and say, how do you build the software you need to do that? How do you build the relationships to disseminate that information and make sure you can do it properly?”

This goal is key for an industry that’s expanding rapidly. In 2003, the Human Genome Project delivered the first fully sequenced genome at a cost of $2.7 billion. Today, it can take just one day for companies to sequence a person’s entire genome for about $1,000. The cost of genetic sequencing has actually fallen faster than Moore’s Law, the observation that the number of transistors on a chip, and with it computing power, doubles roughly every two years.
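To give a rough sense of that comparison, the short Python sketch below works through the arithmetic using only the two cost figures quoted above. It rests on assumptions that are not in the reporting: that "today" means 2016, the year of publication, and that Moore’s Law can be read as costs halving every two years. The variable names are purely illustrative, and the result is a back-of-the-envelope estimate rather than a formal analysis.

```python
import math

# Figures quoted in the article.
cost_2003 = 2.7e9   # Human Genome Project, US dollars
cost_2016 = 1_000   # approximate cost per genome at publication, US dollars

# Assumption: "today" is 2016, the article's publication year.
years = 2016 - 2003

# Assumption: read Moore's Law as a halving of cost every two years.
moore_halving_years = 2.0
moore_fold_drop = 2 ** (years / moore_halving_years)

# Observed fold drop in cost, and the halving time that drop implies.
actual_fold_drop = cost_2003 / cost_2016
actual_halving_years = years / math.log2(actual_fold_drop)

print(f"Moore's Law pace: about {moore_fold_drop:.0f}-fold cheaper over {years} years")
print(f"Sequencing: about {actual_fold_drop:,.0f}-fold cheaper over {years} years")
print(f"Implied halving time for sequencing: {actual_halving_years:.2f} years")
```

Under those assumptions, a Moore’s Law pace would have made sequencing roughly 90 times cheaper over the period, while the quoted figures imply a drop of about 2.7 million-fold, or a halving of cost every seven months or so.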

“What I’m most passionate about is the sheer explosion in the amount of data and technology that’s at our disposal,” says Dr. Thomas Morgan, head of Human Disease Genetics at the pharmaceutical company Novartis.

“No company is where it needs to be yet in this space,” adds Morgan, who emphasized that he was speaking for himself and not for Novartis. “We need to tie together all the diverse sources of genetic data that are linked to disease information.”

To accomplish this goal, SolveBio collects data from three sources. The first consists of publicly available datasets, usually created by government consortiums, such as ClinVar or the Human Gene Mutation Database. The second is the customer’s internal data from clinical trials or research projects. These two sources make up the bulk of the material, but SolveBio also partners with companies that curate and sell information, like Thomson Reuters. Additionally, SolveBio offers genetic experts to help customers use the software.

The price depends on how many people use the program and how much data they access, but Kaganovich declined to disclose exactly how much the product costs. As of now, the startup has 25 customers, many of whom are large pharmaceutical companies. The company does not release its annual revenues, but SolveBio has raised $4.5 million in investment capital from 12 investors including Andreessen Horowitz, HVF and SVAngel.

The company is still establishing its market but seems well positioned for the future. Pharmaceutical companies “understand more and more the necessity of identifying genes and mutations for developing novel drugs, and the concept of personalized medicine. So I think they entered a good niche,” says Yuval Itan, a genomics researcher at Rockefeller University who is not affiliated with SolveBio.

Itan notes that SolveBio’s advisors include some powerful names in biotechnology, including famed Harvard geneticist George Church, Joel Dudley of New York’s Icahn School of Medicine at Mount Sinai, and Yaniv Erlich of Columbia University.

SolveBio is starting with pharmaceutical research and development — a multibillion-dollar sector — but hopes to eventually create versions of the product for clinicians, genetic counselors and researchers.

“The way we see it is today, we have a small number of very large customers, but tomorrow, hopefully, we’ll have a large number of small customers,” Caplan says.

No other company does exactly what SolveBio does, yet it still has a major competitor: the “in-house solution.” Traditionally, pharmaceutical companies that identify data-related difficulties will assign an internal task force to solve them, Caplan says. In those cases, SolveBio’s challenge is to demonstrate how its product is superior to each company’s own solution. Morgan of Novartis agreed that this would be his concern if deciding to adopt the product.

The team has a few other challenges. One is turning the vast amount of data they obtain into formatted, curated and customer-ready material. Since the company handles sensitive medical information, it also needs to invest in security and set up processes to control access to its technology and create an audit trail for every software update.

Every field needs established workflows, says Stephen Wolfram, another advisor to SolveBio and the creator of the widely used computer programs Mathematica and Wolfram|Alpha. For genetics, those tasks range from sequencing tumors, to enrolling people in clinical trials, to conducting population studies.

“Somebody needs to build a definitive kind of enterprise solution for genomic data,” Wolfram says. “It’s usually both some heavy lifting and lot of detailed work, to build out the workflows that are actually useful to people in the pharma business and so on. And I thought these guys [at SolveBio] actually seemed to get it.”

As for Kaganovich, he’s glad he made the switch from data scientist to healthcare entrepreneur.

“I genuinely feel very fortunate to be doing exactly what I want to do,” Kaganovich says. “We get to work with people who are literally developing drugs to save people’s lives.”

** Editor’s Note: This story has been updated. It originally said that the process of data import and maintenance costs SolveBio $30,000 to $40,000 a year, which is an inaccurate representation of the company’s finances.**

About the Author

Abigail Fagan

Abigail Fagan graduated from the University of Rochester with a major in brain and cognitive science and a minor in English literature. After graduating, she worked for the publishing company Macmillan Learning, helping to develop their science textbooks. She’s also worked as a freelance writer for the World Science Festival and Weill Cornell Medicine. In her spare time, she loves reading, playing board games, and consuming Nutella.


