It has been almost two years since I posted anything here.  I have been preoccupied with growing a data science team that started when we recognized an opportunity to accelerate product development for our portfolio companies.  Mercury Data Science, Inc. (“MDS”) now has 15 people and is growing fast.

MDS partners with innovative, high-growth companies to develop data science products more quickly and efficiently, in sectors as diverse as Consumer/Retail/Marketing, Biotechnology, and Industrial.

To explain how we got here, let me back up and start with some observations (circa late 2017):

Data Science (“DS”; also known as AI, though that label seems kind of hype-y unless you are talking about self-driving cars, etc.) was becoming part of almost every software company.  In fact, it was kind of a running joke that every company had suddenly slapped AI lipstick onto its pitch deck.  The truth was that:

DS was becoming more like “cloud” in that most companies actually did REQUIRE a DS strategy to be competitive, because it could make a huge difference in product performance, efficiency, or customer satisfaction, and

DS was becoming a permanent and evolving part of the technology expertise that a corporation has to keep developing. To be 100% clear: DS is not like ordering a website refresh; it is an ongoing core competency more akin to software development, and for some companies maybe even more necessary than that.

Similarly, we saw companies with big data assets and plans to move forward in DS to create value with that data, but their teams needed to bring in the necessary DS expertise. These skills are quite different from their existing software expertise. Also, the exact DS skills needed are often a bit undefined at first (until some DS work is actually done on the data), so the companies needed help getting started.  A chicken-and-egg issue.

Even companies that knew what skills they needed had to work hard to find top data scientists to work on their problems and, in the process, were losing time getting a product to market.  And yes, this observation is even more true in 2019.

None of this is unique to early-stage startups or to Mercury Fund’s portfolio companies.  Even growth-stage or established companies were looking for ways to move faster on DS.

In late 2017, I met a terrific data scientist.  We started introducing her to Mercury Fund portfolio companies and, 100% of the time, they needed and wanted her help.  This was the Light Bulb “ON” moment. There was no way one person could do all the work, so we thought: “Wouldn’t it be cool to create a company that could recruit top DS talent and have them drop in and accelerate early-stage companies by building their DS products, vision, and capabilities? And wouldn’t it also be cool to take growth-stage companies with big data assets and more quickly build defensible competitive value from that data?”

So that is what we did – we created a services company to help speed the development of DS-enabled products.  Our thinking was that, to reach critical mass and attract top talent, we should work with any company that has a need and a cool problem our data scientists want to solve. Now, two years in, with 15 outstanding data scientists on board, we are doing just that: dropping in and accelerating product development for companies using DS. And our client companies have launched numerous DS products resulting from our work.

That’s what I have been up to.

I had a conversation with an entrepreneur yesterday who was trying to sell a data science product, as well as machine learning services, to pharma companies.  He was scratching his head over the fact that all of these companies now have in-house data science teams that he has to compete with.  This is a single data point, but it supports the idea that corporations are moving fast into AI/ML (and the related conclusion that point [non-full-stack] solutions are not going to be that interesting BECAUSE they can be developed in-house or by the incumbent software vendor).

One thing I wonder about is whether these corporations are just thinking the same thing as many entrepreneurs – “how can I get some AI/ML technology” – when it may be more important to figure out where they can build a data castle (aka data moat or data plume: a data advantage that grows larger over time, perpetually outrunning the competition).  Good short articles about it here and here.

Some other links, added as I find them:

Anthem built its data science team largely from the retail industry.


There is a tidal wave of startup activity around Artificial Intelligence and Machine Learning (AI/ML). It is reasonable for startups and investors to be excited; AI/ML will (eventually) transform many, if not all, industries. Here, we take out our crystal ball and make a few projections about what VC-funded software startups will look like in 2018.


AI/ML will be a ubiquitous part of the startup tech stack

 Some entrepreneurs mistakenly think AI/ML is the pixie dust that will differentiate their company from all the others that walk in the door. But this is just table stakes to get in the game and the cost to play is getting lower. In the past, when mobile, cloud, and big data technology adoption enabled disruptions across multiple industries, costs dropped because of investments made by either large companies (Amazon, Google, etc.) or venture-backed companies to develop better horizontal (industry independent) tools and services. AI/ML is following the same trajectory of adoption and cost. Costs will continue to fall, technical capability will increase, and use will expand exponentially. So, a startup isn’t special because it uses AI/ML any more than a startup is special because it uses AWS. However, talent is in high demand. Having a strong data science team may be compelling if the rest of your business plan hangs together. For this gold rush, miners (data science talent) are a valuable asset but picks and shovels (tools, engines, algorithms) will be universal and nearly free.

VCs have been busy funding AI/ML companies. In the near future, we will just call them software companies.


Workflow and “Full Stack” solutions beat AI/ML point solutions

AI/ML solutions typically reduce costs (e.g. labor), increase productivity, or improve accuracy. A software startup won’t close enterprise sales if it increases productivity in one place but makes the overall workflow more complex. Moreover, if it doesn’t improve the workflow (e.g. costs, time, and productivity) in a way that the incumbent competition cannot, there isn’t enough of a differentiator, since incumbents can quickly develop the same AI/ML capability. AI/ML-based startups win when they improve workflow in a way that the incumbent technology cannot. This means that AI/ML-enabled software is most powerful (in terms of differentiation) when it provides a unique, full stack solution.

It is also difficult for AI/ML startups to win on just performance when performance is hard to quantify. In the case of many AI/ML applications, it can be difficult to compare two competing systems and declare a winner across all use cases. The “winner” can change over time since any given application can improve with more data, more computational power, and improved algorithms.

Having the best performing AI/ML algorithm at any given point is not a guaranteed winner – having the best full stack solution is the best bet.


The focus is going to be on industry vertical solutions

The promise of AI/ML is to change enterprise workflow (e.g. reduce human labor) or increase efficiency of a process. The technology will have a huge impact on parts of the economy that have traditionally had higher labor content and are slower to adopt new software solutions (manufacturing, oil and gas, medicine, pharmaceutical R&D, etc.). Billions of dollars are going into developing horizontal tools, and cloud services will drive down the cost to develop and deliver vertical AI/ML-enabled software. Software companies deploying vertical solutions often have the benefit of creating a data “castle” that grows more valuable with more customers.

In the long run, there will be a few, very large winners in the somewhat saturated horizontal category but there will be many more opportunities for AI/ML-enhanced software to disrupt the status quo in industry verticals.


Large corporations are going to be fully engaged with AI/ML faster than in past technology waves

 Because of the long-term promise of AI/ML, many non-software companies will invest significant resources into data science teams, consultants, and tools. It is unclear whether this corporate activity creates headwinds for startups as corporations try to implement applications on their own; we believe that the technology capability will move too quickly for most corporations to successfully build applications in house. In any case, given the impact of this technology, it is likely that we will see most corporations build AI/ML expertise to evaluate the AI/ML component of new software purchases.

The high level of corporate engagement with the new technology has the potential to accelerate adoption of new AI/ML technology.


This is the start of a new, long, and incredibly valuable transformation in software capability. There will be a huge number of opportunities for startups to benefit from this wave. Though the level of VC and startup activity in AI/ML is arguably a bit frothy at the moment, the technology is far from reaching its full potential; as it continues to mature, the opportunity set will keep on growing.



Stuff I am reading related to this:

An entrepreneur came up to me after the recent RBPC to tell me that they had made a big mistake. In a feedback session prior to the competition, the majority of judges encouraged them not to provide many details on the science, since the judges wouldn’t understand it anyway. The team had what looked to me like an interesting technology but treated it like a black box in their presentation. My advice was to open up more and talk about the science, even if not everyone in the audience would understand it. The majority of judges at that point disagreed with me, so the team took the (fairly logical) route they thought would appeal to the majority and kept the science out of it.

Unfortunately, the problem is that, in any group of judges or investors, there is always going to be someone who has some expertise in the science. Everyone else will look toward that judge or investor to validate the science. If you fail to win over that person, the others will lose interest. The non-scientific judges and investors don’t need to understand it. In fact, if you make it too simple, you risk “if it is that simple, somebody must have thought of it before.”

This idea that you should stay away from technical details comes out of confusion about the real problem: if the entrepreneur spends too much time on technology and not enough on the business explanation, they look like the stereotypical scientist who cares more about the science than the business. A huge red flag. You don’t create a problem by going deep; you create a problem by spending too much time on science. Yes, there are investors that want a mostly scientific presentation for, say, a novel therapeutic, but, with a mixed crowd, a few slides with the real, not dumbed-down, technology are a good idea. Not overly complex, but enough for an expert to feel like you are not making it all up. Then get to the business slides. It will make you look like a scientific rock star who understands business – what every investor wants.

Compared to modern software development, biology development is really, really slow. It can take an excruciatingly long time to develop an improvement in the incredibly complex system of DNA and proteins in a cell. Add more time to test for the target phenotypes (characteristics). If I have a coding mistake in my software product, developers, powered by pizza and Mountain Dew, can work through the night and fix it, running iteration after iteration to see the results of each change. If I have a coding mistake in a cell’s DNA, even in something as simple as bacteria, it may take a week to reset the code and run the next iteration to make sure that it is fixed. The bacteria will also enjoy the pizza – but won’t work any faster. It is even harder for plant and animal cells, where you may have to wait months or years to see the effect of changes on mature organisms.


Automated Biology isn’t new, but it feels like it is reaching a tipping point: tools for genetic manipulation and analysis have matured to the point where new entrants can move faster and cheaper than the incumbents, and we are witnessing measurably faster biology development as a result. Machine Learning and big data predictive analytics tools are maturing, driven by the economics of the Internet, and will find increasing traction in computational biology when paired with the explosion of readily available data. CRISPR technology allows precise gene editing in a much more rapid and predictable way than was previously available. Faster, cheaper, and more sensitive gene sequencing is becoming a commodity. In agriculture, drones can fly test crops and capture massive amounts of experimental data. All of these tools will enable the rise of new biological powerhouses, just as the Internet and cloud computing have done for traditional IT businesses.


VC interest in Automated Biology is already starting to heat up, although it is still early days. A few examples:

Both Zymergen and Ginkgo Bioworks have raised $40 million rounds in the last year to apply new automation ideas to the synthetic biology of microorganisms. Microorganisms are the logical start, being easier to manipulate and test than plant or animal cells.

Human Longevity is using big data to address human aging (and appears to be targeting microorganism/microbiome studies initially as well).

Google is using experimental data across multiple diseases and multiple sources to identify effective chemical compounds for a particular disease.

Benson Hill is using cross-species phenotype-genotype relationships and state-of-the-art bioinformatics (for IT people, that’s big data, machine learning, etc.) to rapidly drive decisions around plant optimization (with plants already in the field showing a 20% increase in yield just by tweaking photosynthesis pathways).

A16Z just announced a fund for the intersection of biology and computation with three themes: healthcare IT for therapeutic benefit, outsourced bio (like Ginkgo Bioworks), and computation in medicine (computer reading of radiology images, for example).


For biology, it feels like the early 2000s did for information technology; you can see the possibilities when startups can understand and optimize biological systems in months and years instead of decades. Costs are being driven down by commoditization of the tools (hardware) and by the use of tools, some borrowed from the IT space, that take advantage of the cloud and of new data and development paradigms.


To me, the exciting question is how we can use the new tools, and outsourced bio resources, to better understand incredibly complex biological systems and move faster to develop novel and improved organisms and therapeutic entities. I think this has the potential to create new companies, perhaps with new business models, that challenge the status quo in how discovery and development are done for microbiology, plant and animal genomics, and human health.

Feeding a larger world

Mercury Fund recently invested in the $7.3 million Series A financing of Benson Hill Biosystems. The company is a St. Louis and Research Triangle Park-based agriculture biology startup. Using a unique computational biology platform (PSKbase) and novel experimental platforms for high efficiency translation into model plants, Benson Hill is developing technology to boost crop yield in all kinds of plants. Our fundamental thesis with Benson Hill is that, in a world with a rapidly expanding population and growing climate volatility, crop yield will become increasingly important.

PSKbase, developed at the preeminent plant research center in the world, the Danforth Plant Science Center, gives the company an unrivaled ability to use massive amounts of data to predict genotype-phenotype relationships that can drive crop improvement. The platform allows for cross species analysis so the output improves as more data is included – corn, wheat, potatoes, soy, cane, even eucalyptus trees benefit from the genetic and experimental information from the other species. The result is the ability to create improved crops much more rapidly than current techniques allow. In an early test, PSKbase predicted eight candidate gene modifications for yield improvement and five were confirmed to boost yield by an average of 50% versus the control plants. Not only is that improvement exciting but the hit rate (simply picking a modification that works at all) is more than a 50x improvement over the historic, industry success rate of about 1%.
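As a quick back-of-the-envelope check, the quoted hit rate and improvement factor follow from simple arithmetic (the 8 candidates, 5 confirmations, and ~1% industry baseline are the figures cited above; the rest is just division):

```python
# Sanity-checking the Benson Hill numbers quoted above.
candidates = 8          # gene modifications predicted by PSKbase
confirmed = 5           # modifications confirmed to boost yield
baseline_rate = 0.01    # quoted historic industry success rate (~1%)

hit_rate = confirmed / candidates        # 0.625, i.e. 62.5%
improvement = hit_rate / baseline_rate   # 62.5x over the baseline

print(f"hit rate: {hit_rate:.1%}, improvement: {improvement:.1f}x")
```

A 5-of-8 hit rate works out to 62.5%, which against a ~1% baseline is indeed "more than a 50x improvement."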

In addition to having a great management team, superstar scientific founders, and more strategic partnerships in place than many Series C-funded companies, Benson Hill is a leader in the early wave of biology startups driven by rapidly improving software and experimental tools – a wave we are calling Automated Biology. We are excited about this emerging investment theme at the intersection of cloud computing and new biology tools. Also, because Benson Hill’s platform works across all crops, it has an incredible opportunity to reinvent the business model for next-generation crop development, whether the final product is genetically modified in the traditional sense, modified by new tools like CRISPR, or the result of directed natural breeding.