Delivering ML Solutions
The most dangerous kind of waste is the waste we do not recognize.
Shigeo Shingo, leading expert on the Toyota Production System
Not everything that is faced can be changed, but nothing can be changed until it is faced.
James Baldwin, writer and playwright
Many individuals and organizations start their machine learning (ML) journey with high hopes,
but the lived experiences of many ML practitioners tell us that the journey of delivering ML
solutions is riddled with traps, detours, and sometimes even insurmountable barriers. When we
peel back the hype and the glamorous claims of data science being the sexiest job of the 21st
century, we often see ML practitioners bogged down by burdensome manual work; firefighting in
production; team silos; and unwieldy, brittle, and complex solutions.
This hinders, or even prevents, teams from delivering value to customers and also frustrates an
organization’s investments and ambitions in AI. As hype cycles go, many travel past the peak of
inflated expectations and crash-land into the trough of disillusionment. We might see some high-performing ML teams move on to the plateau of productivity and wonder if we’ll ever get there.
Regardless of your background—be it academia, data science, ML engineering, product
management, software engineering, or something else—if you are building products or systems
that involve ML, you will inevitably face the challenges that we describe in this chapter. This
chapter is our attempt to distill our experience—and the experience of others—in building and
delivering ML-enabled products. We hope that these principles and practices will help you avoid
unnecessary pitfalls and find a more reliable path for your journey.
We kick off this chapter by acknowledging the dual reality of promise and disappointment in ML
in the real world. We then examine both high-level and day-to-day challenges that often cause ML
projects to fail. We then outline a better path based on the principles and practices of Lean delivery,
product thinking, and agile engineering. Finally, we briefly discuss why these practices are
relevant to, and especially important for, teams delivering Generative AI products and large language model
(LLM) applications. Consider this chapter a miniature representation of the remainder of this book.
ML: Promises and Disappointments
In this section, we look at evidence of continued growth of investments and interest in ML before
taking a deep dive into the engineering, product, and delivery bottlenecks that impede the returns
on these investments.
Continued Optimism in ML
Putting aside the hype and our individual coordinates on the hype cycle for a moment, ML
continues to be a fast-advancing field that provides many techniques for solving real-world
problems. Stanford’s “AI Index Report 2022” found that in 2021, global private investment in AI
totaled around $94 billion, more than double the total private investment in 2019,
before the COVID-19 pandemic. McKinsey’s “State of AI in 2021” survey indicated that AI
adoption was continuing its steady rise: 56% of all respondents reported AI adoption in at least
one function, up from 50% in 2020.
The Stanford report also found companies are continuing to invest in applying a diverse set of ML
techniques—e.g., natural language understanding, computer vision, reinforcement learning—
across a wide array of sectors, such as healthcare, retail, manufacturing, and financial services.
From a jobs and skills perspective, Stanford’s analysis of millions of job postings since 2010
showed that the demand for ML capabilities has been growing steadily year-on-year in the past
decade, even through and after the COVID-19 pandemic.
While these trends are reassuring from an opportunities perspective, they are also highly
concerning if we journey ahead without confronting and learning from the challenges that have
ensnared us—both the producers and consumers of ML systems—in the past. Let’s take a look at
these pitfalls in detail.
Why ML Projects Fail
Despite the plethora of chart-topping Kaggle notebooks, it’s common for ML projects to fail in the
real world. Failure can come in various forms, including:
Inability to ship an ML-enabled product to production
Shipping products that customers don’t use
Deploying defective products that customers don’t trust
Inability to evolve and improve models in production quickly enough
Just to be clear—we’re not trying to avoid failures. As we all know, failure is as valuable as it is
inevitable. There’s lots that we can learn from failure. The problem arises as the cost of failure
increases—missed deadlines, unmet business outcomes, and sometimes even collateral
damage: harm to humans and the loss of jobs and livelihoods of employees who aren’t even directly involved in the ML initiative.
What we want is to fail in a low-cost and safe way, and often, so that we improve our odds of
success for everyone who has a stake in the undertaking. We also want to learn from failures—by
documenting and socializing our experiments and lessons learned, for example—so that we don’t
fail in the same way again and again. In this section, we’ll look at some common challenges—
spanning product, delivery, and engineering—that reduce our chances of succeeding, and in the
next section, we’ll explore ways to reduce the costs and likelihood of failure and deliver valuable
outcomes more effectively.
Let’s start at a high level and then zoom in to look at day-to-day barriers to the flow of value.
High-level view: Barriers to success
Taking a high-level view—i.e., at the level of an ML project or a program of work—we’ve seen
ML projects fail to achieve their desired outcomes due to the following challenges:
Failing to solve the right problem or deliver value for users
In this failure mode, even if we have all the right engineering practices and “build the thing
right,” we fail to move the needle on the intended business outcomes because the team
didn’t “build the right thing.” This often happens when the team lacks product management
capabilities or lacks alignment with product and business. Without mature product thinking
capabilities in a team, it’s common for ML teams to overlook human-centered design
techniques—e.g., user testing, user journey mapping—to identify the pains, needs, and
desires of users.1
Challenges in productionizing models
Many ML projects do not see the light of day in production. A 2021 Gartner poll of roughly
200 business and IT professionals found that only 53% of AI projects make it from pilot
into production, and among those that succeed, it takes an average of nine months to do
so.2 The challenges of productionizing ML models aren’t limited to compute concerns such as model deployment; they can also stem from data issues (e.g., not having inference data available at suitable quality, latency, and distribution in production).
Challenges after productionizing models
Once in production, it’s common to see ML practitioners bogged down by toil and tedium
that inhibits iterative experimentation and model improvements. In its “2021 Enterprise
Trends in Machine Learning” report, Algorithmia reported that 64% of companies
take more than one month to deploy a new model, an increase from 58% as reported in
Algorithmia’s 2020 report. The report also notes 38% of organizations spend more than
50% of their data scientists’ time on deployment—and that only gets worse with scale.
Long or missing feedback loops
During model development, feedback loops are often long and tedious, and this diverts
valuable time from important ML product development work. The primary way of knowing
if everything works might be to manually run a training notebook or script, wait for it to complete (sometimes for hours), and then manually wade through logs or print statements to eyeball model metrics and judge whether the model is still as good as before. This doesn’t scale well, and more often than not we are hindered by unexpected errors and quality degradations during development and even in production.
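To illustrate what a shorter feedback loop could look like, here is a minimal sketch of an automated model quality gate, written as a test that can run on every change (for example, with pytest). The dataset, model, and thresholds are placeholders chosen purely for illustration; a real project would plug in its own training and evaluation code and its own agreed baselines.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Illustrative quality thresholds; real projects would derive these from an
# agreed baseline or from the currently deployed model.
THRESHOLDS = {"accuracy": 0.90, "f1": 0.90}


def test_model_meets_quality_thresholds():
    # Stand-in dataset and model purely for illustration; replace with your
    # own training and evaluation code.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
    predictions = model.predict(X_test)

    metrics = {
        "accuracy": accuracy_score(y_test, predictions),
        "f1": f1_score(y_test, predictions),
    }

    # Fail fast and loudly when quality regresses, instead of relying on
    # someone eyeballing a degraded metric in the logs.
    for name, minimum in THRESHOLDS.items():
        assert metrics[name] >= minimum, (
            f"{name} is {metrics[name]:.3f}, below the minimum of {minimum}"
        )

A check like this turns "is the model still as good as before?" from an hours-long manual chore into something that can fail automatically and immediately whenever a change degrades quality.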