Chapter 1. Machine Learning
Machine learning expands the boundaries of what’s possible by allowing computers to solve
problems that were intractable just a few short years ago. From fraud detection and medical
diagnoses to product recommendations and cars that “see” what’s in front of them, machine
learning impacts our lives every day. As you read this, scientists are using machine learning to
unlock the secrets of the human genome. When we one day cure cancer, we will thank machine
learning for making it possible.
Machine learning is revolutionary because it provides an alternative to algorithmic problem-
solving. Given a recipe, or algorithm, it’s not difficult to write an app that hashes a password or
computes a monthly mortgage payment. You code up the algorithm, feed it input, and receive
output in return. It’s another proposition altogether to write code that determines whether a photo
contains a cat or a dog. You can try to do it algorithmically, but the minute you get it working,
you’ll come across a cat or dog picture that breaks the algorithm.
Machine learning takes a different approach to turning input into output. Rather than relying on
you to implement an algorithm, it examines a dataset of inputs and outputs and learns how to
generate output of its own in a process known as training. Under the hood, special algorithms
called learning algorithms fit mathematical models to the data and codify the relationship between
data going in and data coming out. Once trained, a model can accept new inputs and generate
outputs consistent with the ones in the training data.
To use machine learning to distinguish between cats and dogs, you don’t code a cat-versus-dog
algorithm. Instead, you train a machine learning model with cat and dog photos. Success depends
on the learning algorithm used and the quality and volume of the training data.
Part of becoming a machine learning engineer is familiarizing yourself with the various learning
algorithms and developing an intuition for when to use one versus another. That intuition comes
from experience and from an understanding of how machine learning fits mathematical models to
data. This chapter represents the first step on that journey. It begins with an overview of machine
learning and the most common types of machine learning models, and it concludes by introducing
two popular learning algorithms and using them to build simple yet fully functional models.
What Is Machine Learning?
At an existential level, machine learning (ML) is a means for finding patterns in numbers and
exploiting those patterns to make predictions. ML makes it possible to train a model on rows or
sequences of 1s and 0s so that, given a new sequence, the model can predict what the result will
be. Learning is the process by which ML finds patterns that can be
used to predict future outputs, and it’s where the “learning” in “machine learning” comes from.
As an example, consider the table of 1s and 0s depicted in Figure 1-1. Each number in the fourth
column is somehow based on the three numbers preceding it in the same row. What’s the missing
number?
Figure 1-1. Simple dataset consisting of 0s and 1s
One possible solution is that for a given row, if the first three columns contain more 0s than 1s,
then the fourth contains a 0. If the first three columns contain more 1s than 0s, then the answer is
1. By this logic, the empty box should contain a 1. Data scientists refer to the column containing
answers (the red column in the figure) as the label column. The remaining columns are feature
columns. The goal of a predictive model is to find patterns in the rows in the feature columns that
allow it to predict what the label will be.
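The majority rule described above is simple enough to sketch in a few lines of Python. The rows here are hypothetical stand-ins for the dataset in Figure 1-1:

```python
# Hypothetical rows standing in for Figure 1-1: three feature columns
# followed by a label column
rows = [
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 0, 1, 1],
]

def predict(features):
    # If 1s outnumber 0s among the features, predict 1; otherwise predict 0
    return 1 if sum(features) > len(features) / 2 else 0

# The rule reproduces every label in the toy dataset
for *features, label in rows:
    assert predict(features) == label

# Predict the missing value for a new row such as [1, 1, 0]
print(predict([1, 1, 0]))  # prints 1
```

Of course, this is a hand-coded rule, not machine learning; the point of ML is to discover such rules from the data rather than have you write them.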
If all datasets were this simple, you wouldn’t need machine learning. But real-world datasets are
larger and more complex. What if the dataset contained millions of rows and thousands of columns,
as is common in machine learning? For that matter, what if the dataset
resembled the one in Figure 1-2?
Figure 1-2. A more complex dataset
It’s difficult for any human to examine this dataset and come up with a set of rules for predicting
whether the red box should contain a 0 or a 1. (And no, it’s not as simple as counting 1s and 0s.)
Just imagine how much more difficult it would be if the dataset really did have millions of rows
and thousands of columns.
That’s what machine learning is all about: finding patterns in massive datasets of numbers. It
doesn’t matter whether there are 100 rows or 1,000,000 rows. In many cases, more is better,
because 100 rows might not provide enough samples for patterns to be discerned.
It isn’t an oversimplification to say that machine learning solves problems by mathematically
modeling patterns in sets of numbers. Almost any problem can be reduced to a set of numbers. For
example, one of the common applications for ML today is sentiment analysis: looking at a text
sample such as a movie review or a comment left on a website and assigning it a 0 for negative
sentiment (for example, “The food was bland and the service was terrible.”) or a 1 for positive
sentiment (“Excellent food and service. Can’t wait to visit again!”). Some reviews might be
mixed—for example, “The burger was great but the fries were soggy”—so we use
the probability that the label is a 1 as a sentiment score. A very negative comment might score a
0.1, while a very positive comment might score a 0.9, as in there’s a 90% chance that it expresses
positive sentiment.
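A minimal sketch of this idea in Python, with the scores invented for illustration (in practice, a trained model computes them):

```python
# Hypothetical sentiment scores a trained model might produce: each score is
# the model's estimated probability that the sample's label is 1 (positive)
scores = {
    "The food was bland and the service was terrible.": 0.1,
    "Excellent food and service. Can't wait to visit again!": 0.9,
    "The burger was great but the fries were soggy.": 0.5,
}

def to_label(score, threshold=0.5):
    # Threshold the probability to get a hard 0/1 prediction
    return 1 if score >= threshold else 0

for text, score in scores.items():
    print(f"{score:.1f} -> {to_label(score)}: {text}")
```

Keeping the raw score around, rather than just the 0/1 label, preserves how confident the model is; the mixed review sits right at the decision boundary.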
Sentiment analyzers and other models that work with text are frequently trained on datasets like
the one in Figure 1-3, which contains one row for every text sample and one column for every
word in the corpus of text (all the words in the dataset). A typical dataset like this one might contain
millions of rows and 20,000 or more columns. Each row contains a 0 for negative sentiment in the
label column, or a 1 for positive sentiment. Within each row are word counts—the number of times
a given word appears in an individual sample. The dataset is sparse, meaning it is mostly 0s with
an occasional nonzero number sprinkled in. But machine learning doesn’t care about the makeup
of the numbers. If there are patterns that can be exploited to determine whether the next sample
expresses positive or negative sentiment, it will find them. Spam filters use datasets such as these
with 1s and 0s in the label column denoting spam and nonspam messages. This allows modern
spam filters to achieve an astonishing degree of accuracy. Moreover, these models grow smarter
over time as they are trained with more and more emails.
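The layout described above can be sketched in plain Python. The two reviews below are invented stand-ins for a real corpus; as noted, a real dataset would have millions of rows and tens of thousands of word columns:

```python
# Hypothetical text samples with sentiment labels (1 = positive, 0 = negative)
samples = [
    ("excellent food and service", 1),
    ("the food was bland and the service was terrible", 0),
]

# Build the vocabulary: one column for every word in the corpus
vocab = sorted({word for text, _ in samples for word in text.split()})

# Each row holds per-word counts followed by the label, as in Figure 1-3
rows = []
for text, label in samples:
    counts = [text.split().count(word) for word in vocab]
    rows.append(counts + [label])

print(vocab)
for row in rows:
    print(row)
```

With a realistic vocabulary of 20,000 or more words, most counts in any given row are 0, which is why such datasets are described as sparse.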