(Chapter 1, Section 1, Statistical Engineering Handbook)
Draft 2, March 2019
Roger Hoerl
1.1.1 Objectives
The purpose of this section is to explain what statistical engineering is; that is, how
it is defined, how it works, why it is needed, and the basics of its underlying
theory.
1.1.2 Outline
We begin with an elucidation of the definition of statistical engineering. Next, we
explain why it is needed as a discipline, and then present the current state of the art
in terms of its underlying theory.
1.1.3 Definition and Elaboration
The discipline of statistical engineering is: the study of the systematic integration of
statistical concepts, methods, and tools, often with other relevant disciplines, to solve
important problems sustainably.
Several words in this definition warrant explanation. First, statistical
engineering is defined as a discipline, the study of something, not as a set of tools or
techniques. Second, as an engineering discipline it does not focus on advancing the
fundamental knowledge of the physical world, i.e., it is not a science. Rather, as with
other engineering disciplines, it utilizes existing concepts, methods, and tools in
novel ways to achieve novel results. In this sense it is complementary to statistical
science, just as chemical engineering is complementary to chemistry.
Concepts, methods, and tools are each important, and need to be integrated. That is,
formal statistical methods, such as time series or regression analysis, and individual
tools, such as residual plots, need to be integrated with concepts, such as the
advantages of randomization, and the need to understand the quality (“pedigree”) of
observational data prior to developing models (Hoerl and Snee 2018). When
addressing straightforward issues, a single statistical tool may suffice. However, as
noted by Hardin et al. (2015), when solving the challenging problems often faced by
practitioners, obtaining a viable solution typically requires integration of multiple
methods into an overall strategy and sequential approach.
Such integration should be done in a systematic, rather than ad hoc manner.
Throughout the history of statistics, good statisticians have generally figured out
how to integrate concepts, methods, and tools to solve problems. One classic
example would be Box and Wilson’s (1951) integration of experimental design and
regression into an overall sequential strategy for the empirical optimization of
processes, which we know today as response surface methodology.
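The sequential logic of response surface methodology can be illustrated with a small numerical sketch. The quadratic "true" surface, the specific design points, and the variable names below are illustrative assumptions for this sketch, not details taken from Box and Wilson's paper.

```python
import numpy as np

# Hypothetical two-factor process (in coded units); in practice this
# "true" surface is unknown and each call would be a physical run.
def process(x1, x2):
    return 10 - (x1 - 1)**2 - (x2 - 2)**2

# Step 1: a 2^2 factorial design supports a first-order (screening) model.
X = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
y = process(X[:, 0], X[:, 1])
A = np.column_stack([np.ones(len(X)), X])        # intercept + linear terms
b0, b1, b2 = np.linalg.lstsq(A, y, rcond=None)[0]

# Step 2: path of steepest ascent -- move in proportion to (b1, b2)
# until improvement stalls, then redesign around the new region.
direction = np.array([b1, b2]) / np.hypot(b1, b2)

# Step 3: near the optimum, augment to a central composite design,
# fit a full quadratic model, and solve for its stationary point.
axial = np.sqrt(2)
ccd = np.vstack([X, [[0.0, 0.0]],
                 axial * np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])])
yc = process(ccd[:, 0], ccd[:, 1])
Q = np.column_stack([np.ones(len(ccd)), ccd, ccd**2, ccd[:, 0] * ccd[:, 1]])
coef = np.linalg.lstsq(Q, yc, rcond=None)[0]     # [b0, b1, b2, b11, b22, b12]
H = np.array([[2 * coef[3], coef[5]],            # Hessian of fitted quadratic
              [coef[5], 2 * coef[4]]])
g = -np.array([coef[1], coef[2]])
stationary = np.linalg.solve(H, g)               # estimated optimum settings
```

Because the assumed surface here is itself quadratic, the fitted stationary point recovers the true optimum at (1, 2); with real, noisy runs each step would instead guide the next round of experimentation.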
It seems clear, however, that despite many historical examples of successful
integration, there is little existing theory in the literature on how to best accomplish
such integration in general, that is, with a new problem. Due to a lack of theory, new
integration problems are often attacked with a trial and error approach. However,
the theory of statistical engineering, discussed below, provides guidance for a
systematic approach, which is likely to be much more effective. In addition, such
theory can be formally studied, taught, and advanced over time.
By the word theory, we do not refer to mathematical statistics. Rather, we refer to
development of an overall methodology, based on the scientific method, by which
one might approach integration in a systematic rather than ad hoc
manner. Note that theory may be defined as: “A coherent group of general
propositions used to explain a phenomenon” (Hoerl and Snee 2017). Note that
neither this nor other common definitions of theory contain explicit requirements
for mathematics, although mathematics is often important.
In addition, for many of the important problems facing practitioners, such
integration must include other disciplines beyond statistics. For example, almost by
definition, information technology (IT) is required to address “Big Data” problems
(see the ASA statement on Data Science at
http://www.amstat.org/misc/datasciencestatement.pdf). In fact, the authors of
this handbook have found that IT is needed to some degree to solve most important
real problems. Kendall and Fulenwider (2000) explain how critical IT is to
successful Six Sigma projects, and we feel that the same is true of statistical
engineering. Challenging problems, such as developing personalized medicine
protocols through genomics, for example, are virtually impossible to resolve
without effective and innovative use of IT.
Other disciplines may be needed as well, including natural sciences, other
engineering disciplines, and also social sciences, such as organizational
effectiveness, psychology, or social networking theory, depending on the specific
problem being addressed. As one example, the improvement methodology Lean Six
Sigma (Antony et al. 2017) is essentially the integration of diverse statistical
methods, including control charts, experimental design, and regression, with
various quality concepts and methods, including Pareto charts, mistake proofing,
and quality function deployment (QFD), in addition to the efficiency concepts and
methods from Lean manufacturing. These efficiency concepts and methods could be
considered under the umbrella of the discipline of industrial engineering.
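One of the quality tools named above, the Pareto chart, can be sketched in a few lines; the defect categories and counts below are invented for illustration only.

```python
from collections import Counter

# Hypothetical defect log from a process-improvement project.
defects = (["scratch"] * 42 + ["dent"] * 23 + ["misalign"] * 18 +
           ["stain"] * 9 + ["other"] * 8)

# A Pareto analysis ranks causes by frequency and tracks the cumulative
# share of defects, highlighting the "vital few" causes to attack first.
counts = Counter(defects).most_common()
total = sum(n for _, n in counts)
cumulative, pareto = 0, []
for cause, n in counts:
    cumulative += n
    pareto.append((cause, n, round(100 * cumulative / total, 1)))
```

With these made-up counts, the top two causes account for 65% of all defects, which is the kind of prioritization signal the chart is meant to provide.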
As an engineering discipline, the ultimate goal of statistical engineering is to solve
important problems. While this may seem obvious, an emphasis on solving
important problems gives statistical engineering perhaps its most important
attribute: being tool-agnostic. That is, statistical engineering is neither Bayesian nor
frequentist, neither parametric nor non-parametric (or semi-parametric), and does
not promote either classical or computer-aided designs, per se. Rather, as an
engineering discipline its “loyalty” is to solving the problem and generating results,
not to a predetermined set of methods. Tools are of course important, but within a
statistical engineering paradigm they are chosen based on the unique nature of the
problem to provide the best possible solution, rather than predetermined based on
personal preferences. Various philosophies and tool sets may be employed and
integrated.
Further, statistical engineering seeks solutions that are sustainable. We argue that
many solutions, including those published in professional journals, provide
technical solutions, but all too frequently these solutions are not actually sustainable
over time. Of course, virtually no solution will be permanent, but statistical
engineering seeks solutions that are sustainable beyond the immediate time
frame, and that hopefully last until the problem itself changes, or until new technology
becomes available, enabling an even better solution.
In practice, purely technical solutions often overlook organizational, political, or
psychological constraints. To be sustainable, the solution must eventually be
embedded into standard work procedures and best practices, typically via IT. An
interesting example from the related discipline of data science is the classic Netflix
competition, in which Netflix paid $1,000,000 to the team that developed the “best”
model to predict customer ratings of movies.
As noted by Donoho (2017), however, the winning solution was never actually
implemented by Netflix, because the company found that the time and expense
involved in maintaining the 107 individual models utilized within the overall
ensemble (see Fung 2013) were not worth the small improvement in accuracy. So a team won the
competition and the $1,000,000 award, but it did not actually solve Netflix’s
business problem. Clearly, the technical solution is only a part of solving important
problems sustainably.
1.1.4 Why Statistical Engineering?
It is certainly logical to ask why a new discipline is actually needed and, even
granting that one is, why it should be statistical engineering. As noted previously,
good statisticians have integrated multiple statistical methods, and tools from other
disciplines, for a long time. In this sense, we could say that statistical engineering
itself is old. However, as also noted above, such applications have typically been
presented as isolated case studies utilizing ingenuity and creativity to provide novel
solutions to complex problems. What has been missing is a concise presentation of
an underlying theory as to how the researchers actually developed their solutions. A
body of research is needed to fill in this gap, to develop an underlying theory as to
how such problems should be addressed in general, and why. In this sense, we say
that statistical engineering is a new discipline, even though statistical engineering
itself is old.
The main reason statistical engineering was needed in these case studies was to
solve problems that were not straightforward “textbook” problems. Textbook
problems are typically well structured, have a clear objective, and a single, correct