In the past month, I have written
several blog posts on the assessment of water quality models, ranging from “Is
Conventional Water Quality Model Verification a Charade?” to “Skill Assessment
and Statistical Hypothesis Testing for Model Evaluation.” These are intended to
serve as a backdrop for a future blog post describing and demonstrating a
Bayesian approach; this approach combines a skill assessment statistic with a
statistic for the “rigor” of the model verification exercise. To make this
approach clear, first I am going to provide a few brief introductory posts describing
Bayesian methods, beginning here with basic background on Bayesian inference.
In my view, Bayesian analysis provides a normative framework
for the use of uncertain information in decision-making and inference. From a
practical perspective, Bayes Theorem has a logical appeal in that it
characterizes a process of knowledge updating that essentially is based on
pooling precision-weighted information.
For years, however, Bayesian inference was largely ignored or even discredited
in favor of frequentist (or classical) inference, which is based on easy-to-use
procedures for testing a null hypothesis, computing p-values, and calculating
confidence intervals; this was the way most of us learned to apply statistical
analysis in our work. Among the reasons behind the general rejection of
Bayesian analysis were computational difficulties and the formal use of
subjective probabilities in applications of Bayes Theorem. In recent years, new
computational approaches (e.g., Markov chain Monte Carlo) have greatly reduced
the first problem, while the general recognition of the role of expert judgment
in science has at least lessened resistance with respect to the second problem.
Beyond that, Bayesian approaches facilitate certain analyses and
interpretations that are often important to scientists.
For example, the growing recognition of the value of combining information
or “borrowing strength” in ecological studies, as new information is acquired
to augment existing knowledge, is one of several reasons why interest in
Bayesian inference continues to increase. Many currently used analytic techniques,
such as multilevel models, data assimilation, and the Kalman filter, are focused
on this theme; all of these techniques reflect the basic framework of Bayes
Theorem for pooling information.
Most scientists initially are taught that probabilities represent
long-run frequencies; a consequence of this perspective is that probabilities
have no meaning in a single unique or non-replicated analysis. Scientists often
ignore this constraint and interpret probabilities to suit the particular
analysis. Related confusion sometimes arises in classical hypothesis testing
and in the interpretation of p-values. Bayesian inference provides appealing
options in those situations.
Collectively, these developments and perspectives have resulted in an increase
in the application of Bayesian approaches in ecological studies; many of these
applications involve combining information, hypothesis testing, and Bayesian
networks, for example. In sum, it seems reasonable to make the judgmental
forecast that Bayesian approaches will continue to increase in use in ecology
and the environmental sciences.
Might the science of ecology be richer and further advanced if research
were conducted within a Bayesian framework? This question may not be as absurd
as it first appears. After all, Bayes Theorem can be viewed as a logical way to
combine information or pool knowledge, so one might reasonably argue that
application of Bayes Theorem is the appropriate way to integrate new research
findings with existing knowledge. Correspondingly, failure to use Bayes Theorem
may perhaps mean that new scientific knowledge is combined with existing
knowledge in an ad hoc, judgmental way. So, over time, one might reasonably
expect that the logical, structured approach of Bayes Theorem would advance
scientific knowledge to a greater degree than would informal judgmental approaches.
Bayes Theorem lies at the heart of Bayesian inference; it is
based on the use of probability to express knowledge and the combining of
probabilities to characterize the advancement of knowledge. The simple, logical
expression of Bayes Theorem stipulates that, when combining information, the
resultant (or posterior) probability is proportional to the product of the
probability reflecting a priori knowledge (the prior probability) and the
probability representing newly acquired knowledge (the sample information, or
likelihood). Expressed more formally, Bayes Theorem states that the probability
for y conditional on experimental outcome x (written p(y|x)) is proportional to the probability of y assessed before the
experiment (written p(y)) times the
probabilistic outcome of the experiment (written p(x|y)):
    p(y|x) ∝ p(y) p(x|y)
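To make the proportionality concrete, here is a minimal Python sketch for the simplest case, in which y can take only a few discrete values. The three candidate values ("low", "medium", "high") and all of the probabilities are invented purely for illustration; the function name `posterior` is likewise my own and does not come from any library. The only extra step beyond the statement above is dividing by the sum so that the posterior probabilities add to one.

```python
def posterior(prior, likelihood):
    """Apply Bayes Theorem over a discrete set of candidate values of y.

    prior[y]      -- p(y), assessed before the experiment
    likelihood[y] -- p(x|y), probability of the observed outcome x given y
    Returns p(y|x), normalized to sum to one.
    """
    # Posterior is proportional to prior times likelihood.
    unnormalized = {y: prior[y] * likelihood[y] for y in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observed outcome has zero probability under every y")
    return {y: u / total for y, u in unnormalized.items()}


# Hypothetical numbers, chosen only to illustrate the calculation.
prior = {"low": 0.5, "medium": 0.3, "high": 0.2}
likelihood = {"low": 0.1, "medium": 0.4, "high": 0.7}

post = posterior(prior, likelihood)
for y, p in post.items():
    print(f"p({y} | x) = {p:.3f}")
```

In this made-up case the experiment favors "high" strongly enough that it overtakes "low" in the posterior, even though "low" had the largest prior probability.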
To fix ideas, suppose an environmental scientist is interested in the
reduction in chlorophyll a in a lake
associated with a 30% reduction in phosphorus concentration. She could use
existing data from similar lakes to develop a simple chlorophyll–phosphorus
regression model and predict the reduction in chlorophyll for the lake of
interest. Alternatively, she could conduct dilution experiments on the lake,
collecting new data to estimate the quantity of interest. Adopting a third option,
a Bayesian scientist would use “the best of both worlds” by combining the
estimates using Bayes Theorem. In the language of Bayes Theorem, the regression
model would yield the prior probability, since this estimator exists prior to the collection of new data, and the
posterior probability would represent the revised estimate based on both prior
knowledge and new experimental evidence.
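As a rough sketch of what this combination could look like, suppose both estimates of the chlorophyll reduction can be summarized as normal distributions with known standard deviations. That is an assumption made here for simplicity, not something the lake example requires. Under that assumption, Bayes Theorem reduces to precision-weighted pooling: each estimate is weighted by the inverse of its variance. The numbers below are invented for illustration and do not come from any real lake, and `pool_normal` is a hypothetical helper name.

```python
import math


def pool_normal(prior_mean, prior_sd, data_mean, data_sd):
    """Combine two normal estimates by precision weighting.

    Assumes a normal prior and a normal likelihood with known
    standard deviations (the conjugate normal-normal case).
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / data_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (
        prior_precision * prior_mean + data_precision * data_mean
    ) / post_precision
    post_sd = math.sqrt(1.0 / post_precision)
    return post_mean, post_sd


# Hypothetical values: percent reduction in chlorophyll a.
prior_mean, prior_sd = 20.0, 8.0  # cross-lake regression model (prior)
data_mean, data_sd = 28.0, 5.0    # dilution experiments (new data)

post_mean, post_sd = pool_normal(prior_mean, prior_sd, data_mean, data_sd)
print(f"posterior mean = {post_mean:.1f}%, posterior sd = {post_sd:.1f}%")
```

The posterior mean falls between the two estimates and closer to the more precise one, and the posterior standard deviation is smaller than either input, which is the sense in which the Bayesian scientist gets "the best of both worlds."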
At first glance, it seems hard to argue against this seemingly rational
quantitative strategy for updating scientific knowledge. Indeed, one might ask
why all ecologists and environmental scientists aren’t Bayesians. There are a
number of reasons. Certainly, the most important is that virtually all
scientists still learn probability and statistics from a classical, or
frequentist, perspective, and Bayes Theorem is at best a minor topic within
that curriculum. Beyond that, Bayesian inference has been widely regarded as
subjective and thus not suitable for objective scientific analysis. The problem
with this perspective is that most science is hardly the objective pursuit that
many choose to believe.
Consider the judgments we make in a scientific analysis. Implicit (or
explicit) in the scientist’s lake study on phosphorus and chlorophyll are
judgments about the adequacy of the existing lakes data, the merits of dilution
experiments, and the truth of the model relating phosphorus to chlorophyll.
There are no purely scientific, absolutely correct, choices here; these
represent “gray areas” about which reasonable scientists would disagree. Yet,
these also represent judgments that must be
made by the scientist in order to carry out the lake study. Ordinary scientists
are not unique in their reliance on judgment, however. Indeed, the work of some
of the most distinguished scientists in history (e.g., Galileo, Newton, Darwin,
and Einstein) has been shown to involve a substantial role for subjectivity.
Further, consider how most scientists address the revision of scientific
knowledge in light of their own new contributions. In some cases, the scientist
simply states the conclusions of the work, not attempting to quantitatively integrate
new findings with existing knowledge. When integration is attempted in the
concluding section of a research paper, it is typically a descriptive, subjective
assessment of the implication of the new knowledge. Bayesian inference has the
potential to make combining evidence more analytically rigorous. It is ironic
that the subjectivity of Bayesian analysis would be its undoing.
In future blog postings, I plan to discuss and demonstrate empirical Bayes
analysis, Bayesian hypothesis testing, and Bayes networks.