Ken Reckhow's Water Quality Wire: Bayesian Inference in Ecology and Environmental Science: An Introduction

In the past month, I have written several blog posts on the assessment of water quality models, from “Is Conventional Water Quality Model Verification a Charade?” to “Skill Assessment and Statistical Hypothesis Testing for Model Evaluation.” These are intended to serve as a backdrop for a future blog post describing and demonstrating a Bayesian approach that combines a skill assessment statistic with a statistic for the “rigor” of the model verification exercise. To make this approach clear, I will first provide a few brief introductory posts describing Bayesian methods, beginning here with basic background on Bayesian inference.

In my view, Bayesian analysis provides a normative framework for the use of uncertain information in decision-making and inference. From a practical perspective, Bayes Theorem has a logical appeal in that it characterizes a process of knowledge updating that essentially is based on pooling precision-weighted information.

For years, however, Bayesian inference was largely ignored or even discredited in favor of frequentist (or classical) inference, which is based on easy-to-use procedures for testing a null hypothesis, computing p-values, and calculating confidence intervals; this was the way most of us learned to apply statistical analysis in our work. Among the reasons behind the general rejection of Bayesian analysis were computational difficulties and the formal use of subjective probabilities in applications of Bayes Theorem. In recent years, new computational approaches (e.g., Markov chain Monte Carlo) have greatly reduced the first problem, while the general recognition of the role of expert judgment in science has at least lessened resistance with respect to the second problem. Beyond that, Bayesian approaches facilitate certain analyses and interpretations that are often important to scientists.

For example, the growing recognition of the value of combining information or “borrowing strength” in ecological studies, as new information is acquired to augment existing knowledge, is one of several reasons why interest in Bayesian inference continues to increase. Many currently used analytic techniques, such as multilevel models, data assimilation, and the Kalman filter, are focused on this theme; all of these techniques reflect the basic framework of Bayes Theorem for pooling information.

Most scientists initially are taught that probabilities represent long-run frequencies; a consequence of this perspective is that probabilities have no meaning in a single unique or non-replicated analysis. Scientists often ignore this constraint and interpret probabilities to suit the particular analysis. Related confusion sometimes arises in classical hypothesis testing and in the interpretation of p-values. Bayesian inference provides appealing options in those situations.

Collectively, these developments and perspectives have resulted in an increase in the application of Bayesian approaches in ecological studies; many of these applications involve combining information, hypothesis testing, and Bayesian networks. In sum, it seems reasonable to make the judgmental forecast that Bayesian approaches will continue to increase in use in ecology and the environmental sciences.

Might the science of ecology be richer and further advanced if research were conducted within a Bayesian framework? This question may not be as absurd as it first appears. After all, Bayes Theorem can be viewed as a logical way to combine information or pool knowledge, so one might reasonably argue that application of Bayes Theorem is the appropriate way to integrate new research findings with existing knowledge. Correspondingly, failure to use Bayes Theorem may perhaps mean that new scientific knowledge is combined with existing knowledge in an ad hoc, judgmental way. So, over time, one might reasonably expect that the logical, structured approach of Bayes Theorem would advance scientific knowledge to a greater degree than would informal judgmental approaches.

Bayes Theorem lies at the heart of Bayesian inference; it is based on the use of probability to express knowledge and the combining of probabilities to characterize the advancement of knowledge. The simple, logical expression of Bayes Theorem stipulates that, when combining information, the resultant (or posterior) probability is proportional to the product of the probability reflecting a priori knowledge (the prior probability) and the probability representing newly acquired knowledge (the sample information, or likelihood). Expressed more formally, Bayes Theorem states that the probability for y conditional on experimental outcome x (written p(y|x)) is proportional to the probability of y assessed before the experiment (written p(y)) times the probabilistic outcome of the experiment (written p(x|y)):

$$p(y \mid x) \propto p(y)\, p(x \mid y)$$
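
Written with its normalizing constant, this relationship becomes an equality. The textbook normal case then gives one way to see what "pooling precision-weighted information" means in practice. As an illustrative assumption (the symbols μ0, σ0, and σ are introduced here and are not part of the discussion above), suppose the prior is y ~ N(μ0, σ0²) and the experimental outcome is x | y ~ N(y, σ²) with σ² known:

$$
p(y \mid x) = \frac{p(x \mid y)\, p(y)}{\int p(x \mid y')\, p(y')\, dy'}
$$

$$
y \mid x \sim N(\mu_1, \sigma_1^2), \qquad
\frac{1}{\sigma_1^2} = \frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}, \qquad
\mu_1 = \sigma_1^2 \left( \frac{\mu_0}{\sigma_0^2} + \frac{x}{\sigma^2} \right)
$$

Under these assumptions the posterior precision (the reciprocal of the variance) is the sum of the prior and data precisions, and the posterior mean is an average of the prior mean and the observation, each weighted by its precision.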

To fix ideas, suppose an environmental scientist is interested in the reduction in chlorophyll a in a lake associated with a 30% reduction in phosphorus concentration. She could use existing data from similar lakes to develop a simple chlorophyll-phosphorus regression model and predict the reduction in chlorophyll for the lake of interest. Alternatively, she could conduct dilution experiments on the lake, collecting new data to estimate the quantity of interest. Adopting a third option, a Bayesian scientist would use “the best of both worlds” by combining the estimates using Bayes Theorem. In the language of Bayes Theorem, the regression model would yield the prior probability, since this estimator exists prior to the collection of new data; the dilution experiments would supply the sample information (the likelihood); and the posterior probability would represent the revised estimate based on both prior knowledge and new experimental evidence.
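
The lake example can be sketched in a few lines of Python. This is only a minimal illustration under strong assumptions: both the regression-based prior and the experimental estimate are treated as normal distributions with known standard deviations, and every number below is hypothetical rather than taken from any real lake study. The function name `combine_normal` is likewise invented for this sketch.

```python
import math


def combine_normal(prior_mean, prior_sd, data_mean, data_se):
    """Normal-normal Bayesian update: pool two estimates by precision.

    Assumes a normal prior and a normal likelihood with known spread.
    Returns the posterior mean and posterior standard deviation.
    """
    prior_prec = 1.0 / prior_sd ** 2   # precision = 1 / variance
    data_prec = 1.0 / data_se ** 2
    post_prec = prior_prec + data_prec  # precisions add
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    post_sd = math.sqrt(1.0 / post_prec)
    return post_mean, post_sd


# Hypothetical inputs, for illustration only:
# prior from the cross-lake regression: 20% chlorophyll reduction, sd 8
# estimate from the dilution experiments: 30% reduction, standard error 4
post_mean, post_sd = combine_normal(20.0, 8.0, 30.0, 4.0)
print(f"posterior mean = {post_mean:.1f}%, posterior sd = {post_sd:.2f}")
```

With these made-up inputs the posterior mean is 28%, closer to the experimental estimate because that estimate is the more precise of the two, and the posterior standard deviation (about 3.58) is smaller than either input, reflecting the pooled information.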

At first glance, it seems hard to argue against this seemingly rational quantitative strategy for updating scientific knowledge. Indeed, one might ask: why aren’t all ecologists and environmental scientists Bayesians? There are a number of reasons. Certainly, the most important is that virtually all scientists still learn probability and statistics from a classical, or frequentist, perspective, and Bayes Theorem is at best a minor topic within that curriculum. Beyond that, Bayesian inference has been widely regarded as subjective and thus not suitable for objective scientific analysis. The problem with this perspective is that most science is hardly the objective pursuit that many choose to believe.

Consider the judgments we make in a scientific analysis. Implicit (or explicit) in the scientist’s lake study on phosphorus and chlorophyll are judgments about the adequacy of the existing lakes data, the merits of dilution experiments, and the truth of the model relating phosphorus to chlorophyll. There are no purely scientific, absolutely correct choices here; these represent “gray areas” about which reasonable scientists would disagree. Yet these also represent judgments that must be made by the scientist in order to carry out the lake study. Ordinary scientists are not unique in their reliance on judgment, however. Indeed, the work of some of the most distinguished scientists in history (e.g., Galileo, Newton, Darwin, and Einstein) has been shown to involve a substantial degree of subjectivity.

Further, consider how most scientists address the revision of scientific knowledge in light of their own new contributions. In some cases, the scientist simply states the conclusions of his work, not attempting to quantitatively integrate new findings with existing knowledge. When integration is attempted in the concluding section of a research paper, it is typically a descriptive, subjective assessment of the implications of the new knowledge. Bayesian inference has the potential to make combining evidence more analytically rigorous. It is ironic that the subjectivity of Bayesian analysis would be its undoing.

In future blog postings, I plan to discuss and demonstrate empirical Bayes analysis, Bayesian hypothesis testing, and Bayesian networks.

Monday, June 24, 2013
