Thursday, June 27, 2013

Combining Information – An Empirical Bayes Analysis

In ecological studies, investigators often use existing scientific knowledge to specify hypotheses or models, and then collect data at a site of interest to test the hypotheses or fit the models. If collateral data from nearby or similar sites exist, it is common practice to use this information to make a judgmental assessment of the support for and against the model/hypothesis, but otherwise not to incorporate these collateral data into the analysis in a formal way.
For example, consider the situation in which a state agency has maintained a statewide surface water quality monitoring network, and a local community is interested in using some of these data to assess trends in selected contaminants at sites within its jurisdiction. The common practice is to use the data at each site for a site-specific trend analysis, while using data from other nearby sites only in a comparative analysis or discussion. This practice persists even though, when water quality at a site is highly variable, a long single-site record is needed before one can draw a confident conclusion about change over time at that site.
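To make the record-length point concrete, here is a worked formula under simplifying assumptions that are not stated in the original: a linear trend fit by ordinary least squares to n equally spaced annual observations (t = 1, ..., n) with independent errors of standard deviation σ. The standard error of the estimated slope b is then:

```latex
% OLS slope standard error for t = 1, ..., n (assumed: independent errors, sd sigma)
\operatorname{SE}(b) = \frac{\sigma}{\sqrt{\sum_{t=1}^{n} (t - \bar{t})^2}}
                    = \sigma \sqrt{\frac{12}{n\,(n^2 - 1)}}
```

The standard error grows in direct proportion to the site's variability σ and falls only roughly as n^(-3/2). With n = 10 years, SE(b) ≈ 0.11σ per year; it takes about 20 years to bring it down to ≈ 0.039σ. Serial correlation, which is common in water quality records, would make these figures larger still.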
A natural question, then, is whether collateral data at nearby sites can contribute to the site-specific analysis beyond a comparative discussion. The answer often is “yes,” because the commonality (or exchangeability) among sites can be exploited. On the one hand, each field site has unique features associated with forcing functions (e.g., watershed conditions and pollutant inputs) and with response functions (e.g., water depth and hydraulic conditions). On the other hand, the environmental sciences rest on common principles that should lead us to expect similarity in ecosystem responses to stress, and any discussion of the response at other nearby sites often implies an expectation that those sites have something in common with the site of interest.
As a result, it should often be possible to improve the single-site analysis (i.e., reduce its inferential error) by “borrowing strength” from other similar sites. This may be accomplished using an empirical Bayes (or multilevel) approach in which collateral information (in the example above, the trend assessments at the other similar sites) is used to construct a “prior” probability model that characterizes this information. Bayes Theorem is then used to combine this prior with a probability model (the likelihood) for the trend at the site of interest. In many instances, combining information using empirical Bayes methods yields narrower interval estimates, and thus stronger inferences, than would result if this information were ignored.
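A minimal sketch of this idea, assuming a normal-normal model: each site's estimated trend is normally distributed around its true trend with a known standard error, and the true trends are drawn from a common normal prior whose mean and variance are estimated (by a simple method of moments) from the other sites only. The function names, the estimator, and all numbers are hypothetical illustrations, not the procedure used in the Reckhow papers cited below.

```python
import math
from statistics import fmean

def estimate_prior(trends, std_errors):
    """Method-of-moments estimate of the prior from collateral sites.

    Assumed model (illustrative, not from the original analysis):
      observed trend_j ~ N(theta_j, se_j**2)   (sampling error)
      true trend theta_j ~ N(mu, tau2)         (exchangeable prior)
    """
    mu = fmean(trends)
    n = len(trends)
    spread = sum((t - mu) ** 2 for t in trends) / (n - 1)
    # Between-site variance = total spread minus average sampling variance,
    # truncated at zero so the prior variance is never negative.
    tau2 = max(0.0, spread - fmean([s ** 2 for s in std_errors]))
    return mu, tau2

def eb_posterior(trend, std_error, mu, tau2):
    """Normal-normal posterior mean and variance for the site of interest."""
    v = std_error ** 2
    shrink = v / (v + tau2)          # weight on the prior mean (1.0 if tau2 == 0)
    post_mean = shrink * mu + (1 - shrink) * trend
    post_var = (1 - shrink) * v      # equals v * tau2 / (v + tau2)
    return post_mean, post_var

# Hypothetical trend estimates (units per year) and standard errors from six
# collateral sites, plus a noisy estimate at the site of interest.
other_trends = [-0.8, -0.5, -1.1, -0.6, -0.9, -0.7]
other_ses = [0.15] * 6
site_trend, site_se = -0.1, 0.5

mu, tau2 = estimate_prior(other_trends, other_ses)
post_mean, post_var = eb_posterior(site_trend, site_se, mu, tau2)

# Approximate 95% interval half-widths, single-site vs. empirical Bayes.
half_single = 1.96 * site_se
half_eb = 1.96 * math.sqrt(post_var)
print(f"single-site:     {site_trend:.2f} +/- {half_single:.2f}")
print(f"empirical Bayes: {post_mean:.2f} +/- {half_eb:.2f}")
```

Because the prior mean and variance are plugged in as if known, this simple version ignores their estimation uncertainty and so gives somewhat optimistic intervals; a fully Bayesian hierarchical model would carry that uncertainty through.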
The strategy of “borrowing strength” from other similar analyses is shared by several statistical methods: Bayesian inference, empirical Bayes, and the classical method of random coefficients regression all have this characteristic. Bayesian inference, of course, results from the application of Bayes Theorem, which provides a logical framework for pooling information from more than one source. Empirical Bayes (EB) methods also use Bayes Theorem, but otherwise they are more classical (or frequentist) than Bayesian: the parameters of the prior are estimated from the data rather than specified in advance, and the resulting estimators are evaluated by classical properties such as bias and mean squared error. In the typical parametric empirical Bayes problem, we wish to estimate parameters µ₁, ..., µₚ (e.g., p means) simultaneously. The EB prior for this problem is often exchangeable; that is, the prior belief about each parameter µᵢ, i = 1, ..., p, does not depend on the particular value of i (the prior belief is the same for each parameter). With exchangeability, the prior model is assumed to describe a simple underlying relationship among the µᵢ, and Bayes Theorem is used to define the EB estimators of the µᵢ as estimated posterior means.
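For concreteness, here is one standard instance of this setup, the normal-means problem associated with Efron and Morris. The distributional choices (normal sampling error with a known common variance σ², and a normal exchangeable prior with mean m and variance τ²) are illustrative assumptions, not details from this post; y_i denotes the observation on µ_i.

```latex
% Exchangeable normal model (illustrative assumptions)
y_i \mid \mu_i \sim N(\mu_i, \sigma^2), \qquad \mu_i \sim N(m, \tau^2), \qquad i = 1, \dots, p
% Posterior mean from Bayes Theorem, with shrinkage factor B
E(\mu_i \mid y_i) = m + (1 - B)(y_i - m), \qquad B = \frac{\sigma^2}{\sigma^2 + \tau^2}
% EB step: estimate m and B from all p observations
\hat{m} = \bar{y}, \qquad \hat{B} = \frac{(p - 3)\,\sigma^2}{\sum_{j=1}^{p} (y_j - \bar{y})^2}
% Resulting EB estimator of each mu_i
\hat{\mu}_i = \bar{y} + (1 - \hat{B})(y_i - \bar{y})
```

Each estimate is pulled toward the grand mean, and more strongly when the spread among the y_j is small relative to σ², that is, when the data themselves suggest the parameters are alike. The estimator needs p ≥ 4, and in practice B̂ is usually truncated at 1.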
Exchangeability in the empirical Bayes set-up is a particularly useful concept for simultaneous parameter estimation in a system that has a hierarchical or nested structure. Examples of such systems are plentiful. For instance, cross-sectional lake data may arise from individual lakes (at the lowest level of the hierarchy) that are located within ecoregions (at the next level of the hierarchy). Alternatively, individual stream stations may be nested within a stream segment or within a watershed. This nesting implies a structure linking separate sites or systems that can be exploited in a hierarchical model.
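One simple way to respect the lakes-within-ecoregions nesting is to apply an exchangeable EB estimator separately within each ecoregion, so that each lake borrows strength only from lakes in the same ecoregion. The sketch below does this with the positive-part Efron-Morris shrinkage estimator, assuming a known, common sampling variance; the function names and data are hypothetical. A full multilevel model would also share information across ecoregions, which this two-step approximation does not.

```python
from collections import defaultdict
from statistics import fmean

def eb_shrink(values, sigma2):
    """Positive-part Efron-Morris shrinkage toward the group mean.

    Assumes value_i ~ N(mu_i, sigma2) with a known common sigma2 and
    exchangeable mu_i within the group. Needs at least 4 values.
    """
    p = len(values)
    if p < 4:
        return list(values)  # too few members to estimate the prior: no pooling
    ybar = fmean(values)
    s = sum((y - ybar) ** 2 for y in values)
    b = 1.0 if s == 0 else min(1.0, (p - 3) * sigma2 / s)
    return [ybar + (1 - b) * (y - ybar) for y in values]

def shrink_within_groups(records, sigma2):
    """records: iterable of (ecoregion, lake_id, value) -> {lake_id: estimate}."""
    by_region = defaultdict(list)
    for region, lake, value in records:
        by_region[region].append((lake, value))
    estimates = {}
    for members in by_region.values():
        lakes = [lake for lake, _ in members]
        shrunk = eb_shrink([value for _, value in members], sigma2)
        estimates.update(zip(lakes, shrunk))
    return estimates

# Hypothetical log-chlorophyll means for lakes in two ecoregions.
records = [
    ("A", "lake1", 2.1), ("A", "lake2", 2.6), ("A", "lake3", 1.8),
    ("A", "lake4", 2.4), ("A", "lake5", 2.9),
    ("B", "lake6", 3.5), ("B", "lake7", 3.9), ("B", "lake8", 3.2),
    ("B", "lake9", 4.1),
]
estimates = shrink_within_groups(records, sigma2=0.1)
for region, lake, raw in records:
    print(f"{region} {lake}: raw {raw:.2f} -> EB {estimates[lake]:.2f}")
```

Shrinkage within each group preserves that group's mean, so the ecoregion-level signal is untouched while lake-level noise is damped.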
Empirical Bayes descriptions and applications are less common than Bayesian analyses in the statistics and ecology literature. Although most textbooks on Bayesian inference include sections on EB problems, these methods tend not to be emphasized, perhaps because they have frequentist attributes and do not require a “true” prior. See Reckhow (1993, 1996) for ecological examples of empirical Bayes analysis.

Reckhow, K.H. 1993. A Random Coefficient Model for Chlorophyll-Nutrient Relationships in Lakes. Ecological Modelling 70:35-50.

Reckhow, K.H. 1996. Improved Estimation of Ecological Effects Using an Empirical Bayes Method. Water Resources Bulletin 32:929-935.

