Thursday, June 27, 2013

Combining Information – An Empirical Bayes Analysis

In ecological studies, investigators often use existing scientific knowledge to specify hypotheses or models, and then collect data at a site of interest to test the hypotheses or fit the models. If collateral data from nearby or similar sites exist, it is common practice to use this information to make a judgmental assessment of the support for and against the model/hypothesis, but otherwise not to incorporate these collateral data into the analysis in a formal way.
For example, consider the situation in which a state agency has maintained a statewide surface water quality monitoring network, and a local community is interested in using some of these data to assess trends in selected contaminants at sites within its jurisdiction. The common practice is to use the data at each site for a site-specific trend analysis, while using data from other nearby sites only in a comparative analysis or discussion. This approach persists despite the fact that if variability in water quality at a site is high, a long record of single-site observation is required to be confident in a conclusion concerning change over time at that site.
A seemingly natural question of interest might be whether collateral data at nearby sites can contribute to the site-specific analysis other than in a comparative study. The answer often is “yes,” as a consequence of exploiting the commonality (or exchangeability) among sites. On the one hand, each field site has unique features associated with forcing functions (e.g., watershed conditions and pollutant inputs) and with response functions (e.g., water depth and hydraulic conditions). On the other hand, the environmental sciences include common principles that should lead us to expect similarity in ecosystem response to stresses, and implicit in a discussion of response at other nearby sites is often an expectation that these sites have something in common with the site of interest.
As a result, it should often be possible to improve the single-site analysis (i.e., reduce inferential error) by “borrowing strength” from other similar sites. This may be accomplished using an empirical Bayes (or multilevel) approach in which collateral information (in the above example, the assessment of trends at the other similar sites) is used to construct a “prior” probability model that characterizes this information. Using Bayes Theorem, the prior probability is then combined with a probability model for the trend at the site of interest. In many instances, combining information using empirical Bayes methods yields smaller interval estimates, and thus stronger inferences, than would result if this information were ignored.
The strategy of “borrowing strength” from other similar analyses is an attribute shared by several statistical methods. Bayesian inference, empirical Bayes, and the classical method of random coefficients regression all have this characteristic. Bayesian inference, of course, results from the application of Bayes Theorem, which provides a logical framework for pooling information from more than one source. Empirical Bayes (EB) methods also use Bayes Theorem, but otherwise they are more classical (or frequentist) than Bayesian, in that the prior is estimated from the data and the resulting estimators are evaluated in terms of their classical properties. In the typical parametric empirical Bayes problem, we wish to simultaneously estimate parameters µ1,...,µp (e.g., p means). The EB prior for this problem is often exchangeable; that is, the prior belief for each of the i = 1,...,p parameters to be estimated does not depend on the particular value of i (the prior belief is the same for each parameter). With exchangeability, the prior model is assumed to describe a simple underlying relationship among the µi, and Bayes Theorem is used to define the EB estimators of the posterior parameters.
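
To make the mechanics concrete, here is a minimal sketch in Python of normal-normal empirical Bayes shrinkage for a set of site-specific trend estimates, under an exchangeable prior whose mean and variance are estimated from the sites themselves (a simple method-of-moments version; the site estimates and sampling variances below are entirely hypothetical):

```python
import numpy as np

# Hypothetical site-specific trend estimates (e.g., slope of a contaminant trend
# at each monitoring site) and their sampling variances.
trend_est = np.array([0.12, 0.05, 0.20, -0.03, 0.09, 0.15])
samp_var  = np.array([0.001, 0.002, 0.003, 0.002, 0.001, 0.004])

# Method-of-moments estimates of the exchangeable prior (the same for every site):
# prior mean = precision-weighted average of the site estimates; between-site
# variance = excess of the spread among sites over the average sampling variance.
prior_mean  = np.average(trend_est, weights=1.0 / samp_var)
between_var = max(np.var(trend_est, ddof=1) - samp_var.mean(), 0.0)

# Normal-normal empirical Bayes (shrinkage) estimator: each site's posterior mean
# is a precision-weighted average of its own estimate and the prior mean.
shrink = samp_var / (samp_var + between_var)    # weight given to the prior mean
eb_est = shrink * prior_mean + (1.0 - shrink) * trend_est
eb_var = (1.0 - shrink) * samp_var              # posterior variance < sampling variance

for site, (raw, eb) in enumerate(zip(trend_est, eb_est), start=1):
    print(f"Site {site}: raw trend = {raw:+.3f}, EB trend = {eb:+.3f}")
```

Sites with large sampling variances are shrunk most strongly toward the overall mean, which is precisely the sense in which the single-site analysis “borrows strength” from the other sites.
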
   Exchangeability in the empirical Bayes set-up is a particularly useful concept for simultaneous parameter estimation with a system that has a hierarchical or nested structure. Examples of these systems are plentiful. For instance, cross sectional lake data may arise from individual lakes (at the lowest level of the hierarchy) that are located within ecoregions (at the next level of the hierarchy). Alternatively, individual stream stations may be nested within a stream segment or nested within a watershed. This nestedness implies a structure for the linkage of separate sites or systems that could be exploited in a hierarchical model.
Empirical Bayes descriptions and applications are less common than Bayesian analyses in the statistics and ecology literature. While most textbooks on Bayesian inference include sections treating EB problems, these methods tend not to be emphasized, perhaps because they have frequentist attributes and do not require a “true” prior. See Reckhow (1993, 1996) for ecological examples of empirical Bayes analysis.

Reckhow, K.H. 1993. A Random Coefficient Model for Chlorophyll-Nutrient Relationships in Lakes. Ecological Modelling. 70:35-50.

Reckhow, K.H. 1996. Improved Estimation of Ecological Effects Using an Empirical Bayes Method. Water Resources Bulletin. 32:929-935.


Monday, June 24, 2013

Bayesian Inference in Ecology and Environmental Science: An Introduction

In the past month, I have written several blog posts on the assessment of water quality models, from “Is Conventional Water Quality Model Verification a Charade?” through “Skill Assessment and Statistical Hypothesis Testing for Model Evaluation.” These are intended to serve as a backdrop for a future blog post describing and demonstrating a Bayesian approach; this approach combines a skill assessment statistic with a statistic for the “rigor” of the model verification exercise. To make this approach clear, first I am going to provide a few brief introductory posts describing Bayesian methods, beginning here with basic background on Bayesian inference.

In my view, Bayesian analysis provides a normative framework for the use of uncertain information in decision-making and inference. From a practical perspective, Bayes Theorem has a logical appeal in that it characterizes a process of knowledge updating that essentially is based on pooling precision-weighted information.

For years, however, Bayesian inference was largely ignored or even discredited in favor of frequentist (or classical) inference, which is based on easy-to-use procedures for testing a null hypothesis, computing p-values, and calculating confidence intervals; this was the way most of us learned to apply statistical analysis in our work. Among the reasons behind the general rejection of Bayesian analysis were computational difficulties and the formal use of subjective probabilities in applications of Bayes Theorem. In recent years, new computational approaches (e.g., Markov chain Monte Carlo) have greatly reduced the first problem, while the general recognition of the role of expert judgment in science has at least lessened resistance with respect to the second problem. Beyond that, Bayesian approaches facilitate certain analyses and interpretations that are often important to scientists.

For example, the growing recognition of the value of combining information or “borrowing strength” in ecological studies, as new information is acquired to augment existing knowledge, is one of several reasons why interest in Bayesian inference continues to increase. Many currently used analytic techniques, such as multilevel models, data assimilation, and the Kalman filter, are focused on this theme; all of these techniques reflect the basic framework of Bayes Theorem for pooling information.

Most scientists initially are taught that probabilities represent long-run frequencies; a consequence of this perspective is that probabilities have no meaning in a single unique or non-replicated analysis. Scientists often ignore this constraint and interpret probabilities to suit the particular analysis. Related confusion sometimes arises in classical hypothesis testing and in the interpretation of p-values. Bayesian inference provides appealing options in those situations.

Collectively, these developments and perspectives have resulted in an increase in the application of Bayesian approaches in ecological studies; many of these applications involve, for example, combining information, hypothesis testing, and Bayesian networks. In sum, it seems reasonable to make the judgmental forecast that Bayesian approaches will continue to increase in use in ecology and the environmental sciences.
Might the science of ecology be richer and further advanced if research were conducted within a Bayesian framework? This question may not be as absurd as it first appears. After all, Bayes Theorem can be viewed as a logical way to combine information or pool knowledge, so one might reasonably argue that application of Bayes Theorem is the appropriate way to integrate new research findings with existing knowledge. Correspondingly, failure to use Bayes Theorem may perhaps mean that new scientific knowledge is combined with existing knowledge in an ad hoc, judgmental way. So, over time, one might reasonably expect that the logical, structured approach of Bayes Theorem would advance scientific knowledge to a greater degree than would informal judgmental approaches.

Bayes Theorem lies at the heart of Bayesian inference; it is based on the use of probability to express knowledge and the combining of probabilities to characterize the advancement of knowledge. The simple, logical expression of Bayes Theorem stipulates that, when combining information, the resultant (or posterior) probability is proportional to the product of the probability reflecting a priori knowledge (the prior probability) and the probability representing newly acquired knowledge (the sample information, or likelihood). Expressed more formally, Bayes Theorem states that the probability for y conditional on experimental outcome x (written p(y|x)) is proportional to the probability of y assessed before the experiment (written p(y)) times the probabilistic outcome of the experiment (written p(x|y)):

p(y|x) ∝ p(y) p(x|y)
To fix ideas, suppose an environmental scientist is interested in the reduction in chlorophyll a in a lake associated with a 30% reduction in phosphorus concentration. She could use existing data from similar lakes to develop a simple chlorophyll–phosphorus regression model and predict the reduction in chlorophyll for the lake of interest. Alternatively, she could conduct dilution experiments on the lake, collecting new data to estimate the quantity of interest. Adopting a third option, a Bayesian scientist would use “the best of both worlds” by combining the estimates using Bayes Theorem. In the language of Bayes Theorem, the regression model would yield the prior probability, since this estimator exists prior to the collection of new data; the new experimental data would supply the likelihood; and the posterior probability would represent the revised estimate based on both prior knowledge and new experimental evidence.
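
When both the prior and the new estimate can be approximated as normal distributions, this updating reduces to precision-weighted averaging. Here is a minimal numerical sketch in Python; the regression-based prior and the dilution-experiment estimate are hypothetical numbers chosen only for illustration:

```python
# Normal prior (from the cross-lake regression) combined with a normal estimate
# from the new dilution experiments via Bayes Theorem. With normal distributions,
# the posterior mean is a precision-weighted average and the precisions
# (1/variance) add. All numbers are hypothetical.

prior_mean, prior_sd = 35.0, 8.0      # predicted % chlorophyll reduction, regression model
data_mean,  data_sd  = 25.0, 6.0      # estimate from the new dilution experiments

prior_prec = 1.0 / prior_sd**2
data_prec  = 1.0 / data_sd**2

post_prec = prior_prec + data_prec
post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
post_sd   = post_prec ** -0.5

print(f"posterior mean = {post_mean:.1f}%, posterior sd = {post_sd:.1f}%")
# The posterior sd is smaller than that of either source alone -- the quantitative
# sense in which Bayes Theorem delivers "the best of both worlds."
```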

At first glance, it seems hard to argue against this seemingly rational quantitative strategy for updating scientific knowledge. Indeed, one might ask why all ecologists and environmental scientists aren’t Bayesians. There are a number of reasons. Certainly, the most important is that virtually all scientists still learn probability and statistics from a classical, or frequentist, perspective, and Bayes Theorem is at best a minor topic within that curriculum. Beyond that, Bayesian inference has been widely regarded as subjective and thus not suitable for objective scientific analysis. The problem with this perspective is that most science is hardly the objective pursuit that many choose to believe.

Consider the judgments we make in a scientific analysis. Implicit (or explicit) in the scientist’s lake study on phosphorus and chlorophyll are judgments about the adequacy of the existing lakes data, the merits of dilution experiments, and the truth of the model relating phosphorus to chlorophyll. There are no purely scientific, absolutely correct choices here; these represent “gray areas” about which reasonable scientists would disagree. Yet these also represent judgments that must be made by the scientist in order to carry out the lake study. Ordinary scientists are not unique in their reliance on judgment, however. Indeed, the work of some of the most distinguished scientists in history (e.g., Galileo, Newton, Darwin, and Einstein) has been shown to involve a substantial role for subjectivity.

Further, consider how most scientists address the revision of scientific knowledge in light of their own new contributions. In some cases, the scientist simply states the conclusions of his work, not attempting to quantitatively integrate new findings with existing knowledge. When integration is attempted in the concluding section of a research paper, it is typically a descriptive, subjective assessment of the implications of the new knowledge. Bayesian inference has the potential to make the combining of evidence more analytically rigorous. Given this, it is ironic that the perceived subjectivity of Bayesian analysis should be its undoing.


In future blog postings, I plan to discuss and demonstrate empirical Bayes analysis, Bayesian hypothesis testing, and Bayes networks.

Friday, June 21, 2013

Rules are Key to Cleaner Jordan and Falls Lakes

This was printed today in the Raleigh News & Observer as an op-ed piece. It is our scientific perspective on a contentious debate currently occurring in the North Carolina legislature. Oversimplifying somewhat, but still reasonably accurately, the debate pits an upstream pro-development community against a downstream environmental community. From a science and regulatory perspective, it raises issues concerning watershed pollutant reduction activities and/or in-lake treatment techniques to attain water quality goals. It also implicitly addresses the TMDL issue of “pollutants” (e.g., phosphorus and nitrogen) versus “pollution” (e.g., actions within a waterbody to improve water quality).


Kenneth H. Reckhow, Professor Emeritus, Nicholas School of the Environment, Duke University
Michael D. Aitken, Professor and Chair, Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina at Chapel Hill

Even at the planning stages for the two large Corps of Engineers reservoirs (Jordan Lake and Falls Lake) in the Triangle area, water scientists in North Carolina understood that these lakes would have great difficulty complying with water quality standards. This has proven to be true, as both reservoirs are in violation of the North Carolina water quality standard for chlorophyll a and are now subject to substantial reductions in nutrient (phosphorus and nitrogen) loading. Achieving compliance with this water quality standard is estimated to cost hundreds of millions of dollars, so it is understandable that questions are being raised about the wisdom and effectiveness of costly nutrient control measures.

All states are mandated by the federal government to set water quality standards for their surface water bodies, which North Carolina has done, and they conduct water quality monitoring programs to assess compliance with standards. If a water body is found to be out of compliance, which is the case for both Jordan and Falls Lakes, then the state must develop a plan (often in the form of a Total Maximum Daily Load, or TMDL) that describes what management actions will be implemented to achieve compliance. These TMDL plans have been developed for Jordan and Falls Lakes and are in the early stages of implementation by the watershed jurisdictions.

It has recently been suggested that the nutrient management actions implemented in the Jordan Lake watershed are having no discernible impact on Jordan Lake. In the minds of some individuals, this conclusion leads to the recommendation to abandon the Jordan Lake Rules (the agreed-upon plan to implement nutrient control measures in the watershed) and instead implement in-lake technologies to improve water quality in Jordan Lake. There are at least two fundamental errors in these perspectives.

First, even if all required nutrient management actions had been implemented on the day the Jordan Rules were approved, it would take years for a large water body like Jordan Lake to respond to reductions in pollutant inputs. In reality, putting pollutant controls in place itself takes time, and additional time is then required before the impact of these actions becomes observable in the lake. One of us (Reckhow) recently chaired the National Academy of Sciences review of the Chesapeake Bay water quality program. A point emphasized in this review was that the public needs to be patient concerning the time required for improvements in water quality in the Bay. For example, people should not expect that implementation of stormwater detention basins in their neighborhoods would quickly result in observable water quality improvements in the Bay. We cautioned that several years, even a decade, might be necessary to see the improvements. However, we also recognized that if the jurisdictions in the Chesapeake watershed maintained their nutrient management implementation schedule, ultimately they would see the improvements that they expected.

Second, there are no technologies, new or old, that can provide any meaningful level of cleanup of algae in a lake the size of Jordan. In-lake techniques to improve water quality, such as harvesting, aeration, and algaecides, can be effective in ponds (or, at best, in small lakes), but they are not intended for large waterbodies. Further, there is no evidence that these techniques will result in compliance with the chlorophyll a criterion in Jordan Lake.

We are both engineers with great faith that many of today’s problems can be solved through technology. But we do not subscribe to blind faith in technology to correct preventable problems. A first principle of responsible stewardship for any public water supply is to protect the watershed. There are viable, proven means of reducing inputs of nutrients and other pollutants to reservoirs that are embodied in the Jordan and Falls Lake Rules. These inputs will only increase as the affected communities and municipalities grow if appropriate watershed management plans are not in place. And, standard engineering practice before implementing any proposed new technology is to evaluate it thoroughly, in context and at a relevant scale, especially when the consequences of failure have implications for public health.


The nutrient management plans for Falls and Jordan Lakes have been based on sound scientific analysis and active stakeholder involvement. For Falls Lake, to further ensure that a deliberative process continues, a careful re-examination and refinement of the modeling and assessment that resulted in the Falls Lake Rules is in progress. In our view, this adaptive management approach is a far more prudent strategy for effective management of lakes than is the hasty and ill-conceived abandonment of the Jordan Lake Rules that is currently under consideration by the General Assembly.

Thursday, June 13, 2013

Skill Assessment and Statistical Hypothesis Testing for Model Evaluation

“Skill assessment” for water quality models refers to a set of statistical and graphical techniques used to quantify the goodness-of-fit of a water quality model. These techniques are applied to compare observations with model predictions; Stow et al. (2009) list and describe statistics such as the correlation coefficient, root mean square error, and average absolute error as skill assessment options for univariate comparisons.

In earlier work that preceded the “skill assessment” designation, Reckhow et al. (1990) proposed that a statistical test be used with observations and predictions to effectively serve as a model verification hypothesis test. Since statistical hypothesis tests are typically set up with the hope/goal of rejection of the null hypothesis, Reckhow et al. proposed that the test be structured so that rejection of the stated null hypothesis is indicative of a verified model, given a pre-specified acceptable error level. For example, consider the null hypothesis H0 where the true mean of the absolute values of the prediction error is 2 mg/L. And consider the alternative hypothesis H1 where the true mean of the absolute values of the prediction error is less than 2 mg/L. This is a one-sided test in that the rejection region and H1 are on only one side (less than). The hypotheses can be tested in the conventional manner, with rejection of H0 (and acceptance of H1) as the result indicating successful model verification. When the null hypothesis is true, the sampling distribution of the test statistic is centered on 2 mg/L, and the rejection region is located in the left tail of the distribution only. To test model goodness-of-fit with hypotheses assuming this structure, the model user must select an acceptable error level. In the example given here, the acceptable error level corresponds to 2 mg/L. In the paper, Reckhow et al. described applications of the chi square test, t-test, Kolmogorov-Smirnov test, regression analysis and the Wilcoxon test using this approach.
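
To illustrate the structure of such a test (this is a sketch of the general idea, not the specific procedures in Reckhow et al. 1990), here is a minimal Python example of a one-sided, one-sample t-test on simulated prediction errors, with 2 mg/L as the pre-specified acceptable error level; the data are entirely made up:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated paired observations and model predictions (hypothetical values).
observed  = rng.normal(10.0, 3.0, size=30)
predicted = observed + rng.normal(0.0, 1.5, size=30)   # model errors ~ N(0, 1.5)

abs_error  = np.abs(observed - predicted)
acceptable = 2.0                                        # pre-specified acceptable error, mg/L

# One-sided, one-sample t-test of H0: mean(|error|) = 2 vs H1: mean(|error|) < 2.
t_stat, p_two_sided = stats.ttest_1samp(abs_error, popmean=acceptable)
p_one_sided = p_two_sided / 2 if t_stat < 0 else 1 - p_two_sided / 2

print(f"mean |error| = {abs_error.mean():.2f} mg/L, t = {t_stat:.2f}, "
      f"one-sided p = {p_one_sided:.3f}")
# A small p-value means we reject H0 in favor of H1 (mean absolute error below the
# acceptable level), i.e., the model passes this verification test at that error level.
```
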
In a previous blog post (“Is Conventional Water Quality Modeling a Charade?” posted on April 30th) I suggested that in most cases the data set aside for verification are not that different from the calibration data. To make users aware of that fact, I proposed a statistic for model verification rigor based on the differences between the calibration and verification data. Ultimately, I think that a verification rigor statistic should be combined with the skill assessment statistics discussed above for an improved assessment of the confidence that a model user should have in model applications. I plan to address that approach in an upcoming blog post.
Reckhow, K.H., J.T. Clements, and R.C. Dodd. 1990. Statistical evaluation of mechanistic water quality models. Journal of Environmental Engineering. 116:250-268.

Stow, C.A., J. Jolliff, D.J. McGillicuddy, S.C. Doney, J.I. Allen, M.A.M. Friedrichs, K.A. Rose, and P. Wallhead. 2009. Skill assessment for coupled biological/physical models of marine systems. Journal of Marine Systems 76:4-15.

Monday, June 10, 2013

An Assessment of Techniques for Error Propagation (Uncertainty Analysis) in Water Quality Models

Error propagation is an important but under-utilized uncertainty analysis technique that allows a modeler to estimate the impact of errors from all uncertain factors (e.g., parameters, inputs, initial conditions, boundary conditions, model equations) on the model response(s). The two commonly used error propagation techniques traditionally have been first-order error analysis and Monte Carlo simulation. A related approach, sensitivity analysis, allows the modeler to assess quantitatively the impact of a subset of model terms (often just one) on model response.

First-order error analysis rests on two approximations: the randomness in a variable (e.g., a reaction rate) is summarized by its variance, and the functional relationship is characterized by the first-order terms of its Taylor series. This means that the error in an input variable (e.g., x) is assumed to be fully characterized by the variance, and that this error is converted to error in the endogenous variable (e.g., y) through a linearization of the equation. The usefulness of first-order error analysis depends on the validity of these approximations.

Consider a simple functional relationship (equation 1):

y = f(x)

If this relationship is reasonably “well behaved” (e.g., not highly nonlinear) and if the standard deviation of x is not too large, then (equation 2):

E[y] = E[f(x)] ≈ f(E[x]) = f(x̄)

where E is the expectation operator. (The expectation or expected value is the probabilistic average of a random variable. Under random sampling, the expected value of a variable is its mean.) Likewise, under the same assumptions, a Taylor series expansion of f(x) may be used to approximate the variance (equation 3):

f(x) = f(a) + f′(a)(x − a) + (1/2!) f″(a)(x − a)² + …

Employing only the first two terms of the Taylor series, and taking the expansion about the mean x̄, equation 3 becomes (equation 4):

f(x) ≈ f(x̄) + f′(x̄)(x − x̄)

Taking the variance of equation 4, and noting that the variance of f(x̄) equals zero (it is a constant), this equation is transformed into the bivariate form of the error propagation equation (equation 5):

sy² ≈ [f′(x̄)]² sx²

where s is the sample standard deviation and s² is the sample variance.

For a multivariate relationship, there is a straightforward extension of equation 5, taking into consideration the covariation between predictor variables (equation 6):

sy² ≈ Σi (∂f/∂xi)² sxi² + 2 Σi Σj>i (∂f/∂xi)(∂f/∂xj) ρij sxi sxj

Equation 6 shows that the error in a model prediction due to errors in the variables is a function of the individual variable errors (sxi), sensitivity factors (∂f/∂xi) expressing the “importance” of each variable, and the correlations (ρij) among variables. This relationship is sufficiently general that “parameters” may be substituted for “variables” in the previous sentence with no change in meaning.

Equation 6 is the error propagation equation that is the basis of first-order analysis. The method receives its name from the fact that only the first-order, or linear, terms of the Taylor series are retained. The degree to which this approximation is successful may be assessed with the aid of figure 1 below.

Figure 1. First-order estimation

The model relating x to f(x) in figure 1 is assumed to be nonlinear for the sake of generality, and the straight line tangent to this model is the first-order approximation. First-order error analysis then is graphically portrayed by the dashed lines which convert the error in x to an error in f(x). The success of this error approximation method is determined by the following:
a. The degree of nonlinearity in the model. As the model becomes increasingly nonlinear, f′(x̄)∆x becomes less accurate as a measure of ∆f. This means that highly nonlinear models may not be amenable to first-order treatment of error.
b. The size of the error term. For a nonlinear model, the accuracy of the first-order error estimate is a function of the error in x (represented by ∆x in figure 1). Small errors in x, coupled with near linearity of the model, are favorable conditions for effective application of first-order analysis.
c. The acceptable level of error (due to inaccuracy in the error analysis) for the issue under study.
d. The extent to which the distribution of errors is represented by its mean and standard deviation. For complex or skewed error distributions, the mean and standard deviation may be inadequate, leading to a faulty estimate of error in f(x).
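
To illustrate equations 5 and 6 and conditions (a) and (b) above, here is a minimal Python sketch that propagates error through a hypothetical nonlinear function in two ways: analytically with the first-order formula and numerically by brute-force Monte Carlo. The function, parameter values, and input error are all invented for the example:

```python
import numpy as np

# Hypothetical nonlinear model: f(x) = a * exp(-k * x), e.g., a first-order decay.
a, k = 10.0, 0.3

def f(x):
    return a * np.exp(-k * x)

def dfdx(x):
    return -a * k * np.exp(-k * x)            # analytical derivative (sensitivity factor)

x_mean, x_sd = 5.0, 0.5                       # mean and standard deviation of the input

# First-order (equation 5): sy^2 ~= (df/dx)^2 * sx^2, evaluated at the mean of x.
sd_first_order = abs(dfdx(x_mean)) * x_sd

# Monte Carlo check: sample x, push it through the model, summarize the output.
rng = np.random.default_rng(42)
x_samples = rng.normal(x_mean, x_sd, size=100_000)
sd_monte_carlo = f(x_samples).std(ddof=1)

print(f"first-order sd estimate : {sd_first_order:.4f}")
print(f"Monte Carlo sd estimate : {sd_monte_carlo:.4f}")
# Increasing x_sd (or the curvature k) widens the gap between the two estimates,
# illustrating conditions (a) and (b) above.
```
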

Low-cost, fast computing that supports Monte Carlo simulation has made applications of first-order error analysis relatively rare in recent years. Monte Carlo simulation is a conceptually simple alternative to first-order error analysis; it was so named because it shares characteristics of randomness with gambling casinos. Under this technique, probability density functions are assigned to each uncertain characteristic (e.g., variable or parameter), reflecting the uncertainty in that characteristic. Values are then randomly chosen from each probability distribution, these values are inserted into the model, and a prediction is calculated. After this is repeated a large number (several hundred to several thousand) of times, an empirical distribution of the predicted model response develops, which reflects the combined uncertainties “flowing” through the model.

As an example, consider the simple model (equation 7):

y = β1 + β2 x + ε

Figure 2 displays the error distributions for the uncertain parameters (β1 and β2) and for the uncertain model equation (ε). At each step in the Monte Carlo simulation, each distribution is randomly sampled to yield a single value that is inserted into equation 7, and a value for the response variable y is calculated. After several hundred (or several thousand) runs of the model, the predicted responses y can be tabulated or plotted in a histogram; this histogram reflects the errors and the model structure.
If the parameters β1 and β2 are correlated (this is not uncommon in water quality models), then individual sampling steps in the Monte Carlo procedure cannot be undertaken independently. Instead, the sampling of values from the correlated probability distributions must be undertaken sequentially, with the probability distribution of the second parameter (either parameter may be selected first or second) conditional on the value of the first parameter selected; this “conditionality” reflects the correlation between the two parameters.
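
One convenient way to implement this is to draw the parameter pair jointly from a multivariate normal distribution whose covariance matrix encodes the correlation; sampling the second parameter conditionally on the first, as described above, is equivalent. Below is a minimal Python sketch for a linear model of the form assumed here for equation 7 (y = β1 + β2x + ε); the parameter means, standard deviations, correlation, and model error are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(7)
n_runs = 10_000

# Hypothetical means, standard deviations, and correlation for beta1 and beta2.
beta_mean = np.array([2.0, 0.8])
beta_sd   = np.array([0.5, 0.1])
rho       = -0.6                                   # correlation between the two parameters

cov = np.array([[beta_sd[0]**2,                 rho * beta_sd[0] * beta_sd[1]],
                [rho * beta_sd[0] * beta_sd[1], beta_sd[1]**2]])

# Draw correlated parameter pairs jointly, plus an independent model-error term.
betas = rng.multivariate_normal(beta_mean, cov, size=n_runs)   # columns: beta1, beta2
eps   = rng.normal(0.0, 0.3, size=n_runs)                      # model (equation) error

x = 12.0                                           # a fixed value of the input variable
y = betas[:, 0] + betas[:, 1] * x + eps            # equation 7, evaluated n_runs times

print(f"predicted y: mean = {y.mean():.2f}, sd = {y.std(ddof=1):.2f}")
# The empirical distribution of y (e.g., a histogram of these values) is the
# Monte Carlo error propagation result, including the parameter correlation.
```
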
An essential condition for successful Monte Carlo error propagation with water quality models is that the error terms and the parameter covariances can be estimated. Estimation of the parameter errors and covariances is possible with a statistical (e.g., regression) model, but may be difficult or impossible for large water quality models with many parameters, because the available data often do not contain sufficient information to estimate parameter errors and covariances. Note that variances and covariances among measured water quality variables (e.g., the “x” in equation 7) are not the same as the variances and covariances among the model parameters (β1 and β2). For example, a model parameter may be “phytoplankton settling velocity” in a lake, which is typically not measured; a variable may be phytoplankton density, which is often measured (as chlorophyll a). With commonly measured water quality data, it may not be possible to estimate parameter errors and covariances. Techniques presented below can partially address this conundrum.

Among experienced water modelers, it is understood (but generally not acknowledged) that many “sets” of parameter values will fit a model about equally well; in other words, similar predictions can be obtained by simultaneously manipulating several parameter values in concert. This is plausible in part because all models are approximations of the real world, and because most model parameters represent aggregate or “effective” processes (spatially and temporally averaged at some scale) and are unlikely to be represented by a fixed constant across scales. Additionally, many mathematical structures produce extreme correlation between model parameters, even when a model is over-determined. This condition, called “equifinality,” is well-documented in the hydrologic sciences, but the concept has rarely been discussed in the water quality sciences. I believe that the recognition of equifinality should change the perspective of water quality modelers from seeking a single “optimal” value for each model parameter, to seeking a distribution of parameter sets that all meet a predefined fitting criterion. These acceptable parameter sets may then provide the basis for estimating model prediction error associated with the model parameters.

The development of methods for identifying plausible parameter sets for large multi-parameter environmental models with limited observational data is best understood through regionalized (or generalized) sensitivity analysis (RSA). RSA is a Monte Carlo sampling approach to assess model parameter sensitivity; this method was initially proposed as a means to prioritize future sampling and experimentation for model and parameter improvements. Regionalized sensitivity analysis is simple in concept, and is a useful way to use limited information to bound model parameter distributions. Given a particular model and a system (e.g., water body) being modeled, the modeler first defines the plausible range of certain key model response variables (e.g., chlorophyll a, total nitrogen) as the “behavior.” Outside the range is “not the behavior.” The modeler then samples from (often uniform) distributions of each of the model parameters and computes the values for the key response variables. Each complete sampling of all model parameters, leading to prediction, results in a “parameter set.” All parameter sets that result in predictions of the key model response variables in the “behavior” range are termed “behavior generating” and thus become part of the model parameter distribution. The parameter sets that do not meet this behavior criterion are termed “nonbehavior generating.” The cumulative distribution function (CDF) of each parameter distribution from these two classes of parameter sets (behavior generating and nonbehavior generating) can be compared for the evaluation of model parameter sensitivity. For a particular parameter, if the behavior generating and nonbehavior generating distributions are substantially different, then prediction of the key response variables is sensitive to that parameter. Hence, resources devoted toward model improvement might be preferentially allocated toward improved estimation of that parameter. In addition, we can consider the distribution of the behavior generating parameter sets as reflecting equifinality. Thus, the empirical distribution characterizes the error (variance and covariance) structure in the model parameters, conditional on the model and on the fitting criterion (the defined plausible range of key response variables).
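
Here is a stripped-down Python sketch of the RSA idea, reusing the hypothetical linear model from the Monte Carlo example and an invented “behavior” range for the predicted response; the Kolmogorov-Smirnov statistic is used simply as one convenient way to measure the separation between the behavior-generating and nonbehavior-generating parameter distributions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_sets = 20_000

# Sample candidate parameter sets from uniform (non-informative) ranges.
beta1 = rng.uniform(0.0, 4.0, n_sets)
beta2 = rng.uniform(0.0, 2.0, n_sets)

x = 12.0
y = beta1 + beta2 * x                      # predicted key response for each parameter set

# Define the "behavior": predictions falling inside a plausible range (hypothetical).
behavior = (y >= 8.0) & (y <= 14.0)

# Compare behavior vs. nonbehavior distributions for each parameter (K-S statistic):
# a large separation means the response is sensitive to that parameter.
for name, param in [("beta1", beta1), ("beta2", beta2)]:
    ks = stats.ks_2samp(param[behavior], param[~behavior]).statistic
    print(f"{name}: K-S separation = {ks:.2f}")

# The accepted (behavior-generating) parameter sets jointly characterize the
# parameter error and covariance structure, conditional on the behavior definition.
accepted = np.column_stack([beta1[behavior], beta2[behavior]])
print(f"accepted sets: {len(accepted)} of {n_sets}")
```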

Generalized Likelihood Uncertainty Estimation (GLUE) is an extension of RSA; the RSA binary system of acceptance/rejection of behavioral/nonbehavioral simulations is replaced in GLUE by a “likelihood” measure that assigns different levels of confidence (weighting) to different parameter sets. By effectively evaluating the fit of parameter sets, RSA, GLUE, and Markov chain Monte Carlo (MCMC) provide useful information for model parameter error propagation. These techniques can be used to develop plausible parameter sets, which collectively express the parameter covariance (parameter error and correlation) structure and thereby help address equifinality. Each of these techniques can be used to create a multi-parameter distribution that is “behavior generating” to characterize parameter sets for a water quality model. This distribution can then become the basis for Monte Carlo simulation for error propagation; this differs from standard Monte Carlo simulation in that parameter sets, not individual parameters, are sampled. By sampling parameter sets to assess prediction uncertainty, we incorporate the parameter variance-covariance structure into the simulation results. While this still leaves model (equation) error unaddressed, it does provide the opportunity to advance our understanding of the error in water quality model predictions.
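
Continuing the sketch above in a GLUE-like spirit, the binary accept/reject step can be replaced with a likelihood weight on each parameter set, and whole parameter sets can then be resampled in proportion to those weights for error propagation. The weighting function, cutoff, observation, and prediction point below are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(11)
n_sets, n_resample = 20_000, 5_000

# Candidate parameter sets, as in the RSA sketch above (hypothetical ranges).
beta1 = rng.uniform(0.0, 4.0, n_sets)
beta2 = rng.uniform(0.0, 2.0, n_sets)

x_obs, y_obs = 12.0, 11.0                       # a hypothetical observation
y_sim = beta1 + beta2 * x_obs

# GLUE-style likelihood weight: here an inverse-error measure, zeroed below a cutoff.
weights = 1.0 / (1.0 + (y_sim - y_obs) ** 2)
weights[weights < 0.2] = 0.0                    # "nonbehavioral" sets get zero weight
weights /= weights.sum()

# Resample whole parameter *sets* in proportion to their weights, which keeps the
# joint (variance-covariance) structure intact for subsequent error propagation.
idx = rng.choice(n_sets, size=n_resample, p=weights)
beta_sets = np.column_stack([beta1[idx], beta2[idx]])

x_new = 15.0                                    # condition at which a prediction is wanted
y_pred = beta_sets[:, 0] + beta_sets[:, 1] * x_new
print(f"prediction at x = {x_new}: mean = {y_pred.mean():.2f}, "
      f"90% interval = ({np.percentile(y_pred, 5):.2f}, {np.percentile(y_pred, 95):.2f})")
```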