Tuesday, May 28, 2013

A Bayesian Approach for Model Verification


Last month on my blog, I posted “Is Conventional Water Quality Model Verification A Charade?” and concluded that model verification, as currently practiced, is indeed a charade. When I shared the piece on LinkedIn, I received several comments, and a number of commenters agreed with my conclusions. So, what is to be done?

In that blog post, I proposed a probabilistic method for assessing the rigor of a model verification exercise. While the proposal is useful for quantifying verification rigor in a given application, it neglects the “track record” of a particular model. This is unfair to the model and to model users, since my verification rigor test depends on the availability of calibration-independent data; it is therefore conceivable that a good model will receive a poor verification rigor score purely because verification data are limited.

To address this conundrum, I propose that we employ a Bayesian approach with an informative prior, where the prior is based on an ex post facto analysis of model performance. The National Academy of Sciences TMDL panel that I chaired in 2001 discussed the need for this type of analysis for models used in TMDL development; to my knowledge, it was never undertaken. So, what might be done to address this issue using Bayes Theorem? In brief, it should not be difficult to apply Bayes Theorem for a particular water quality model to quantitatively combine the following (a simple numerical sketch follows the list):

1. A statistical assessment of the prediction-observation difference for the model, based on data collected after implementation (and after the water quality response) of a pollutant load management action.

2. The results of the model verification rigor assessment that I proposed, for the particular model application of interest.
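
To make the idea concrete, here is a minimal numerical sketch of how the two pieces of information might be combined, using a simple normal-normal Bayes update on a model's mean prediction error. All numbers are hypothetical; an actual application would require careful specification of the prior from post-implementation performance audits and of the likelihood from the verification data.

```python
import numpy as np

# (1) Informative prior on the model's mean prediction error (observed - predicted),
#     based on hypothetical ex post facto performance audits of this model class.
prior_mean = 0.0   # audits suggest no systematic bias (assumed)
prior_sd = 8.0     # but leave substantial uncertainty about that bias (assumed)

# (2) Site-specific verification residuals from calibration-independent data
#     (hypothetical values, e.g., in ug/l of chlorophyll a).
residuals = np.array([4.1, -2.3, 6.0, 1.5, 3.2])
obs_sd = 5.0       # assumed observation/model noise standard deviation
n = residuals.size

# Normal-normal conjugate update for the mean prediction error.
prior_prec = 1.0 / prior_sd**2
data_prec = n / obs_sd**2
post_prec = prior_prec + data_prec
post_mean = (prior_prec * prior_mean + data_prec * residuals.mean()) / post_prec
post_sd = (1.0 / post_prec) ** 0.5

print(f"Posterior mean prediction error: {post_mean:.2f} +/- {post_sd:.2f}")
```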


Let us consider making this a requirement for water quality models that are used to inform multi-million-dollar decisions!

Tuesday, May 21, 2013

Should States Augment Numeric Nutrient Criteria for Rivers & Streams with Macroinvertebrate Assessments?


The 2001 National Academy of Sciences review of the TMDL program recommended that water quality criteria be positioned as closely as possible to the biological (or human health) response in the stressor-response causal chain. For nutrient criteria, this means that a measure of algal density (e.g., chlorophyll a), submerged aquatic vegetation, and/or macroinvertebrate indices might serve to augment or replace phosphorus and nitrogen criteria. A number of recent high-profile efforts by US EPA and other regulatory agencies to develop water quality criteria from stressor-response relationships observed in field data highlight the importance of this step in the nutrient criteria development process; the recent assessment of Florida nutrient criteria provides a contentious example. As two recent EPA Science Advisory Board reviews have made clear, an underlying cause-and-effect relationship between stressor and response is critical to the effectiveness of such water quality criteria. Unfortunately, there is little EPA guidance on how to develop sufficient evidence to support cause-effect conclusions. This lack of guidance increases the likelihood that criteria lacking a firm causal basis will be proposed or written into regulations, resulting in criteria that are ineffective and inefficient.

We know that phosphorus and nitrogen are essential nutrients for plant and animal life. Likewise, we know that the levels of phosphorus and nitrogen found in surface water bodies affect aquatic biota such as algae and aquatic vegetation. Indeed, there is a well-established scientific basis linking nutrient concentrations in lakes and estuaries to chlorophyll a; this relationship has been observed in both cross-sectional and time series analyses of data. However, while phosphorus and nitrogen are obviously essential for macroinvertebrate life, it has not been demonstrated that the range of phosphorus and nitrogen levels found in rivers and streams is a strong determinant of macroinvertebrate indicators of aquatic ecosystem health in these waterbodies. Observational data analyses suggest that other determinants of benthic macroinvertebrate indices (BMIs), such as variations in streamflow and temperature, sediments, and toxic substances, may dominate the cause-effect relationships. Thus, one or more of these stressors may be the primary cause of observed changes in a river or stream BMI that could falsely be attributed to nutrient levels.
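
A simple simulation (entirely hypothetical data, not from any real stream survey) illustrates the concern: when a confounder such as streamflow drives both nutrient concentrations and a benthic macroinvertebrate index, a naive correlation can suggest a nutrient effect that largely disappears once the confounder is accounted for.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

flow = rng.normal(size=n)                                  # standardized streamflow (confounder)
nutrients = 0.8 * flow + rng.normal(scale=0.6, size=n)     # nutrients track flow
bmi = -0.9 * flow + rng.normal(scale=0.6, size=n)          # BMI responds to flow only, not nutrients

# Naive correlation suggests a nutrient effect on the BMI.
print("Naive nutrient-BMI correlation:", np.corrcoef(nutrients, bmi)[0, 1])

# Partial correlation controlling for flow: regress both on flow, then correlate residuals.
def residualize(y, x):
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

r_nut = residualize(nutrients, flow)
r_bmi = residualize(bmi, flow)
print("Partial correlation given flow:", np.corrcoef(r_nut, r_bmi)[0, 1])
```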

When a state agency sets or modifies a water quality criterion, it is reasonable to expect that the agency believes its action will improve the probability of correct decisions on use impairment for a waterbody. In the specific case of BMIs augmenting nutrient criteria, data analyses and rigorous causal analyses do not currently support that belief. Methods such as counterfactual analysis, Bayesian networks, and/or weight of evidence are needed to justify a causal relationship between nutrients and benthic macroinvertebrates in streams. Otherwise, we risk costly nutrient control measures that do not yield the expected benefits. Not only would this waste critical resources; it would also undermine public confidence in legitimate efforts at environmental protection.

Thursday, May 16, 2013

Multiple Models to Inform Decision Making


Environmental simulation models are invaluable tools for informing decision making. For example, we depend on hydrologic models for water supply/flooding decisions, water quality models for Total Maximum Daily Loads (TMDLs), and air quality models for standards compliance assessment. When multiple models are available, it is not uncommon to apply more than a single model to inform a decision, particularly if the consequences of a decision are significant, and uncertainties are believed to be large.

Consider weather forecasting for major storms such as hurricanes. It is now common practice for a meteorologist to display the projected deterministic trajectories of the hurricane’s eye based on several model forecasts; the fact that these models are based on different mathematical constructs adds to the robustness of the envelope covering the range of storm trajectories. It also provides greater public awareness of possible outcomes than does a single deterministic model trajectory.

In a presentation in the UMCES IAN Seminar Series for the Chesapeake Bay Program (http://ian.umces.edu/seminarseries/video/72/), I urged that the Bay Program develop a second water quality model that would provide important information on the uncertainty in the assessment of the impact of proposed management actions, information that the current Chesapeake Bay Model (CBM) cannot provide. The CBM was developed, and continues to be refined, by an experienced, excellent modeling team. Yet, as a single deterministic model, the CBM shares a critical shortcoming with even the most sophisticated meteorological models: both are deterministic simplifications of an extremely complex system. We have seen that even the most elaborate CBM draws public criticism and skepticism concerning the dependence of decisions on its forecasts, particularly in the absence of an uncertainty analysis or an envelope covering the range of possible outcomes associated with management options.

Multiple models have been quite effective in the past for informing water quality management decisions. For example, the Great Lakes agreement in the early 1980s was aided by multiple models, and more recently, I was involved in a successful multiple-model assessment for the Neuse River Estuary (NC) nitrogen TMDL. For the Neuse Estuary, three models were developed and applied: (1) a laterally-averaged version of CE-QUAL-W2, (2) a three-dimensional version of EFDC-WASP, and (3) a probabilistic Bayes network model (see Craig A. Stow, Chris Roessler, Mark E. Borsuk, James D. Bowen, and Kenneth H. Reckhow. 2003. A Comparison of Estuarine Water Quality Models for TMDL Development in the Neuse River Estuary. Journal of Water Resources Planning and Management 129:307-314 for a comparison of the model applications). The first two models provided substantial spatial/temporal/ecological detail, while the third provided less detail but could be used to estimate the uncertainty of model forecasts. The Neuse case study demonstrated the value of multiple models in enriching the overall assessment of the impact of management actions. Even the best model usually cannot deal with all issues that are important to stakeholders and decision makers. Developing and applying a second model that addresses issues the first model handles poorly should add relatively little cost while providing substantial additional information.
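
As a minimal illustration (with made-up numbers and generic model labels, not the actual Neuse results), even a simple summary of a multi-model envelope conveys information that no single deterministic forecast can:

```python
# Hypothetical point forecasts of summer-average chlorophyll a (ug/l) for one
# management scenario, from three models of differing structure.
forecasts = {
    "laterally_averaged_model": 32.0,
    "three_dimensional_model": 27.5,
    "bayes_network_model": 29.0,
}

low, high = min(forecasts.values()), max(forecasts.values())
print(f"Multi-model envelope: {low:.1f} to {high:.1f} ug/l (spread = {high - low:.1f})")

# A large spread relative to the margin below the criterion signals that the
# decision may not be robust to model structure.
criterion = 40.0
print("All models predict compliance:", all(v < criterion for v in forecasts.values()))
```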


Monday, May 13, 2013

Use Attainability Analysis


In a recent blog post (Blue Crabs versus Green Lawns: We may have to decide), I discussed the challenges in achieving compliance with water quality criteria in a number of U.S. water bodies. In 2001, I chaired a National Academy of Sciences panel charged with examining the scientific and data basis in support of the TMDL program. While our panel concluded that extant science and data were sufficient to support this EPA program, we issued a number of recommendations to improve the program. One of these concerned Use Attainability Analysis. The NRC report statements, presented below, are just as relevant now as they were twelve years ago.

A Use Attainability Analysis (UAA) determines if impairment is caused by natural contaminants, nonremovable physical conditions, legacy pollutants, or natural conditions. More importantly, a UAA can refine the water quality standard. A UAA should result in more stratified and detailed narrative statements of the desired use and measurable criteria. For example, a UAA might refine the designated use and criterion from a statement that the water needs to be fishable to a statement calling for a reproducing trout population. Then one or more criteria for measuring attainment of this designated use are described; these might include minimum dissolved oxygen or maximum suspended sediment requirements. Alternatively, an index to measure biological condition appropriate to the trout fishery designated use, such as an index of biological integrity (IBI), may be defined.

In the 1990s, TMDLs were undertaken for some waterbodies where the designated use was not attainable for reasons that a UAA could have identified. For example, TMDLs conducted in Louisiana led to the conclusion that even zero discharge of a pollutant would not bring attainment of water quality standards (Houck, 1999). A properly conducted UAA would have revealed the true problem, naturally low dissolved oxygen concentrations, before the time and money were spent to develop the TMDL. Unfortunately, UAA has not been widely employed. Novotny et al. (1997) found that 19 states reported no experience with UAA. The majority of states reported from a few to fewer than 100 UAAs, while five states (Indiana, Nebraska, New York, Oklahoma, and Pennsylvania) performed more than 100.

One possible explanation for the failure to employ UAAs widely is the absence of useful EPA guidelines. The last technical support manuals were issued in the early 1980s (EPA, 1983) and are limited to physical, chemical, and biological analyses. It is presently not clear what technical information constitutes a UAA that EPA will accept as adequate support for changing the use designation of a waterbody.

In addition to being a technical challenge, standards review also has important socioeconomic consequences (see point 6 in Box 5-1 below). EPA has provided little information on how to conduct socioeconomic analyses or how to incorporate such analyses in the UAA decision. The socioeconomic analysis suggested by EPA is limited to narrowly conceived financial affordability and economy-wide economic impact assessments (e.g., employment effects) (Novotny et al., 1997). Finally, EPA has offered no guidance on what constitutes an acceptable UAA for waterbodies of different complexity or on what decision criteria will be accepted as a basis for changing a use designation. This is significant because EPA retains the authority to approve state water quality standards. These uncertainties discourage state use of UAA because there is no assurance that EPA will accept the result of the UAA effort as an alternative to a TMDL, especially when EPA's expectations for a UAA would entail significant analytical costs.


BOX 5-1
Six Reasons for Changing the Water Quality Standard
The following six situations, which can be revealed by UAA, constitute reasons for changing a designated use or a water quality standard (EPA, 1994). Conducting a UAA does not necessarily preclude the development of a TMDL.
1. Naturally occurring pollutant concentrations prevent attainment of the use.
2. Natural, ephemeral, intermittent, or low-flow conditions or water levels prevent the attainment of the use, unless these conditions may be compensated for by a sufficient volume of effluent discharge without violating state water conservation requirements to enable uses to be met.
3. Human-caused conditions or sources of pollution prevent the attainment of the use and cannot be remedied or would cause more environmental damage to correct than to leave in place (e.g., as with some legacy pollutants).
4. Dams, diversions, or other types of hydrologic modifications preclude the attainment of the use, and it is not feasible to restore the waterbody to its original condition or to operate such modification in a way that would result in the attainment of the use.
5. Physical conditions related to the natural features of the waterbody, such as the lack of proper substrate, cover, flow, depth, pools, riffles, and the like, unrelated to water quality, preclude attainment of aquatic life protection uses.
6. Controls more stringent than those required by the CWA's mandatory controls (Sections 301(b) and 306) would result in substantial and widespread adverse social and economic impact. This requires developing a TMDL and conducting a socioeconomic impact analysis of the resulting TMDL (Novotny et al., 1997).

Conclusions and Recommendations
1. EPA should issue new guidance on UAA. This guidance should address: (1) the levels of detail required for UAAs for waterbodies of different size and complexity, (2) broadened socioeconomic evaluation and decision analysis guidelines for states to use during a UAA, and (3) the relative responsibilities and authorities of the states and EPA in making use designations for specific waterbodies following a UAA.
2. UAA should be considered for all waterbodies before a TMDL plan is developed. The UAA will assure that before extensive planning and implementation actions are taken, there is clarity about the uses to be secured and the associated criteria to measure use attainment. UAA is especially warranted if the water quality standards used for the assessment were not well stratified. However, the decision to do a UAA for any waterbody should rest with each state.

Note:
UAAs are still quite rare; however, in many respects, a UAA is simply a TMDL analysis expanded with socioeconomic analyses.

Wednesday, May 8, 2013

Blue Crabs versus Green Lawns: We may have to decide


What happens if we are unable to achieve federally mandated water quality standards in our lakes, rivers, and bays?
In 1972, Congress enacted the Clean Water Act governing water pollution in the U.S. Among other things, the Clean Water Act regulates the release of pollutants into surface waters from point sources of pollution. Individual states determine water quality standards for bodies of water within their borders.
Are some of these water quality standards essentially unattainable now without major lifestyle changes in the watersheds of certain major bodies of water, such as Chesapeake Bay? In the past 30-40 years, the U.S. has been extraordinarily effective in reducing pollutant discharges from “point sources” such as public and private wastewater treatment plants. Many of these treatment plants are operating at or near the current limits of technology with extremely high removal efficiencies, such that further improvements could be quite costly with perhaps little additional pollutant removal.
Despite those significant reductions in point source discharges, we still have a long way to go in reducing nitrogen and phosphorus loading to achieve compliance with water quality standards in Chesapeake Bay and other water bodies, such as Falls Reservoir in my own backyard. That is because these nutrients also enter lakes, rivers, and bays from “nonpoint sources” in the watershed, such as agricultural fields, feedlots, stormwater drainage from urban areas, and lawn fertilizers. In the case of Falls Reservoir in North Carolina, phosphorus in soils that were flooded to create the reservoir thirty years ago is likely still being released into the water, slowing the reservoir's recovery from nutrient enrichment.
The effects of nutrients in waterbodies range from annoying to dangerous. High levels of nitrogen and phosphorus cause excessive growth of algae; when the algae decompose, they can deplete the oxygen needed by fish and shellfish, and fish kills can result. Some algal blooms are toxic, posing potential threats to aquatic life and possibly to humans. Affected water can also become discolored or cloudy and take on odors, impairing recreational activities such as swimming and boating.
Measures to control nonpoint pollution are expensive and may be imposed on communities that derive little direct benefit from the body of water being protected. For instance, New York State and West Virginia are required, as part of the Chesapeake Bay cleanup, to implement plans to reduce their nutrient loading to Chesapeake Bay (CB), since some of their land is in the CB watershed and thus contributes to the problem. It is perhaps not surprising, but understandably distressing to many in the environmental community, that legal challenges have already arisen even in the early stages of the Chesapeake Bay cleanup.
It may be that our urban/suburban lifestyle expectations are not compatible with the water quality needed to support desirable uses like commercial and recreational fishing.  Perhaps current water quality standards will ultimately require such drastic measures as banning residential lawns, restricting agriculture, and/or limiting development and urban growth within a watershed. Given the unlikelihood that we will halt urban development or curtail agricultural activities in watersheds, it is possible that we cannot achieve current water quality standards in some of our major U.S. waterbodies.
We must also consider what we gain by even partial compliance with mandated pollutant controls. Does an expenditure of 70% of the total estimated costs for point and nonpoint pollution control equal a 70% gain in water quality benefits? Not necessarily.
So, do we want blue crabs and oysters badly enough in Chesapeake Bay even if it requires that we curtail urban development and forego our manicured lawns? While this is a contentious issue, it seems prudent to engage in a discussion of these cost/benefit trade-offs (along with recognition of the distribution of costs and benefits) before proceeding too far with major public investments that yield little beneficial return.

Monday, May 6, 2013

The Uses and Limitations of Trend Analyses in Water Quality Studies


Public concerns about water quality often focus on change. In other words, is water quality degrading as growth and development occur? Or is water quality improving after the implementation of new management actions? Answers to these questions may help guide future decision making, yet clear, unambiguous answers can be difficult to obtain.

First, it is important to recognize that even if the management action of interest is affecting water quality, trends may not be apparent in water quality data for several years. This is particularly true for agricultural best management practices (BMPs), where the small individual impact of a single BMP and the lag time in response to BMP implementation are important determinants of observable response. Many factors, such as other sources of contaminants, seasonal cycles, precipitation, and natural variability, affect measured water quality. As a consequence, it often takes many years of regular water quality data collection to statistically detect a trend. In general, large, abrupt changes in water quality will be detected with fewer samples than will small, gradual changes.

Trend detection involves finding a signal (the trend) in the midst of background variability (noise); the larger the noise, or the smaller the trend, the more data are needed to confidently assess the presence of a trend. More frequent sampling generally helps, up to a point; however, samples collected much more frequently than monthly may be serially correlated. As a result of this correlation, sequential samples in time may not be entirely independent of each other, which means that more samples will be needed for statistical trend detection than under conditions of independence. Note that proper data analysis to characterize other patterns (e.g., seasonality) in the data can improve the sensitivity of the test.
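
As a rough sketch of this effect, the common AR(1) approximation for effective sample size, n_eff = n(1 - r)/(1 + r) where r is the lag-1 autocorrelation, shows how strongly correlated high-frequency samples contribute less independent information than their count suggests. The autocorrelation values below are hypothetical.

```python
def effective_sample_size(n, rho):
    """Approximate number of independent samples under AR(1) autocorrelation rho."""
    return n * (1 - rho) / (1 + rho)

# Three years of weekly samples with strong week-to-week correlation (assumed rho)
print(effective_sample_size(n=156, rho=0.6))   # roughly 39 "independent" samples
# versus three years of monthly samples that are nearly independent (assumed rho)
print(effective_sample_size(n=36, rho=0.1))    # roughly 29 "independent" samples
```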

Once the water quality data have been collected, we could simply look at a graph of the water quality data versus time and determine the presence of a trend by visual inspection. However, a more scientifically defensible approach is to use a statistical technique like the seasonal Kendall test to evaluate the data for the presence of a trend, as statistical tests add analytic rigor and a level of objectivity to the conclusion.

The seasonal Kendall (SK) test has become the “industry standard” in water quality trend detection. SK programs are widely available, and the test statistic is relatively easy to compute and interpret. The SK test simply indicates the likely presence (or absence) of a trend at a specified level of significance; other statistics can then be computed to estimate the magnitude of any trend present. Among the shortcomings of the SK test are its restriction to monotonic (unidirectional) trends, and the limited insight it provides in comparison to other methods that might be preferred by an experienced statistician.
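
For readers who want to see the mechanics, the following is a minimal, from-scratch version of the SK test (ignoring ties and serial-correlation corrections); in practice, a vetted implementation such as the USGS software or an established statistical package should be used. The example data are simulated, not real monitoring records.

```python
import numpy as np
from scipy.stats import norm

def seasonal_kendall(values, seasons):
    """values: 1-D array of observations in time order; seasons: matching season labels."""
    S_total, var_total = 0.0, 0.0
    for s in np.unique(seasons):
        x = values[seasons == s]
        n = len(x)
        # Mann-Kendall S for this season: concordant minus discordant pairs
        S = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
        S_total += S
        var_total += n * (n - 1) * (2 * n + 5) / 18.0   # no-ties variance
    # Continuity-corrected normal approximation
    if S_total > 0:
        z = (S_total - 1) / np.sqrt(var_total)
    elif S_total < 0:
        z = (S_total + 1) / np.sqrt(var_total)
    else:
        z = 0.0
    p_value = 2 * (1 - norm.cdf(abs(z)))
    return S_total, z, p_value

# Hypothetical example: 8 years of monthly total phosphorus (mg/l) with a slight decline
rng = np.random.default_rng(1)
years, months = np.meshgrid(np.arange(8), np.arange(12), indexing="ij")
tp = 0.12 - 0.004 * years + 0.02 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 0.01, years.shape)
S, z, p = seasonal_kendall(tp.ravel(), months.ravel())
print(f"S = {S:.0f}, z = {z:.2f}, p = {p:.3f}")
```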

Once the application of the SK test indicates the likely presence of a water quality trend, several issues must be addressed to make the analysis useful for management. First, unlike a predictive water quality model, the trend test results provide no information about the likely causes and corrective measures for the trend. Fortunately, a good sampling design may help isolate the cause(s) of a trend. For example, if the impact on river water quality associated with nitrogen removal from a major wastewater treatment plant is of interest, then a reasonable design option is to take samples for nitrogen concentration in the river just below the discharge.

While that particular sampling program may isolate the source, it may be less informative about the meaningful water quality impacts. For example, the treatment plant of concern may be located upstream in the Susquehanna River, while the water quality impact of interest may be downstream in Chesapeake Bay. Processes can occur in the river such that the trend in nitrogen concentration in the Susquehanna due to the treatment plant operations is quite different from that in the Chesapeake. The Bay trend is also likely to be less detectable, since many other factors affect nitrogen concentration in the Chesapeake.

To further complicate matters in this nutrient enrichment example, public response to water quality management actions is probably influenced largely by algal blooms, fishkills, and shellfish harvest in the Chesapeake, not by nitrogen concentration. Unfortunately, measuring trends in algal blooms, fishkills, and shellfish harvest in the Chesapeake Bay and then linking those trends to improvements in nitrogen removal at a specific upstream wastewater treatment plant may be technically and economically infeasible. Thus from a practical perspective, sampling may still focus on the nitrogen trend in the River even though the interesting trend concerns blue crab harvests in the Bay.

In that situation, scientists should describe for policy makers the implications and limitations of assessing trends in a surrogate water quality variable. In the causal chain from a nitrogen source, to nitrogen input, to riverine nitrogen concentration, to estuarine nitrogen concentration, to dissolved oxygen, to blue crabs, trends assessed closer to the source can more easily be related to the underlying cause, but they have less meaning to the public concerning important water quality impacts.

In summary, water quality trend assessment serves primarily as a warning system for change. This can be extremely useful for policy evaluation, but it must be emphasized that definitive conclusions on water quality trends may require years of sampling. Ultimately, if a trend is identified, additional scientific analysis is often essential to understand its implications and to identify effective corrective actions if the trend reflects water quality degradation.

Thursday, May 2, 2013

Operational Water Quality Standards and Numeric Nutrient Criteria


Effective water quality management is built on a foundation of water quality standards. Recognizing this, most states have focused on making standards defensible from a scientific and socioeconomic perspective. However, standards must ultimately be protective, and for that we must consider the operational enforcement of the standard.

Standards become scientifically and socioeconomically defensible through careful determination of the designated use, an appropriate criterion, and an antidegradation policy. This basically means that the designated use should properly reflect regulatory requirements, societal preferences, and scientific assessments, while the criterion should reflect the science relating water quality indicators to use designation.

Standards become operationally enforceable when they are stated in a manner that makes compliance assessment clear and unambiguous. Most surface water quality standards are expressed and evaluated based on a single, point-valued chemical criterion (e.g., 50 ug/l arsenic for Class C Waters in North Carolina). This criterion is then used for two primary compliance assessments: (1) current water quality – based on a comparison of the criterion with measurements to determine if a waterbody is currently in compliance, and (2) future water quality – based on model forecasts to determine if proposed management actions will achieve compliance.

Consider the following examples of the two types of compliance assessments:
1.   The turbidity criterion for Class C Waters in North Carolina is 50 NTU (Nephelometric Turbidity Units). Given natural variability in precipitation and water runoff, changes in human activities in developed watersheds, and measurement error, a set of turbidity measurements over time at a single sampling station is going to vary.
2.   The chlorophyll a criterion is 40 ug/l for Class C Waters in North Carolina. Given the uncertainty in predictive model forecasts, it is highly likely that the upper tail of the probability distribution characterizing chlorophyll a model forecast error will exceed 40 ug/l for any feasible management strategy for most waterbodies that are currently out of compliance.
Based on the wording in the North Carolina water quality standards, compliance assessment will reflect a comparison of a precise fixed criterion with a distribution of measurements or forecasts. From a practical standpoint, how does this comparison proceed? In other words, is compliance with the criterion to be achieved only if there are no observations/predictions that exceed the numeric criterion (e.g., zero violations)? That strategy may be feasible when comparing a set of current water quality measurements with a fixed criterion. However, that strategy is generally not practical with water quality model forecasts which will likely yield a nonzero probability of exceeding a water quality criterion in most applications.
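
As a concrete illustration (all numbers hypothetical), assuming for simplicity a lognormal distribution for the chlorophyll a forecast, one can compute the probability that the criterion will be exceeded under a proposed management strategy and compare it with an allowable exceedance frequency:

```python
import numpy as np
from scipy.stats import lognorm

forecast_median = 30.0   # ug/l, model forecast under a proposed load reduction (assumed)
forecast_cv = 0.35       # assumed coefficient of variation of the forecast error
criterion = 40.0         # ug/l numeric criterion

sigma = np.sqrt(np.log(1 + forecast_cv**2))     # lognormal shape parameter from the CV
dist = lognorm(s=sigma, scale=forecast_median)  # scale equals the median for a lognormal

p_exceed = 1 - dist.cdf(criterion)
print(f"Probability of exceeding the criterion: {p_exceed:.2f}")
print("Meets a 10% allowable-exceedance rule:", p_exceed <= 0.10)
```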

For impaired-waters (303(d)) listings based on measurements of current water quality, EPA and state agencies have tended to allow 10% exceedances of the numeric criterion, probably in acknowledgment of natural variability and measurement error. However, for TMDL forecasting, which requires compliance assessment with a water quality model, EPA and state agencies tend to ignore model forecast uncertainty, even though this uncertainty may be quite large. Thus, EPA and state agencies lack practical experience in selecting the allowable percentage of criterion exceedances for the future pollutant loading of a TMDL in a way that accounts for model prediction uncertainty.

Allowing a selected percentage of exceedances of a numeric criterion does make sense. In principle, unless exceedance of a criterion carries an infinite penalty, an analysis of benefits and costs would lead to probabilistically based standards that include a nonzero chance of exceedance. In practice, determining cost/benefit-based standards is a difficult task; hence, the arbitrary choice of 10% exceedances appears to have been a pragmatic action by EPA.

Still, we should be able to do better. First, research could help guide the choice of allowable percent exceedances so that it bears some relation to the consequences of compliance and noncompliance. Second, research is needed on estimation of model forecast errors so that application of the standard in forecast scenarios incorporates a reasonable choice for percent exceedances. Finally, the language in the water quality standards needs to be expressed so that the standards are operationally enforceable.

An additional area of concern for operationally enforceable water quality standards relates to the recent push by EPA for numeric nutrient criteria, in part to remove the ambiguity of narrative criteria. However, numeric water quality criteria can also be ambiguous. Consider the North Carolina numeric dissolved oxygen criterion: “not less than an average of 5.0 mg/l with a minimum instantaneous value of not less than 4.0 mg/l.” We know that DO varies naturally with temperature in both time and space. So a dissolved oxygen criterion can be ambiguous and nonprotective unless it is operationally assessed based on: (1) the space/time variability of dissolved oxygen in the waterbody, and (2) the “region” of space and time that the DO standard is intended to protect. Otherwise, water quality monitoring to assess compliance with this criterion can indicate compliance or noncompliance due solely to a sampling design that ignores natural variability.
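
A small illustration (using an assumed diurnal DO cycle, not real monitoring data) shows how sampling design alone can determine the apparent compliance outcome:

```python
import numpy as np

hours = np.arange(0, 24, 0.5)
# Assumed diurnal cycle: minimum near 4 a.m., maximum near 4 p.m.
do = 6.0 + 2.2 * np.sin(2 * np.pi * (hours - 10) / 24)

afternoon = do[(hours >= 14) & (hours <= 16)]   # grab samples in mid-afternoon
pre_dawn = do[(hours >= 4) & (hours <= 6)]      # grab samples just before dawn

print("Afternoon mean DO: %.1f mg/l (vs 5.0 mg/l average criterion)" % afternoon.mean())
print("Pre-dawn minimum DO: %.1f mg/l (vs 4.0 mg/l instantaneous criterion)" % pre_dawn.min())
```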

The importance of the TMDL program and the 303(d) listing process has increased the need for operational water quality standards. By explicitly acknowledging variability and uncertainty through standards that allow for percent exceedances, the standards become less ambiguous and more enforceable.