Tuesday, April 30, 2013

Is Conventional Water Quality Model Verification A Charade?


In the development and application of water quality models, it is standard practice to set aside data, not used in calibration, for model verification purposes. This approach is based on the reasoning that the set-aside data provide a test of the model under new conditions and thus indicate how the model will perform when applied for prediction. How plausible is this reasoning?

Consider the situation where a model is calibrated with data from 2010-2011, and then data from 2012 are used for verification. What is likely to be different between these calibration and verification data sets? Will these differences be sufficient to give us confidence that the calibrated model can be relied upon for predictions when important forcings/inputs (e.g., pollutant loadings to a waterbody) change?

In essentially all cases, the major differences between the 2010-2011 and 2012 data sets are likely to be natural forcing functions such as hydrology, temperature, and solar radiation. It is extremely unlikely that the forcing functions that are the focus of the model application, such as land use/land cover (LULC) changes in a watershed or point source pollutant discharges, will change very much. To the extent that pollutant loads to a waterbody change over this time period, it will largely be due to changes in hydrology.

So, conventional water quality model verification has become, in essence, a charade. This situation is not the fault of modelers; rather, it is simply a consequence of the limited data available. Nonetheless, water quality modelers who employ this approach need to be more candid about its limited value.

As an alternative, here is the basis for a statistical test that could provide a measure of the rigor of model verification. To begin, consider the figure below, which displays histograms of dissolved oxygen (DO) data for model calibration and verification:

The next figure overlays the calibration and verification histograms for Case 1; notice how similar they are. The lack of difference between these two data sets indicates that “verification” lacks rigor; essentially, the model is being re-assessed with calibration-like data.

Now consider Case 2 below:

An overlay of the two histograms, shown below, indicates that the calibration and verification data sets are different, which suggests that verification is more rigorous than in Case 1. However, note that the verification data in Case 2 show DO to be lower than in the calibration data. Since model applications are quite likely to address improved water quality and higher dissolved oxygen, the verification test may be rigorous, but it does not reflect the conditions expected for model use.



Now consider Case 3 below:



In Case 3, the histogram of the verification data differs from the histogram of the calibration data, and this time the verification DO values are higher than the calibration DO values, which better reflects the conditions a prediction scenario is likely to address.
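
For readers who want to reproduce this kind of comparison, below is a minimal sketch in Python (using numpy and matplotlib) that overlays calibration and verification DO histograms. The DO values are synthetic placeholders, and the bin choice is illustrative rather than prescriptive.

    # Minimal sketch: overlay calibration and verification DO histograms.
    # The DO values below are synthetic placeholders; substitute observed data.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    calibration_do = rng.normal(loc=7.0, scale=1.0, size=200)   # mg/L, 2010-2011
    verification_do = rng.normal(loc=8.0, scale=1.0, size=100)  # mg/L, 2012

    # Common bins spanning the pooled range so the two histograms are comparable.
    pooled = np.concatenate([calibration_do, verification_do])
    bins = np.linspace(pooled.min(), pooled.max(), 21)

    plt.hist(calibration_do, bins=bins, alpha=0.5, label="Calibration (2010-2011)")
    plt.hist(verification_do, bins=bins, alpha=0.5, label="Verification (2012)")
    plt.xlabel("Dissolved oxygen (mg/L)")
    plt.ylabel("Frequency")
    plt.legend()
    plt.title("Calibration vs. verification DO histograms")
    plt.show()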

In conclusion, to evaluate the rigor of the verification exercise, I recommend that modelers apply a two-sample Kolmogorov-Smirnov test, or a chi-square test, to quantitatively assess the difference between the calibration and verification data sets. If this becomes routine practice, the accumulated results will give us a comparative basis for confidence that a water quality model can be used to reliably predict water quality in response to management changes.
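
As one possible implementation, here is a minimal sketch, assuming the calibration and verification DO observations are available as numeric arrays in mg/L. It applies scipy's two-sample Kolmogorov-Smirnov test (scipy.stats.ks_2samp) and, as an alternative, a chi-square comparison of binned counts; the synthetic data and the choice of ten bins are illustrative assumptions, not part of the recommendation itself.

    # Minimal sketch: quantify the difference between calibration and
    # verification data sets. The DO values below are synthetic placeholders.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    calibration_do = rng.normal(loc=7.0, scale=1.0, size=200)   # mg/L, 2010-2011
    verification_do = rng.normal(loc=8.0, scale=1.0, size=100)  # mg/L, 2012

    # Two-sample Kolmogorov-Smirnov test: a small p-value indicates the two
    # empirical distributions differ, i.e., the verification data actually
    # exercise the model under conditions unlike those used for calibration.
    ks_stat, ks_p = stats.ks_2samp(calibration_do, verification_do)
    print(f"KS statistic = {ks_stat:.3f}, p-value = {ks_p:.3g}")

    # Alternative: chi-square test on binned counts, using common bins derived
    # from the pooled data; bins empty in both data sets are dropped.
    bins = np.histogram_bin_edges(np.concatenate([calibration_do, verification_do]), bins=10)
    cal_counts, _ = np.histogram(calibration_do, bins=bins)
    ver_counts, _ = np.histogram(verification_do, bins=bins)
    table = np.vstack([cal_counts, ver_counts])
    table = table[:, table.sum(axis=0) > 0]
    chi2_stat, chi2_p, _, _ = stats.chi2_contingency(table)
    print(f"Chi-square statistic = {chi2_stat:.3f}, p-value = {chi2_p:.3g}")

If neither test distinguishes the two data sets, that is a signal the verification exercise resembles Case 1 and adds little rigor.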


