Ken Reckhow's Water Quality Wire: Bayesian Hypothesis Testing in Non-Replicated Studies

Many studies in ecology, both experimental and observational, are designed to assess what may be referred to as a “treatment effect.” The treatment effect can pertain to such things as: the influence of various factors on growth rate of an organism, the effect of a pollution control strategy on ambient pollutant concentrations, or the effect of a newly-created herbicide on animal life. It is common practice in these situations for the scientist to obtain data on the treatment effect and use hypothesis testing to assess the statistical significance of the effect.

In classical or frequentist statistical analysis, hypothesis testing for a treatment effect is often based on a point null hypothesis (which actually should be used only if it is considered appropriate from a scientific standpoint). Typically, the point null hypothesis is that there is no effect; it is often stated in this way as a “straw man” that the scientist expects to reject on the basis of the data. To test the null hypothesis, data are obtained to provide a sample estimate of the effect of interest, and the value of the test statistic is then computed from the sample. Following that, a table for the test statistic is consulted to assess how unusual the observed value of the test statistic is, assuming that the null hypothesis is true. If the observed value of the test statistic is unusual, that is, if it is essentially incompatible with the null hypothesis, then the null hypothesis is rejected.

In classical statistics, this assessment of the test statistic is based on the sampling distribution for the test statistic. The sampling distribution is a probability density function that is hypothetical in nature. In effect, it is a smoothed histogram for the test statistic plotted for a large number of hypothetical samples with the same sample size. Inference in classical statistics is based on the distribution of estimators and test statistics in many (hypothetical) samples, despite the fact that virtually all statistical investigations involve a single sample. This hypothetical sampling distribution provides a measure of the frequency, or probability, with which a particular value, or range of values, for the test statistic would be obtained across a set of many samples. In classical statistics, we equate this long-run frequency to the probability for a particular sample, before that sample is taken.
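To make the idea of a hypothetical sampling distribution concrete, here is a minimal simulation sketch. It is an illustration only: the one-sample t statistic, the normal data, the sample size of 10, and the function names are all assumptions chosen for the example, not part of any particular study.

```python
import random
import statistics


def t_statistic(sample, mu0=0.0):
    """One-sample t statistic for the null hypothesis that the mean equals mu0."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return (mean - mu0) / (sd / n ** 0.5)


def simulate_null_distribution(n=10, n_samples=20000, seed=1):
    """Draw many hypothetical samples of size n with the null true (mean 0)."""
    rng = random.Random(seed)
    return [
        t_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]


def tail_frequency(null_stats, observed):
    """Fraction of hypothetical samples at least as extreme as the observed value."""
    return sum(abs(t) >= abs(observed) for t in null_stats) / len(null_stats)


null_stats = simulate_null_distribution()
# 2.262 is the tabulated two-sided 5% critical value for t with 9 degrees of freedom.
print(tail_frequency(null_stats, 2.262))
```

The printed frequency should land close to 0.05, which is what the table lookup encodes. Note that every one of the 20,000 samples is hypothetical; a real study supplies only the single observed value passed to `tail_frequency`.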

There are two problems with this approach that are addressed through use of Bayesian statistical methods. The first is that the hypothesis test is based on a test statistic that is at best indirectly related to the quantity of interest: the truth (or probability of truth) of the null hypothesis. The p-value commonly reported in hypothesis testing is the probability (frequency), given that the null hypothesis is true, of observing values for the test statistic that are as extreme as, or more extreme than, the value actually observed; in other words:

p(test statistic equals or exceeds k|H₀ is true)

The scientist, however, is interested in the probability of the correctness of the hypothesis, given that he has observed a particular value for the test statistic; in other words:

p(H₀ is true|test statistic=k)

Classical statistical inference does not provide a direct answer to the scientist's question; Bayesian inference does.
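The gap between the two probabilities can be seen in a small numerical sketch. This is a textbook-style setup, not the analysis in the paper cited below, and every modeling choice in it is an assumption made for illustration: normal data with known standard deviation, a point null of zero effect, a Normal(0, tau^2) prior on the effect under the alternative with tau equal to the data standard deviation, and prior probability 0.5 on the null.

```python
import math
from statistics import NormalDist


def two_sided_p_value(z):
    """p(|Z| >= |z| given H0 is true), for a standard normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def posterior_prob_null(z, n, prior_null=0.5, tau_over_sigma=1.0):
    """p(H0 is true | data) for H0: mean == 0 against H1: mean ~ Normal(0, tau^2).

    z is the usual statistic sqrt(n) * sample_mean / sigma, with sigma known.
    The prior settings are illustrative assumptions, not recommendations.
    """
    r = n * tau_over_sigma ** 2
    # Bayes factor in favor of H0: ratio of the marginal densities of the
    # sample mean under H0 and under H1.
    bayes_factor_null = math.sqrt(1.0 + r) * math.exp(-0.5 * z * z * r / (1.0 + r))
    weighted = prior_null * bayes_factor_null
    return weighted / (weighted + (1.0 - prior_null))


z, n = 1.96, 50  # hypothetical test statistic and sample size
print(f"p-value           = {two_sided_p_value(z):.3f}")
print(f"p(H0 true | data) = {posterior_prob_null(z, n):.3f}")
```

Under these assumptions, a result that is "significant at the 0.05 level" leaves the posterior probability of the null hypothesis at roughly one half. The exact figure moves with the prior and the sample size, so the numbers are best read as a demonstration that the two quantities answer different questions.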

The second problem relates to the issue of “conditioning,” which concerns the nature of the sample information in support of the hypothesis. Bayesian hypothesis tests are conditioned only on the sample taken, whereas classical hypothesis tests are conditioned on other hypothetical samples in the sampling distribution (more extreme than that observed) that could have been selected, but were not. The Bayesian approach, of course, uses more than the sample as it also incorporates prior information. However, the prior, while judgmental, does relate to the hypothesis of interest, whereas the sampling distribution relates to logically irrelevant, hypothetical samples. Clearly, the Bayesian approach is more focused on the problem of interest to the ecologist.
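A standard textbook illustration of the conditioning issue, sketched here with made-up counts, is that the same observed data can yield different p-values depending on which unobserved samples were possible under the sampling plan. Suppose 9 successes and 3 failures are observed. If the plan was to take exactly 12 observations, the more extreme samples are those with 9 or more successes out of 12; if the plan was to sample until the third failure, they are those with 9 or more successes before that failure. The function names and the null value of 0.5 are assumptions for the example.

```python
from math import comb


def binomial_p_value(successes, trials, theta0=0.5):
    """p(at least `successes` successes | H0) when the number of trials was fixed."""
    return sum(
        comb(trials, k) * theta0 ** k * (1 - theta0) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def negative_binomial_p_value(successes, failures, theta0=0.5):
    """p(at least `successes` successes before the `failures`-th failure | H0).

    Equivalent to seeing fewer than `failures` failures in the first
    successes + failures - 1 trials.
    """
    first = successes + failures - 1
    return sum(
        comb(first, j) * (1 - theta0) ** j * theta0 ** (first - j)
        for j in range(failures)
    )


# Same observed data (9 successes, 3 failures), two sampling plans.
print(f"fixed 12 trials:      p = {binomial_p_value(9, 12):.4f}")
print(f"stop at 3rd failure:  p = {negative_binomial_p_value(9, 3):.4f}")
```

The two p-values differ even though the observed counts are identical, because each refers to a different set of samples that were never taken. The likelihood for the observed data is proportional to theta^9 (1 - theta)^3 under both plans, so a Bayesian analysis with a given prior reaches the same posterior either way.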

Reckhow (1990) illustrates the tendency of p-values to overstate the sample evidence against the null hypothesis in an example concerning acidification of lakes.

Reckhow, K.H. 1990. Bayesian Inference in Non-Replicated Ecological Studies. Ecology. 71:2053-2059.

https://www.researchgate.net/publication/249011176_Bayesian_Inference_in_Non-Replicated_Ecological_Studies?ev=prf_pub

Sunday, July 14, 2013
