Blog

Data analytics, statistics, and more

Derivation of a Power Expression for Trend Detection Under AR(1) Noise

Detecting long-term environmental trends is complicated by natural variability and temporal autocorrelation, both of which reduce the statistical power of trend-detection methods. Building on the Weatherhead-style trend-detectability framework commonly used in environmental and climatological applications, this study develops an analytical derivation for estimating statistical power and required sample size for detecting monotonic trends under approximately AR(1) residual behavior. The resulting framework extends traditional Weatherhead-style formulations by providing explicit analytical power and sample-size expressions within a Mann-Kendall oriented trend-detection context, giving a practical tool for evaluating trend detectability in environmental monitoring programs and long-term climatological studies.

May 12, 2026

Estimating Upper-Tail Quantiles: Why Sample Size Matters

This simulation study examines how sample size affects estimation of upper-tail quantiles commonly used in environmental threshold analyses. Using simulated lognormal data, the behavior of empirical quantiles and quantile regression estimates of the 95th percentile is compared across sample sizes. Results show that small samples produce substantial uncertainty and structured finite-sample variability, highlighting the sensitivity of upper-tail estimates to methodological details and emphasizing the need for caution when using sparse datasets to establish background thresholds or 95-95 upper tolerance limits.
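
The sample-size effect described above can be sketched with a small simulation. This is a minimal illustration, not the post's own code: the lognormal parameters (`mu=0`, `sigma=1`), the sample sizes, the number of replicates, and the helper names `p95_estimates` and `spread` are all assumptions chosen for the example, and the empirical quantile uses the default definition in Python's `statistics.quantiles`, which is only one of several possible definitions.

```python
import math
import random
from statistics import NormalDist, quantiles


def p95_estimates(n, n_sims, rng, mu=0.0, sigma=1.0):
    """Empirical 95th-percentile estimates from n_sims lognormal samples of size n."""
    estimates = []
    for _ in range(n_sims):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        # quantiles(..., n=20) returns 19 cut points; the last one is the 95th percentile.
        estimates.append(quantiles(sample, n=20)[-1])
    return estimates


def spread(values):
    """Width of the central 90% of the estimates (95th minus 5th percentile)."""
    cuts = quantiles(values, n=20)
    return cuts[-1] - cuts[0]


# True 95th percentile of the assumed lognormal(0, 1) population.
true_p95 = math.exp(0.0 + 1.0 * NormalDist().inv_cdf(0.95))

rng = random.Random(2026)  # fixed seed so reruns give the same numbers
print(f"true 95th percentile: {true_p95:.2f}")
for n in (20, 50, 200, 1000):
    est = p95_estimates(n, n_sims=500, rng=rng)
    print(f"n={n:5d}  median estimate={quantiles(est, n=2)[0]:.2f}  "
          f"90% spread={spread(est):.2f}")
```

Under these assumptions the spread of the estimates should shrink as `n` grows, which is the behavior the summary describes for sparse datasets.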

May 6, 2026

Insider Censoring in Environmental Data: Why It Biases Results and How to Fix It

Measurements near analytical detection and reporting limits are common in environmental datasets, and how these values are reported can materially influence statistical results. Insider censoring occurs when nondetect observations are censored at the reporting limit while detected values below that limit are still reported as measured. This practice introduces informative censoring that distorts distributional shape, biases summary statistics, and can alter conclusions drawn from environmental analyses. This post examines insider censoring, explains why it introduces bias in environmental datasets, and discusses approaches for mitigating its effects.

March 15, 2026

Sampling Resolution, Variogram Identifiability, and Matérn Spectral Structure

This paper examines how sampling resolution constrains variogram identifiability, showing that spatial variability occurring at scales smaller than the sampling interval cannot be resolved and is instead absorbed into the nugget effect. Using a spectral representation of stationary random fields and the Matérn covariance family, the analysis formally demonstrates how unresolved micro-scale variability inflates the nugget term and alters empirical variogram structure. The results emphasize that variogram interpretation and sampling design must be aligned with plausible spatial scales of variability to support defensible environmental decision-making.

February 21, 2026

Remediation Timeframe Estimate Using Segmented Regression

Segmented regression is a powerful statistical tool for improving remediation timeframe estimates by accounting for changes in contaminant concentration trends over time. Unlike traditional single-slope regression methods, segmented regression identifies breakpoints in monitoring data and applies distinct linear trends to different phases of plume behavior, such as rapid initial decline and long-term tailing. This approach better reflects evolving site conditions, remedy performance, and attenuation dynamics, which produces more realistic and defensible projections of cleanup timelines.

February 15, 2026

Statistical Basis for Demonstrating the Absence of Soil Contamination

This post summarizes statistically defensible methods used to demonstrate, with specified confidence, the absence of soil contamination relative to regulatory action levels. It presents exceedance-based, mean-based, percentile-based, and hotspot-detection frameworks, emphasizing how sampling design, confidence, power, and variability, rather than site area alone, govern sample size and decision reliability.

February 8, 2026

Block Kriging

This post explores block kriging as a geostatistical method for estimating average values over defined areas, contrasting it with point kriging. Using daily rainfall measurements in Switzerland and the R gstat package, the analysis demonstrates how block kriging produces smoother maps and lower estimation variance compared to point kriging. While acknowledging the potential for obscuring true data variability, the post highlights block kriging’s utility when focusing on values over larger spatial supports, yielding less variable and more accurate areal mean predictions than simple averaging.

August 26, 2025

Groundwater Detection Monitoring: Importance of Limiting the Number of Constituents

Detection monitoring uses statistical analyses to differentiate natural groundwater variations from those due to landfill activities. These monitoring programs prioritize two key performance characteristics: adequate statistical power and a low sitewide false positive rate (SWFPR), distributed across all annual statistical tests. Fewer tests result in a lower single-test false negative error rate, and therefore an improvement in statistical power. To illustrate this concept, the per-test false positive rate and the corresponding power for semiannual testing at four compliance wells will be calculated, first considering 10 constituents and then 100 constituents. This post aims to correct the misconception that increasing the number of constituents enhances the statistical power of detection monitoring.
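
The relationship between the number of tests and the per-test error rate can be sketched as follows. This is a rough illustration under stated assumptions, not the post's calculation: it assumes an annual SWFPR target of 10%, treats all tests as independent, and approximates power with a one-sided z-test for a shift of a given number of standard deviations rather than the prediction-limit procedures typically used in practice. The function names and the 3-standard-deviation shift are hypothetical choices for the example.

```python
from statistics import NormalDist


def per_test_alpha(swfpr, n_wells, n_constituents, events_per_year):
    """Per-test false positive rate that yields the target SWFPR,
    assuming all annual tests are independent."""
    n_tests = n_wells * n_constituents * events_per_year
    return 1.0 - (1.0 - swfpr) ** (1.0 / n_tests)


def z_test_power(alpha, shift_sd):
    """Power of a one-sided z-test to detect an upward shift of
    shift_sd standard deviations (a simplification, see lead-in)."""
    z = NormalDist()
    return 1.0 - z.cdf(z.inv_cdf(1.0 - alpha) - shift_sd)


SWFPR = 0.10   # assumed annual sitewide false positive rate
WELLS = 4      # compliance wells, from the summary
EVENTS = 2     # semiannual testing, from the summary
SHIFT = 3.0    # assumed size of the release, in standard deviations

for constituents in (10, 100):
    alpha = per_test_alpha(SWFPR, WELLS, constituents, EVENTS)
    power = z_test_power(alpha, SHIFT)
    print(f"{constituents:3d} constituents: "
          f"{WELLS * constituents * EVENTS:3d} tests/yr, "
          f"per-test alpha={alpha:.6f}, power={power:.3f}")
```

With the SWFPR held fixed, more constituents force a smaller per-test alpha, and under this simplified model that lowers the power of each individual test.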

March 12, 2025

Test for Stochastic Dominance Using the Wilcoxon Rank Sum Test

The two-sample Wilcoxon Rank Sum (WRS) test is often perceived as a comparison of medians, an interpretation that holds only when the two populations differ by a consistent shift, a condition that is infrequently met in practice. Its actual purpose is to determine whether one distribution stochastically dominates another. This post seeks to clarify the WRS test’s true function through a simulation involving two samples with the same medians but different distributions. For asymmetric data, quantile regression and bootstrapping are recommended as nonparametric alternatives that do not rely on rank-based assumptions.

March 7, 2025

Statistical Properties of Autocorrelated Data

In classical statistical analysis, positive autocorrelation leads to an underestimation of the standard error because standard methods assume that observations are independent. This underestimation inflates test statistics, increasing the risk of incorrectly rejecting the null hypothesis. Autocorrelation means that each observation is related to nearby values, which reduces the degrees of freedom and makes the effective sample size smaller than the actual sample size. Monte Carlo simulation is used to explore the effect of autocorrelation on a hypothesis test of whether an observed data set is drawn from a population with mean zero.
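
A Monte Carlo experiment of this kind can be sketched as below. This is a minimal illustration under assumptions, not the post's simulation: the noise is taken to be a stationary AR(1) process with standard normal innovations, the test is a naive one-sample t-statistic compared against the normal critical value 1.96 (close to, but not exactly, the t critical value for n = 100), and the helper names are hypothetical. The `effective_n` function uses the common large-sample approximation n(1 - rho)/(1 + rho) for the mean of an AR(1) series.

```python
import math
import random


def simulate_ar1(n, rho, rng):
    """Draw n values from a stationary, mean-zero AR(1) process."""
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - rho ** 2))  # stationary start
    series = []
    for _ in range(n):
        series.append(x)
        x = rho * x + rng.gauss(0.0, 1.0)
    return series


def naive_rejection_rate(n, rho, n_sims=1000, crit=1.96, seed=1):
    """Share of simulations in which a test that assumes independence
    rejects the (true) null hypothesis of mean zero."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x = simulate_ar1(n, rho, rng)
        m = sum(x) / n
        s2 = sum((v - m) ** 2 for v in x) / (n - 1)
        t_stat = m / math.sqrt(s2 / n)
        if abs(t_stat) > crit:
            rejections += 1
    return rejections / n_sims


def effective_n(n, rho):
    """Approximate effective sample size for the mean of an AR(1) series."""
    return n * (1.0 - rho) / (1.0 + rho)


for rho in (0.0, 0.3, 0.6):
    rate = naive_rejection_rate(100, rho)
    print(f"rho={rho:.1f}  effective n={effective_n(100, rho):6.1f}  "
          f"naive rejection rate={rate:.3f}")
```

Because the null hypothesis is true in every simulated series, a rejection rate well above the nominal 5% reflects the underestimated standard error described above.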

November 6, 2024