<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Blog on Charles Holbert</title>
    <link>https://www.cfholbert.com/blog/</link>
    <description>Recent content in Blog on Charles Holbert</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Sat, 21 Feb 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://www.cfholbert.com/blog/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Sampling Resolution, Variogram Identifiability, and Matérn Spectral Structure</title>
      <link>https://www.cfholbert.com/blog/variogram-microscale-variability/</link>
      <pubDate>Sat, 21 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/variogram-microscale-variability/</guid>
      <description>This paper examines how sampling resolution constrains variogram identifiability, showing that spatial variability occurring at scales smaller than the sampling interval cannot be resolved and is instead absorbed into the nugget effect. Using a spectral representation of stationary random fields and the Matérn covariance family, the analysis formally demonstrates how unresolved micro-scale variability inflates the nugget term and alters empirical variogram structure. The results emphasize that variogram interpretation and sampling design must be aligned with plausible spatial scales of variability to support defensible environmental decision-making.</description>
    </item>
    <item>
      <title>Remediation TimeFrame Estimate Using Segmented Regression</title>
      <link>https://www.cfholbert.com/blog/segmented-regression/</link>
      <pubDate>Sun, 15 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/segmented-regression/</guid>
      <description>Segmented regression is a powerful statistical tool for improving remediation timeframe estimates by accounting for changes in contaminant concentration trends over time. Unlike traditional single-slope regression methods, segmented regression identifies breakpoints in monitoring data and applies distinct linear trends to different phases of plume behavior, such as rapid initial decline and long-term tailing. This approach better reflects evolving site conditions, remedy performance, and attenuation dynamics, which produces more realistic and defensible projections of cleanup timelines.</description>
    </item>
    <item>
      <title>Statistical Basis for Demonstrating the Absence of Soil Contamination</title>
      <link>https://www.cfholbert.com/blog/soil-sample-size/</link>
      <pubDate>Sun, 08 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/soil-sample-size/</guid>
      <description>This post summarizes statistically defensible methods used to demonstrate, with specified confidence, the absence of soil contamination relative to regulatory action levels. It presents exceedance-based, mean-based, percentile-based, and hotspot-detection frameworks, emphasizing how sampling design, confidence, power, and variability, rather than site area alone, govern sample size and decision reliability.</description>
    </item>
    <item>
      <title>Block Kriging</title>
      <link>https://www.cfholbert.com/blog/block-kriging/</link>
      <pubDate>Tue, 26 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/block-kriging/</guid>
      <description>This post explores block kriging as a geostatistical method for estimating average values over defined areas, contrasting it with point kriging. Using daily rainfall measurements in Switzerland and the R gstat package, the analysis demonstrates how block kriging produces smoother maps and lower estimation variance compared to point kriging. While acknowledging the potential for obscuring true data variability, the post highlights block kriging&amp;rsquo;s utility when focusing on values over larger spatial supports, yielding less variable and more accurate areal mean predictions than simple averaging.</description>
    </item>
    <item>
      <title>Groundwater Detection Monitoring: Importance of Limiting the Number of Constituents</title>
      <link>https://www.cfholbert.com/blog/detection-monitoring/</link>
      <pubDate>Wed, 12 Mar 2025 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/detection-monitoring/</guid>
      <description>Detection monitoring uses statistical analyses to differentiate natural groundwater variations from those due to landfill activities. These monitoring programs prioritize two key performance characteristics: adequate statistical power and a low sitewide false positive rate (SWFPR), distributed across all annual statistical tests. Fewer tests result in a lower single-test false negative error rate, and therefore an improvement in statistical power. To illustrate this concept, the per-test false positive rate and the corresponding power for semiannual testing at four compliance wells will be calculated, first considering 10 constituents and then 100 constituents. This post aims to correct the misconception that increasing the number of constituents enhances the statistical power of detection monitoring.</description>
    </item>
    <item>
      <title>Test for Stochastic Dominance Using the Wilcoxon Rank Sum Test</title>
      <link>https://www.cfholbert.com/blog/wrs-test/</link>
      <pubDate>Fri, 07 Mar 2025 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/wrs-test/</guid>
      <description>The two-sample Wilcoxon Rank Sum (WRS) test is often perceived as a median comparison procedure based on the assumption that two populations differ only by a consistent shift, a condition that is infrequently met in practice. Its actual purpose is to determine if one distribution stochastically dominates another. This post seeks to clarify the WRS test&amp;rsquo;s true function through a simulation involving two samples with the same medians but different distributions. In cases of non-symmetric data, methods such as quantile regression and bootstrapping are recommended as nonparametric alternatives that do not rely on rank-based assumptions.</description>
    </item>
    <item>
      <title>Statistical Properties of Autocorrelated Data</title>
      <link>https://www.cfholbert.com/blog/autocorrelation/</link>
      <pubDate>Wed, 06 Nov 2024 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/autocorrelation/</guid>
      <description>In classical statistical analysis, positive autocorrelation leads to an underestimation of the standard error because standard methods assume independence of data. This underestimation results in inflated test statistics, increasing the risk of incorrectly rejecting the null hypothesis. Autocorrelation implies that each observation is related to nearby values, reducing the degrees of freedom and making the effective sample size smaller than the actual sample size. Monte Carlo simulation is used to explore the effect of autocorrelation on a hypothesis test to determine whether an observed data set is drawn from a population with mean zero.</description>
    </item>
    <item>
      <title>Lognormal Kriging and Bias-Corrected Back-Transformation</title>
      <link>https://www.cfholbert.com/blog/lognormal-kriging/</link>
      <pubDate>Thu, 15 Aug 2024 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/lognormal-kriging/</guid>
      <description>Kriging assumes spatial stationarity and does not require a specific distribution for estimated variables. However, non-symmetric distributions, often found in earth sciences, can complicate variogram calculations and lead to over-prediction, especially when high values are present. To address these challenges, data are often transformed using the natural logarithm. A challenge occurs during back-transformation of predictions and variances from the log scale to the original scale, as simple exponentiation is insufficient due to the weighted sums in log-transformed data. This post will explore the mathematical formulations essential for effective back-transformation in lognormal kriging.</description>
    </item>
    <item>
      <title>Predictive Modelling of Traffic Accidents in the U.S.</title>
      <link>https://www.cfholbert.com/blog/traffic-accidents/</link>
      <pubDate>Fri, 09 Aug 2024 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/traffic-accidents/</guid>
      <description>Motor vehicle accidents are an important part of traffic safety research. Analyzing the factors contributing to accidents and accident severity is critical for enhancing road safety standards. In this post, traffic accident data patterns will be explored and studied using machine-learning analysis techniques.</description>
    </item>
    <item>
      <title>Generalized Least Squares Regression</title>
      <link>https://www.cfholbert.com/blog/gls-regression/</link>
      <pubDate>Wed, 17 Apr 2024 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/gls-regression/</guid>
      <description>In OLS regression, assumptions such as independent and identically distributed errors are important for accurate estimation and inference. Heteroskedasticity, or unequal variances of residuals, can lead to inefficient estimates and incorrect standard errors. Alternatives to OLS, such as GLS and WLS regression, can be considered when OLS assumptions are violated. GLS is used for dependent errors, while WLS is used for independent but non-identically distributed errors.</description>
    </item>
    <item>
      <title>Weighted Least Squares Regression</title>
      <link>https://www.cfholbert.com/blog/wls-regression/</link>
      <pubDate>Tue, 19 Mar 2024 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/wls-regression/</guid>
      <description>Heteroscedasticity in regression analysis refers to varying levels of scatter in the residuals. Its presence makes OLS estimators inefficient and their standard errors unreliable, leading to misleading inference. When errors are independent, but not identically distributed, weighted least squares regression can be used to address heteroscedasticity by placing more weight on observations with smaller error variance. This results in smaller standard errors and more precise estimators.</description>
    </item>
    <item>
      <title>Trend Detection Using Survival Analysis</title>
      <link>https://www.cfholbert.com/blog/trend-detection-survival-analysis/</link>
      <pubDate>Thu, 07 Mar 2024 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/trend-detection-survival-analysis/</guid>
      <description>Non-detects in environmental data can complicate analysis if not handled properly, leading to incorrect conclusions. The mathematical structure of survival analysis is general enough that it can be used in diverse fields examining various types of data not typically associated with survival/death events or failure analysis. In this post, survival analysis methods will be applied to fit a censored linear regression model to weekly ammonium deposition data to assess temporal trends.</description>
    </item>
    <item>
      <title>Arctic Sea Ice Time Series Analysis</title>
      <link>https://www.cfholbert.com/blog/arctic-sea-ice_ts/</link>
      <pubDate>Wed, 28 Feb 2024 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/arctic-sea-ice_ts/</guid>
      <description>The modeltime R library offers a wide range of features for model evaluation, selection, and forecasting using the tidymodels ecosystem. Time-series analysis of sea ice in the Arctic polar regions performed using the modeltime library suggests that the Arctic sea will be nearly ice-free in the very near future.</description>
    </item>
    <item>
      <title>Temporal Behavior of Arctic Sea Ice</title>
      <link>https://www.cfholbert.com/blog/arctic-sea-ice/</link>
      <pubDate>Thu, 22 Feb 2024 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/arctic-sea-ice/</guid>
      <description>Passive-microwave instrumentation on satellites has allowed for the monitoring of Arctic sea ice coverage since the late 1970s, showing a long-term downward trend due to both natural variability and climate change. The rate of decline in Arctic sea ice has varied over the past 43 years, with some periods showing faster rates of loss than others. This blog post explores the temporal changes in Arctic sea ice extent using the R language for statistical computing and visualization.</description>
    </item>
    <item>
      <title>Comparison of Random and Geographically Stratified Sampling</title>
      <link>https://www.cfholbert.com/blog/random-vs-stratified-sampling/</link>
      <pubDate>Wed, 20 Dec 2023 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/random-vs-stratified-sampling/</guid>
      <description>This post compares simple random sampling and geographically stratified sampling. Stratified random sampling improves the spatial distribution of sample point locations by stratifying the field geographically and sampling randomly within each stratum. Different sampling patterns are compared using Monte Carlo simulations on a simulated population.</description>
    </item>
    <item>
      <title>Analyses of Pesticide Concentrations in California Surface Waters</title>
      <link>https://www.cfholbert.com/blog/pesticide-analysis/</link>
      <pubDate>Sat, 16 Sep 2023 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/pesticide-analysis/</guid>
      <description>Environmental data are frequently left-censored, indicating that some values are less than the limit of quantification for the analytical methods employed. These data are problematic because censored (non-detect) values are known only to range between zero and the censoring limit. This complicates analysis of the data, including estimating statistical parameters, characterizing data distributions, and conducting inferential statistics. This post demonstrates various procedures and methods that are available in R for analyzing data containing a mixture of detects and non-detects. These methods make few or no assumptions about the data and do not substitute arbitrary values (e.g., one-half the detection or reporting limit) for the non-detects.</description>
    </item>
    <item>
      <title>Clustering on Principal Component Analysis</title>
      <link>https://www.cfholbert.com/blog/cluster-pca/</link>
      <pubDate>Sun, 20 Aug 2023 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/cluster-pca/</guid>
      <description>Combining principal component analysis (PCA) and clustering methods is useful for reducing the dimension of a data set into a few continuous variables containing the most important information in the data. This post illustrates how to combine PCA and clustering methods to identify patterns in a data set using the R language for statistical computing and visualization.</description>
    </item>
    <item>
      <title>Exploratory Spatial Data Analysis and Kriging in R</title>
      <link>https://www.cfholbert.com/blog/esda-kriging/</link>
      <pubDate>Mon, 29 May 2023 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/esda-kriging/</guid>
      <description>This post presents and demonstrates several methods for exploratory spatial data analysis using the R language for statistical computing and visualization. These methods can be used for identifying spatial dependence patterns and spatial heterogeneity, which are critical components of variogram development and the kriging procedure.</description>
    </item>
    <item>
      <title>Natural Neighbor Interpolation With R</title>
      <link>https://www.cfholbert.com/blog/natural-neighbor-interpolation/</link>
      <pubDate>Mon, 22 May 2023 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/natural-neighbor-interpolation/</guid>
      <description>This post presents and demonstrates several methods for natural neighbor interpolation using the R language for statistical computing and visualization. The results are compared to those obtained using ordinary kriging.</description>
    </item>
    <item>
      <title>Calculation of 95% Upper Confidence Limit for Left-Censored Data</title>
      <link>https://www.cfholbert.com/blog/ucl95-calculation-censored-data/</link>
      <pubDate>Wed, 10 May 2023 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/ucl95-calculation-censored-data/</guid>
      <description>This post presents methods that can be used to calculate a 95% upper confidence limit on the mean of an unknown population for left-censored data sets (i.e., containing a mixture of detects and non-detects). The preferred approach depends on many factors, including the number of samples and the distributional shape of the data.</description>
    </item>
    <item>
      <title>Sample Size Determination for Correlation Studies</title>
      <link>https://www.cfholbert.com/blog/sample-size-correlation/</link>
      <pubDate>Sat, 25 Mar 2023 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/sample-size-correlation/</guid>
      <description>Determination of an appropriate sample size when performing a correlation study is usually based on achieving sufficient power that the test can reject the null hypothesis that the correlation is zero. Sample sizes found using this method can yield confidence intervals that are so wide that they provide very little useful information about the magnitude of the correlation. An alternative approach is to choose a sample size that achieves a sufficiently narrow confidence interval for measuring the smallest correlation of potential interest.</description>
    </item>
    <item>
      <title>Statistical Power of Two-Sample Central Tendency Tests with Unequal Sample Size</title>
      <link>https://www.cfholbert.com/blog/central-tendency-test-power/</link>
      <pubDate>Tue, 03 Jan 2023 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/central-tendency-test-power/</guid>
      <description>Two-sample hypothesis tests are used to compare the means, medians, or other percentiles of two populations to determine if there is a significant difference between the groups. For a given total sample size, statistical power is maximized if the sample sizes for each group are equal. With highly unequal group sizes, each additional observation adds little additional resolution. This simulation study focuses on determining the effect of unequal sample sizes on the statistical power of two-sample hypothesis tests, assuming independent samples with equal variance.</description>
    </item>
    <item>
      <title>Two-Sample Permutation Test of Difference in Means</title>
      <link>https://www.cfholbert.com/blog/two_sample_permutation_test/</link>
      <pubDate>Thu, 22 Dec 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/two_sample_permutation_test/</guid>
      <description>Permutation tests are designed to be robust against departures from normality. Permutation tests compute p-values by randomly selecting several thousand outcomes from the much larger number of possible outcomes that represent the null hypothesis. This post demonstrates how to perform a two-sample permutation test using various R packages.</description>
    </item>
    <item>
      <title>Calculation of 95% Upper Confidence Limit for Data With No Censored Values</title>
      <link>https://www.cfholbert.com/blog/ucl95-calculation-no-censoring/</link>
      <pubDate>Tue, 01 Nov 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/ucl95-calculation-no-censoring/</guid>
      <description>This post presents methods that can be used to calculate a 95% upper confidence limit on the mean of an unknown population, where all measurements are detections. The estimation methods described in this post are applicable to a random sample coming from a single statistical population.</description>
    </item>
    <item>
      <title>Species Distribution Modelling of Bigfoot Encounters Across North America</title>
      <link>https://www.cfholbert.com/blog/bigfoot-sdm/</link>
      <pubDate>Wed, 12 Oct 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/bigfoot-sdm/</guid>
      <description>This post provides a brief introduction to species distribution modelling and illustrates how machine learning can be used to predict the range of a species based on a set of locations where it has been observed.</description>
    </item>
    <item>
      <title>Trend Analysis for Censored Environmental Data</title>
      <link>https://www.cfholbert.com/blog/trend-analysis-censored-regression/</link>
      <pubDate>Fri, 07 Oct 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/trend-analysis-censored-regression/</guid>
      <description>This post examines several methods for conducting temporal trend analysis using censored data that do not substitute artificial values for non-detects. Parametric methods are based on censored regression using maximum likelihood estimation. Nonparametric methods are based on Kendall&amp;rsquo;s tau and the Akritas-Theil-Sen line.</description>
    </item>
    <item>
      <title>2-D Density Map of Bigfoot Sightings</title>
      <link>https://www.cfholbert.com/blog/bigfoot-sightings/</link>
      <pubDate>Sat, 01 Oct 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/bigfoot-sightings/</guid>
      <description>Data visualization is an important element of the data science process and the broader data presentation architecture discipline. This post will focus on performing some basic spatial data analysis using Bigfoot sightings in North America and the R language for statistics and visualization.</description>
    </item>
    <item>
      <title>Shaded Relief Basemap Using rayshader</title>
      <link>https://www.cfholbert.com/blog/shaded-relief-map-rayshader/</link>
      <pubDate>Tue, 06 Sep 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/shaded-relief-map-rayshader/</guid>
      <description>This post illustrates the use of rayshader, an R library that uses elevation data in a base R matrix and a combination of raytracing, hillshading algorithms, and overlays to generate 2D and 3D maps. A surface relief map created using digital elevation data will be rendered using rayshader and ggplot2.</description>
    </item>
    <item>
      <title>Rain Tomorrow Stacked Ensemble Model</title>
      <link>https://www.cfholbert.com/blog/rain-tomorrow-ensemble/</link>
      <pubDate>Mon, 05 Sep 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/rain-tomorrow-ensemble/</guid>
      <description>For this post, we will evaluate rainfall in Australia using daily weather observations from multiple Australian weather stations. We will build a stacked ensemble classification model using the H2O machine learning platform for use in predicting if there will be rain tomorrow.</description>
    </item>
    <item>
      <title>Rain Tomorrow</title>
      <link>https://www.cfholbert.com/blog/rain-tomorrow/</link>
      <pubDate>Sun, 28 Aug 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/rain-tomorrow/</guid>
      <description>For this post, we will evaluate rainfall in Australia using daily weather observations from multiple Australian weather stations. We will build several machine learning models using the tidymodels framework for use in predicting if there will be rain tomorrow.</description>
    </item>
    <item>
      <title>Shaded Relief Basemap Using ggplot2</title>
      <link>https://www.cfholbert.com/blog/shaded-relief-map/</link>
      <pubDate>Mon, 25 Jul 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/shaded-relief-map/</guid>
      <description>Shaded relief of surface elevation illustrates the shape of the terrain in a realistic fashion by showing how the three-dimensional surface would be illuminated from a point light source. This post illustrates the use of geom_relief(), a new aesthetic mapping layer, or geom (geometric object), for creating a shaded relief basemap using ggplot2.</description>
    </item>
    <item>
      <title>Examination of England&#39;s Surface Water Quality</title>
      <link>https://www.cfholbert.com/blog/waterbody-classification/</link>
      <pubDate>Mon, 18 Jul 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/waterbody-classification/</guid>
      <description>This post is intended to provide tools and insights to those individuals interested in analyzing the ecological and chemical status of various water bodies across England. Information and data about the river basin management water environment can be accessed from the Catchment Data Explorer. Classifications indicate where the quality of the environment is good, where it may need improvement, and what may need to be improved.</description>
    </item>
    <item>
      <title>County Drought Levels Throughout the United States</title>
      <link>https://www.cfholbert.com/blog/drought/</link>
      <pubDate>Sun, 03 Jul 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/drought/</guid>
      <description>The U.S. Drought Monitor is updated each Thursday to show the location and intensity of drought across the country using a five-category system, from Abnormally Dry (D0) conditions to Exceptional Drought (D4). Using these data and the R statistical programming language, we can visualize drought severity across the United States for various time periods as static maps or even as an animated map.</description>
    </item>
    <item>
      <title>PCA, t-SNE, and UMAP Classification of Vegetable Oils</title>
      <link>https://www.cfholbert.com/blog/pca-tsne-umap/</link>
      <pubDate>Sun, 05 Jun 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/pca-tsne-umap/</guid>
      <description>In this post, we explore three dimensionality reduction techniques specifically used for data exploration and visualization: principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP).</description>
    </item>
    <item>
      <title>Proportional Odds Ordinal Logistic Regression</title>
      <link>https://www.cfholbert.com/blog/ordinal_logistic_regression/</link>
      <pubDate>Sun, 17 Apr 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/ordinal_logistic_regression/</guid>
      <description>In this post, we will use ordinal logistic regression to provide general contrasts on the log odds ratio scale as an alternative to nonparametric ANOVA. Proportional odds ordinal logistic regression is a generalization of the Wilcoxon and Kruskal-Wallis tests that extends to multiple covariates and interactions.</description>
    </item>
    <item>
      <title>Nonparametric Two-Way ANOVA</title>
      <link>https://www.cfholbert.com/blog/nonparametric_two_way_anova/</link>
      <pubDate>Sat, 16 Apr 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/nonparametric_two_way_anova/</guid>
      <description>In this post, we will evaluate whether sample depth and/or site location affect arsenic concentrations measured in soil. To address non-normality and heteroscedasticity, two-way ANOVA will be performed using the rank-transformation of the data values.</description>
    </item>
    <item>
      <title>Summarize Influent Flow Data Containing No Measurement Date</title>
      <link>https://www.cfholbert.com/blog/influent-flow/</link>
      <pubDate>Thu, 07 Apr 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/influent-flow/</guid>
      <description>This post summarizes influent flow data containing no measurement date for 10 wastewater treatment facilities.</description>
    </item>
    <item>
      <title>Mann-Kendall Power Analysis Revisited</title>
      <link>https://www.cfholbert.com/blog/mann-kendall-power-analysis-revisited/</link>
      <pubDate>Tue, 05 Apr 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/mann-kendall-power-analysis-revisited/</guid>
      <description>Detection of a long-term temporal trend in environmental data is affected by a number of factors, including the size of the trend to be detected, the time span of the data, and the magnitude of variability and autocorrelation of the noise in the data. This post evaluates the power of the Mann-Kendall test to identify a trend for various combinations of trend, variability, and sample size using Monte Carlo simulation.</description>
    </item>
    <item>
      <title>Plume Moment Analysis Using Thiessen Polygons</title>
      <link>https://www.cfholbert.com/blog/moment_analysis_thiessen_polygons/</link>
      <pubDate>Sat, 02 Apr 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/moment_analysis_thiessen_polygons/</guid>
      <description>Mass-based analyses of groundwater contaminants provide complementary information not readily quantified using single-well analytics. This post describes methods that can be used to evaluate contaminant concentrations measured in wells to determine how plume mass and plume center-of-mass change through time.</description>
    </item>
    <item>
      <title>Sample Size Requirement for One-Sample t-Test</title>
      <link>https://www.cfholbert.com/blog/sample_size_t-test/</link>
      <pubDate>Sat, 02 Apr 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/sample_size_t-test/</guid>
      <description>This post computes the sample size necessary to achieve a specified power for a one-sample t-test, given the ratio of means, coefficient of variation, and significance level. Calculations are based on the USEPA&amp;rsquo;s 1996 Soil Screening Guidance Document that discusses sample size calculations to determine whether soil at a potentially contaminated site needs to be investigated for possible remedial action.</description>
    </item>
    <item>
      <title>How to Calculate Summary Statistics for Left-Censored Data</title>
      <link>https://www.cfholbert.com/blog/summary-statistics-censored-data/</link>
      <pubDate>Mon, 28 Mar 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/summary-statistics-censored-data/</guid>
      <description>Left-censored environmental data are problematic because censored (nondetect) values are known only to range between zero and the detection or reporting limit. Fortunately, methods are available for analyzing data containing a mixture of detects and nondetects that make few or no assumptions about the data and do not substitute arbitrary values for the nondetects.</description>
    </item>
    <item>
      <title>Outlier Identification Using Mahalanobis Distance</title>
      <link>https://www.cfholbert.com/blog/outlier_mahalanobis_distance/</link>
      <pubDate>Sun, 27 Mar 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/outlier_mahalanobis_distance/</guid>
      <description>The Mahalanobis distance is a statistical measure of how distant a point is from the centroid of the data. Mahalanobis distances can be converted into probabilities using a chi-squared distribution. By specifying a significance level, this process is commonly used as an outlier detection method.</description>
    </item>
    <item>
      <title>U.S. Bureau of Labor Statistics Employment Situation Report Data Trends</title>
      <link>https://www.cfholbert.com/blog/jobs-report/</link>
      <pubDate>Wed, 16 Mar 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/jobs-report/</guid>
      <description>This post performs exploratory data analysis using time-series plots to visually identify essential economic information contained in the U.S. Bureau of Labor Statistics Employment Situation Report for February 2022.</description>
    </item>
    <item>
      <title>Groundwater Statistics Using trendMK</title>
      <link>https://www.cfholbert.com/blog/groundwater-statistics-with-trendmk/</link>
      <pubDate>Sun, 13 Mar 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/groundwater-statistics-with-trendmk/</guid>
      <description>This is a brief tutorial on using R and the &lt;em&gt;trendMK&lt;/em&gt; package for the statistical analysis of groundwater monitoring data. The &lt;em&gt;trendMK&lt;/em&gt; package is designed to analyze censored data sets containing many sampling locations and monitoring constituents.</description>
    </item>
    <item>
      <title>Optimizing a Long-Term Groundwater Monitoring Network Using Geostatistical Methods - Part 3</title>
      <link>https://www.cfholbert.com/blog/geospatial-optimization-part3/</link>
      <pubDate>Thu, 24 Feb 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/geospatial-optimization-part3/</guid>
      <description>Perform geospatial optimization of a groundwater monitoring well network with the objective of minimizing the number of sample locations needed to represent the plume, subject to the constraint that the characteristics of the plume remain comparable.</description>
    </item>
    <item>
      <title>Optimizing a Long-Term Groundwater Monitoring Network Using Geostatistical Methods - Part 2</title>
      <link>https://www.cfholbert.com/blog/geospatial-optimization-part2/</link>
      <pubDate>Tue, 22 Feb 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/geospatial-optimization-part2/</guid>
      <description>Use Sequential Gaussian Simulation (SGSIM) conditioned on measured groundwater concentration data to obtain quantitative measures of the uncertainty regarding the extent and severity of contamination at a site.</description>
    </item>
    <item>
      <title>Optimizing a Long-Term Groundwater Monitoring Network Using Geostatistical Methods - Part 1</title>
      <link>https://www.cfholbert.com/blog/geospatial-optimization-part1/</link>
      <pubDate>Tue, 18 Jan 2022 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/geospatial-optimization-part1/</guid>
      <description>Costs for groundwater monitoring represent a significant, persistent, and growing burden for environmental remediation projects. This post examines spatial optimization of a groundwater monitoring well network using a geostatistical approach to identify new well locations or redundant locations such that the operational value of the monitoring network is maximized.</description>
    </item>
    <item>
      <title>Comparison of Mann-Kendall Test to Akritas-Theil-Sen Regression for Data Containing Many Non-Detects</title>
      <link>https://www.cfholbert.com/blog/mann-kendall-ats-comparison/</link>
      <pubDate>Tue, 21 Dec 2021 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/mann-kendall-ats-comparison/</guid>
      <description>The Mann-Kendall test and Theil-Sen regression are typically used to detect temporal trends when analyzing environmental data. Both methods are affected by the value chosen to represent non-detects. An alternative that is not affected by this choice is Akritas-Theil-Sen nonparametric regression.</description>
    </item>
    <item>
      <title>Exploring Global Surface Temperature Change</title>
      <link>https://www.cfholbert.com/blog/surface-temp-change/</link>
      <pubDate>Mon, 19 Apr 2021 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/surface-temp-change/</guid>
      <description>Conduct an evaluation of surface temperature change using temperature measurements that have been collected at Kremsmünster Abbey in Austria, which is considered to be one of the highest quality, longest running, instrumental temperature records in the world.</description>
    </item>
    <item>
      <title>Landscape Pattern Analysis</title>
      <link>https://www.cfholbert.com/blog/landscape_pattern_analysis/</link>
      <pubDate>Mon, 05 Apr 2021 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/landscape_pattern_analysis/</guid>
      <description>Landscapes contain complex spatial patterns in the distribution of resources that vary over time. This post examines the spatial analysis of landscapes using base R functions complemented by contributed packages for spatial pattern analysis and for quantifying landscape characteristics.</description>
    </item>
    <item>
      <title>Temporal Optimization of Groundwater Sample Frequency Using Iterative Thinning</title>
      <link>https://www.cfholbert.com/blog/temporal-optimization/</link>
      <pubDate>Thu, 11 Mar 2021 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/temporal-optimization/</guid>
      <description>This post presents single-well iterative thinning, which is a method to determine an optimal sampling frequency for groundwater monitoring wells and monitoring constituents.</description>
    </item>
    <item>
      <title>Spatio-temporal Modeling of PM10 Concentration Using INLA-SPDE</title>
      <link>https://www.cfholbert.com/blog/spatio-temporal-pm10-inla-sdpe/</link>
      <pubDate>Wed, 20 Jan 2021 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/spatio-temporal-pm10-inla-sdpe/</guid>
      <description>Use a stochastic partial differential equation (SPDE) approach with Integrated Nested Laplace Approximation (INLA) to model the spatio-temporal behavior of PM10 concentrations measured in the North-Italian region of Piemonte.</description>
    </item>
    <item>
      <title>Spatial Interpolation Using Integrated Nested Laplace Approximation</title>
      <link>https://www.cfholbert.com/blog/spatial-interpolation-inla-sdpe/</link>
      <pubDate>Tue, 19 Jan 2021 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/spatial-interpolation-inla-sdpe/</guid>
      <description>The performance of Bayesian inference using a stochastic partial differential equation (SPDE) approach with Integrated Nested Laplace Approximation (INLA) for predicting zinc concentrations in soil at unsampled locations is compared with that of kriging.</description>
    </item>
    <item>
      <title>Multivariate Analysis Using Data With Non-detects</title>
      <link>https://www.cfholbert.com/blog/multivariate-analysis-with-nondetects/</link>
      <pubDate>Wed, 02 Sep 2020 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/multivariate-analysis-with-nondetects/</guid>
      <description>Multivariate statistical methods provide a means of exploring complex data sets for patterns and relationships from which hypotheses can be generated and subsequently tested. This post explores methods to manage non-detects when applying multivariate procedures to investigate (dis)similarities among data objects based on a set of descriptors.</description>
    </item>
    <item>
      <title>Area Resource Survey Using Generalized Random Tessellation Stratified Sampling</title>
      <link>https://www.cfholbert.com/blog/grts-sample-design/</link>
      <pubDate>Thu, 06 Aug 2020 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/grts-sample-design/</guid>
      <description>Develop an unstratified, equal probability sample design and a stratified, equal probability sample design using Generalized Random Tessellation Stratified (GRTS) sampling.</description>
    </item>
    <item>
      <title>Univariate and Multivariate Time-Series Analysis</title>
      <link>https://www.cfholbert.com/blog/time-series-analysis/</link>
      <pubDate>Wed, 13 May 2020 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/time-series-analysis/</guid>
      <description>Time-series analysis and forecasting is an important area of machine learning because many predictive learning problems involve a time component. This post examines time-series analysis using indoor air concentrations of trichloroethene and various explanatory variables collected over time at a single location.</description>
    </item>
    <item>
      <title>Multiple Linear Regression with Shrinkage</title>
      <link>https://www.cfholbert.com/blog/multiple_linear_regression_with_shrinkage/</link>
      <pubDate>Tue, 28 Apr 2020 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/multiple_linear_regression_with_shrinkage/</guid>
      <description>This post compares simple linear regression and multiple linear regression with and without shrinkage using an indoor air dataset consisting of trichloroethene concentrations and various explanatory variables, including radon concentration, temperature, barometric pressure, wind direction, and wind speed.</description>
    </item>
    <item>
      <title>Linear Regression with Categorical Variables</title>
      <link>https://www.cfholbert.com/blog/categorical-encoding/</link>
      <pubDate>Sun, 05 Apr 2020 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/categorical-encoding/</guid>
      <description>This post explores linear regression with one-hot encoding. For those datasets with many categorical variables and where the categorical variables in turn have many unique levels, the number of features can quickly escalate. In these cases, label/ordinal encoding or some other alternative should be explored.</description>
    </item>
    <item>
      <title>Exploring the COVID-19 Pandemic by Country</title>
      <link>https://www.cfholbert.com/blog/covid-19-pandemic-country/</link>
      <pubDate>Wed, 18 Mar 2020 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/covid-19-pandemic-country/</guid>
      <description>With the rapid spread of the novel coronavirus across countries, the World Health Organization and several countries have published the latest results on the impact of COVID-19 over the past few months. The objective of this post is to demonstrate how visualization using the R programming language helps to derive informative insights from data sources.</description>
    </item>
    <item>
      <title>Analyzing COVID-19 Outbreak in China</title>
      <link>https://www.cfholbert.com/blog/covid-19_modelling/</link>
      <pubDate>Sun, 15 Mar 2020 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/covid-19_modelling/</guid>
      <description>This is a perfunctory exploration of the early transmission dynamics of coronavirus disease 2019 (COVID-19) in mainland China. The basic reproduction number and the per day infection mortality and recovery rates are estimated using a classic SIR compartmental model of communicable disease outbreaks.</description>
    </item>
    <item>
      <title>Feature Selection Methods for Machine Learning</title>
      <link>https://www.cfholbert.com/blog/feature-selection-machine_learning/</link>
      <pubDate>Fri, 14 Feb 2020 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/feature-selection-machine_learning/</guid>
      <description>Feature selection is a core concept in machine (statistical) learning that can have significant impacts on model performance. This post examines various methods to identify the most important predictor variables in machine learning that explain the variance of the response variable.</description>
    </item>
    <item>
      <title>Creating Static Maps Using R</title>
      <link>https://www.cfholbert.com/blog/creating-maps-with-r/</link>
      <pubDate>Wed, 18 Sep 2019 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/creating-maps-with-r/</guid>
      <description>Use the functionality of R and R packages to create both simple maps and complex maps containing many different layers.</description>
    </item>
    <item>
      <title>Testing Group Differences with Data Containing Non-detects</title>
      <link>https://www.cfholbert.com/blog/group-comparisons-with-nondetects/</link>
      <pubDate>Fri, 13 Sep 2019 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/group-comparisons-with-nondetects/</guid>
      <description>Often data from more than two groups need to be evaluated, usually on the basis of a representative value from each group. This post examines the use of survival analysis techniques to test whether surface water samples containing a high frequency of censored (non-detect) values differ in dissolved lead concentration between various watersheds.</description>
    </item>
    <item>
      <title>Outlier Detection Using Machine Learning</title>
      <link>https://www.cfholbert.com/blog/outlier-detection-machine-learning/</link>
      <pubDate>Mon, 09 Sep 2019 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/outlier-detection-machine-learning/</guid>
      <description>There is no precise way to define and identify outliers in general because of the specifics of each dataset. This post evaluates three methods for multivariate outlier detection, including Mahalanobis distance (a multivariate extension to standard univariate tests) and two machine learning (clustering) techniques.</description>
    </item>
    <item>
      <title>Introduction to Statistical Intervals</title>
      <link>https://www.cfholbert.com/blog/statistical-intervals/</link>
      <pubDate>Tue, 06 Aug 2019 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/statistical-intervals/</guid>
      <description>The issue of uncertainty in estimating population parameters from data samples is often addressed using statistical intervals. The three types of statistical intervals (confidence, prediction, and tolerance intervals) differ in their definitions as well as their typical applications. It is important to fully understand the assumptions and limitations underlying the use, interpretation, and calculation of statistical intervals before applying them.</description>
    </item>
    <item>
      <title>Power of the Mann-Kendall Test</title>
      <link>https://www.cfholbert.com/blog/mann-kendall-power-analysis/</link>
      <pubDate>Sun, 21 Jul 2019 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/mann-kendall-power-analysis/</guid>
      <description>An important objective of many environmental monitoring programs is to detect changes or trends in constituent concentrations over time. The Mann-Kendall test is one of the most popular nonparametric tests for determining a temporal trend. This post evaluates the power of the Mann-Kendall test to identify a trend for various sample sizes and variability in the data using Monte Carlo simulation.</description>
    </item>
    <item>
      <title>Fitting Distributions with Censored Data</title>
      <link>https://www.cfholbert.com/blog/fitting-distributions-censored-data/</link>
      <pubDate>Wed, 26 Jun 2019 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/fitting-distributions-censored-data/</guid>
      <description>Many statistical analyses depend on the type of data distribution. This post explores goodness-of-fit tests for the lognormal distribution, the gamma distribution, and the normal distribution when data contain censored (non-detect) values.</description>
    </item>
    <item>
      <title>Censored Regression</title>
      <link>https://www.cfholbert.com/blog/censored-regression/</link>
      <pubDate>Sun, 03 Mar 2019 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/censored-regression/</guid>
      <description>Regression performed using censored data can be challenging. Common practices for handling censored data include deletion of the censored observations or substituting nondetects with arbitrary constants, generally based on some fraction of the detection limit. These approaches tend to be biased and cause a loss of information. Censored regression methods produce more accurate and robust estimates than these bias-prone methods.</description>
    </item>
    <item>
      <title>Robust Regression</title>
      <link>https://www.cfholbert.com/blog/robust-regression/</link>
      <pubDate>Mon, 30 Jul 2018 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/robust-regression/</guid>
      <description>Ordinary least squares regression is optimal when all regression assumptions are valid. When some of these assumptions are invalid, least squares regression can perform poorly. Robust regression is an alternative to least squares regression when data contain outliers or influential observations.</description>
    </item>
    <item>
      <title>Space Shuttle Challenger Disaster</title>
      <link>https://www.cfholbert.com/blog/challenger-disaster/</link>
      <pubDate>Mon, 25 Jun 2018 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/challenger-disaster/</guid>
      <description>On January 28, 1986, the space shuttle Challenger disintegrated 73 seconds after liftoff from Kennedy Space Center. The most disturbing part of the space shuttle Challenger disaster was that the O-ring failure had been foreseen by the manufacturer&amp;rsquo;s engineers, who were unable to convince managers to delay the launch. Providing a better analysis and visualization of the data could have helped improve the decision-making process and potentially built a stronger case for the engineers about the effect of cold weather on O-ring functionality.</description>
    </item>
    <item>
      <title>Problems Fitting a Nonlinear Model Using Log-Transformation</title>
      <link>https://www.cfholbert.com/blog/logtransform-nonlinear-regression/</link>
      <pubDate>Tue, 19 Jun 2018 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/logtransform-nonlinear-regression/</guid>
      <description>Power-law relationships are one of the most common patterns in the environmental sciences. This post presents an example of the problems that can occur when fitting a nonlinear model by transforming to linearity using natural logarithms.</description>
    </item>
    <item>
      <title>Dixon&#39;s Outlier Test</title>
      <link>https://www.cfholbert.com/blog/dixons-outlier-test/</link>
      <pubDate>Tue, 05 Jun 2018 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/dixons-outlier-test/</guid>
      <description>Dixon&amp;rsquo;s test is simple, easy to understand, and widely used in the scientific community. Data recorded to some specific measurement increment can become a problem for outlier tests, such as Dixon&amp;rsquo;s test. Dixon&amp;rsquo;s test assumes that the data values (aside from those being tested as potential outliers) are normally distributed. Most sample distributions are not normally distributed.</description>
    </item>
    <item>
      <title>Nonparametric Trend Analysis</title>
      <link>https://www.cfholbert.com/blog/nonparametric-trend-analysis/</link>
      <pubDate>Mon, 21 May 2018 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/nonparametric-trend-analysis/</guid>
      <description>Detection of temporal trends is one of the most important objectives of environmental monitoring. This post examines nonparametric temporal trend analysis using the Mann-Kendall test and the Theil-Sen regression estimator.</description>
    </item>
    <item>
      <title>How Robust Is the Two-Sample T-Test?</title>
      <link>https://www.cfholbert.com/blog/two-sample-t-test-robustness/</link>
      <pubDate>Sun, 13 May 2018 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/two-sample-t-test-robustness/</guid>
      <description>The most common activity in research is the comparison of two groups. The t-test is robust to departures from normality for moderate-tailed, symmetric distributions. When the data come from a heavy-tailed distribution, even one that is symmetric, the two-sample t-test may not perform as designed.</description>
    </item>
    <item>
      <title>How to Analyze Data Containing Non-detects</title>
      <link>https://www.cfholbert.com/blog/analyze-data-with-nondetects/</link>
      <pubDate>Sun, 06 May 2018 00:00:00 +0000</pubDate>
      <guid>https://www.cfholbert.com/blog/analyze-data-with-nondetects/</guid>
      <description>Management decisions are affected by left-censored observations because they impact not only the estimation of statistical parameters but also inferential statistics. This post presents statistically robust procedures to analyze censored data that make no assumptions and do not use arbitrary values.</description>
    </item>
  </channel>
</rss>
