Blog

Data analytics, statistics, and more

Analyses of Pesticide Concentrations in California Surface Waters

Environmental data are frequently left-censored, meaning that some values fall below the limit of quantification of the analytical method employed. These data are problematic because censored (non-detect) values are known only to lie between zero and the censoring limit. This complicates analysis of the data, including estimating statistical parameters, characterizing data distributions, and conducting inferential statistics. This post demonstrates procedures and methods available in R for analyzing data containing a mixture of detects and non-detects. These methods make few or no assumptions about the data and do not substitute arbitrary values (e.g., one-half the detection or reporting limit) for the non-detects.

September 16, 2023

Clustering on Principal Component Analysis

Combining principal component analysis (PCA) with clustering methods is useful for reducing the dimension of a data set to a few continuous variables that contain the most important information in the data. This post illustrates how to combine PCA and clustering methods to identify patterns in a data set using the R language for statistical computing and visualization.

August 20, 2023

Exploratory Spatial Data Analysis and Kriging in R

This post presents and demonstrates several methods for exploratory spatial data analysis using the R language for statistical computing and visualization. These methods can be used for identifying spatial dependence patterns and spatial heterogeneity, which are critical components of variogram development and the kriging procedure.

May 29, 2023

Natural Neighbor Interpolation With R

This post presents and demonstrates several methods for natural neighbor interpolation using the R language for statistical computing and visualization. The results are compared to those obtained using ordinary kriging.

May 22, 2023

Calculation of 95% Upper Confidence Limit for Left-Censored Data

This post presents methods that can be used to calculate a 95% upper confidence limit on the mean of an unknown population for left-censored data sets (i.e., containing a mixture of detects and non-detects). The preferred approach depends on many factors, including the number of samples and the distributional shape of the data.

May 10, 2023