Blog

Data analytics, statistics, and more

Landscape Pattern Analysis

Landscapes contain complex spatial patterns in the distribution of resources that vary over time. This post examines the spatial analysis of landscapes using base R functions complemented by contributed packages for spatial pattern analysis and for quantifying landscape characteristics.

April 5, 2021

Spatial Interpolation Using Integrated Nested Laplace Approximation

This post compares Bayesian inference using a stochastic partial differential equation (SPDE) approach with Integrated Nested Laplace Approximation (INLA) against kriging for predicting zinc concentrations in soil at unsampled locations.

January 19, 2021

Multivariate Analysis Using Data With Non-detects

Multivariate statistical methods provide a means of exploring complex data sets for patterns and relationships from which hypotheses can be generated and subsequently tested. This post explores methods to manage non-detects when applying multivariate procedures to investigate (dis)similarities among data objects based on a set of descriptors.

September 2, 2020

Univariate and Multivariate Time-Series Analysis

Time-series analysis and forecasting are an important area of machine learning because many predictive learning problems involve a time component. This post examines time-series analysis using indoor air concentrations of trichloroethene and various explanatory variables collected over time at a single location.

May 13, 2020

Multiple Linear Regression with Shrinkage

This post compares simple linear regression and multiple linear regression with and without shrinkage using an indoor air dataset consisting of trichloroethene concentrations and various explanatory variables, including radon concentration, temperature, barometric pressure, wind direction, and wind speed.

April 28, 2020

Linear Regression with Categorical Variables

This post explores linear regression with one-hot encoding. When a dataset has many categorical variables, each with many unique levels, the number of features can quickly escalate. In these cases, label/ordinal encoding or some other alternative should be explored.
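
As a rough illustration of how the feature count grows, the sketch below one-hot encodes two categorical variables and contrasts the result with ordinal encoding. It is not taken from the post itself: the column names, the data values, and the helper functions (`one_hot_width`, `one_hot_encode`, `ordinal_encode`) are all hypothetical, and only the Python standard library is used.

```python
def one_hot_width(columns):
    """Count the indicator features produced by full one-hot encoding."""
    return sum(len(set(values)) for values in columns.values())


def one_hot_encode(values):
    """Return the sorted levels and one 0/1 indicator row per observation."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


def ordinal_encode(values):
    """Return the sorted levels and a single integer code per observation."""
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    return levels, [index[v] for v in values]


# Hypothetical data: two categorical variables with 7 and 4 unique levels.
data = {
    "wind_direction": ["N", "NE", "E", "S", "SW", "W", "NW", "N"],
    "season": ["winter", "spring", "summer", "fall",
               "winter", "spring", "summer", "fall"],
}

n_one_hot = one_hot_width(data)   # one feature per level, summed over variables
n_ordinal = len(data)             # one feature per variable

season_levels, season_rows = one_hot_encode(data["season"])
_, season_codes = ordinal_encode(data["season"])

print(f"one-hot features: {n_one_hot}, ordinal features: {n_ordinal}")
print(season_levels, season_rows[0], season_codes[:4])
```

Here two variables expand to eleven indicator features under one-hot encoding but stay at two under ordinal encoding. The trade-off is that ordinal codes impose an order on the levels (alphabetical in this sketch), which a linear model will treat as meaningful.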

April 5, 2020

Exploring the COVID-19 Pandemic by Country

With the rapid spread of the novel coronavirus across countries, the World Health Organisation and several countries have published the latest results on the impact of COVID-19 over the past few months. The objective of this post is to demonstrate how visualization using the R programming language helps to derive informative insights from data sources.

March 18, 2020