Classifying Blood Bowl teams using clustered heatmaps

Restate my assumptions: If you graph the numbers of any system, patterns emerge. In this post we'll use clustered heatmaps to graph the numbers from the Blood Bowl fantasy football game, and see what patterns emerge!

Optimal performance with Random Forests: does feature selection beat tuning?

This blog post demonstrates that the presence of irrelevant variables can reduce the performance of the Random Forest algorithm (as implemented in R by `ranger()`). The solution is either to tune one of the algorithm's parameters, OR to remove irrelevant features using a procedure called Recursive Feature Elimination (RFE).

Simulating Fake Data in R

This blog post shows how to simulate fake data using the R package [simstudy](https://www.rdatagen.net/page/simstudy/). The motivation comes from my interest in converting real datasets into synthetic ones.

Exploring Process Mining in R

In this post, we'll explore the bupaR suite of Process Mining packages created by Gert Janssenswillen of Hasselt University.

Designing an introductory course on Causal Inference

In this blog post, I describe the introductory course on Causal Inference I pieced together from various materials available online. It combines Pearl's Causal Graph approach with statistics in the style of Gelman and McElreath.

The validation set approach in caret

In this blog post, we explore how to implement the validation set approach in caret. This is the most basic form of the train/test machine learning concept.