Machine learning

Optimal performance with Random Forests: does feature selection beat tuning?

This blog post demonstrates that the presence of irrelevant variables can reduce the performance of the Random Forest algorithm (as implemented in R by `ranger()`). The solution is either to tune one of the algorithm's parameters, or to remove the irrelevant features using a procedure called Recursive Feature Elimination (RFE).
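
The post itself works in R; purely as an illustration of the two remedies, here is a sketch in Python with scikit-learn, where the dataset is simulated and `max_features` plays the role of `ranger()`'s `mtry` tuning parameter.

```python
# Sketch: irrelevant features hurt a default Random Forest; RFE removes them.
# Illustrative Python analogue of the R/ranger setup described in the post.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

# 5 informative features drowned in 45 irrelevant ones (made-up setup)
X, y = make_regression(n_samples=500, n_features=50, n_informative=5,
                       noise=1.0, random_state=42)

rf = RandomForestRegressor(n_estimators=200, random_state=42)
print("all 50 features:", cross_val_score(rf, X, y, cv=5).mean())

# Remedy 1: tune max_features (scikit-learn's analogue of ranger's mtry)
rf_tuned = RandomForestRegressor(n_estimators=200, max_features=5,
                                 random_state=42)
print("tuned forest:   ", cross_val_score(rf_tuned, X, y, cv=5).mean())

# Remedy 2: Recursive Feature Elimination, dropping the least important
# features (by impurity importance) a few at a time
X_sel = RFE(estimator=rf, n_features_to_select=5, step=5).fit_transform(X, y)
print("after RFE:      ", cross_val_score(rf, X_sel, y, cv=5).mean())
```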

OpenAI Gym's FrozenLake: Converging on the true Q-values

This blog post concerns a famous toy problem in Reinforcement Learning, the [FrozenLake environment](https://gym.openai.com/envs/FrozenLake-v0/). We compare two ways of "solving" an environment with RL: reaching **maximum performance** versus converging on the **true state-action values** $Q_{s,a}$.
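
To make the distinction concrete, here is a minimal tabular Q-learning sketch for this environment (hyperparameters are arbitrary, not taken from the post). The greedy policy derived from `Q` can reach its best performance long before the table itself has converged to the true $Q_{s,a}$.

```python
# Sketch: tabular Q-learning on FrozenLake-v0 (classic Gym API, where
# step() returns a 4-tuple). Hyperparameters are arbitrary.
import numpy as np
import gym

env = gym.make("FrozenLake-v0")
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, eps = 0.1, 0.99, 0.1

for episode in range(50_000):
    s, done = env.reset(), False
    while not done:
        # epsilon-greedy: explore sometimes, otherwise act greedily on Q
        a = env.action_space.sample() if np.random.rand() < eps \
            else int(Q[s].argmax())
        s2, r, done, _ = env.step(a)
        # TD(0) update toward the bootstrapped target r + gamma * max Q(s', .)
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
```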

Jack's Car Rental as a Gym Environment

In this blog post, we solve a famous sequential decision problem called Jack's Car Rental by first turning it into a Gym environment and then using an RL algorithm called Policy Iteration (a form of Dynamic Programming) to find the optimal decisions to take in this environment.
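
Jack's Car Rental is not a built-in Gym environment, so the sketch below assumes (as Gym's toy-text environments do) that the custom environment exposes its transition model as `env.P[s][a]`, a list of `(prob, next_state, reward, done)` tuples; the function name and `gamma` value are illustrative, not the post's.

```python
# Sketch: generic Policy Iteration over a known, finite MDP exposed as
# P[s][a] -> list of (prob, next_state, reward, done), the toy-text convention.
import numpy as np

def policy_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
    policy = np.zeros(n_states, dtype=int)
    V = np.zeros(n_states)
    while True:
        # 1. Policy evaluation: sweep the Bellman expectation equation
        while True:
            delta = 0.0
            for s in range(n_states):
                v = sum(p * (r + gamma * V[s2] * (1 - d))
                        for p, s2, r, d in P[s][policy[s]])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # 2. Policy improvement: act greedily with respect to V
        stable = True
        for s in range(n_states):
            q = [sum(p * (r + gamma * V[s2] * (1 - d)) for p, s2, r, d in P[s][a])
                 for a in range(n_actions)]
            best = int(np.argmax(q))
            stable = stable and (best == policy[s])
            policy[s] = best
        if stable:
            return policy, V
```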

Building TensorFlow 2.2 on an old PC

With the commoditization of deep learning in the form of Keras, I felt it was about time that I jumped on the deep learning bandwagon.

The validation set approach in caret

In this blog post, we explore how to implement the validation set approach in caret: a single random split of the data into a training and a test set. This is the most basic form of the train/test concept in machine learning.
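
The post does this in R with caret; purely as a language-neutral illustration of the concept, here is the same single train/test split sketched in Python with scikit-learn (dataset and split ratio chosen arbitrarily).

```python
# Sketch: the validation set approach = one random train/test split.
# Illustration only -- the post itself does this in R with caret.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
# hold out 20% of the data as a validation set
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                            random_state=1)

model = LinearRegression().fit(X_tr, y_tr)
print("validation R^2:", model.score(X_val, y_val))
```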

BART vs Causal forests showdown

In this post, we test both Bayesian Additive Regression Trees (BART) and Causal Forests (as implemented in the `grf` package) on four simulated datasets of increasing complexity. May the best method win!
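
Neither method has a canonical Python port, so as a self-contained stand-in here is the shape of one such simulated dataset, with a simple random-forest T-learner in place of BART/grf; the data-generating process is made up for the example.

```python
# Sketch: a simulated dataset with a heterogeneous treatment effect, plus
# a random-forest T-learner baseline (not BART or grf -- just a stand-in).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
T = rng.integers(0, 2, size=n)                 # randomized treatment
tau = np.where(X[:, 0] > 0, 2.0, 0.5)          # true effect varies with X[:, 0]
y = X[:, 1] + tau * T + rng.normal(size=n)

# T-learner: fit separate outcome models for treated and control units
m1 = RandomForestRegressor(random_state=0).fit(X[T == 1], y[T == 1])
m0 = RandomForestRegressor(random_state=0).fit(X[T == 0], y[T == 0])
cate_hat = m1.predict(X) - m0.predict(X)       # estimated individual effects

print("mean |error| of CATE estimate:", np.abs(cate_hat - tau).mean())
```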

Improving a parametric regression model using machine learning

In this post, I explore how we can improve a parametric regression model by comparing its predictions to those of a Random Forest model. This can inform us about the ways in which the OLS model fails to capture non-linearities and interactions between the predictors.
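
One way to operationalize the comparison (a residual-fitting variant, not necessarily the post's exact procedure) is to ask whether a Random Forest can predict what the OLS model leaves behind; the data-generating process below is invented for the example.

```python
# Sketch: use a Random Forest as a diagnostic for an OLS model. If the
# forest predicts the OLS residuals better than chance, the linear model
# is missing non-linearities or interactions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
# made-up DGP with an interaction term the linear model cannot represent
y = X[:, 0] + X[:, 1] * X[:, 2] + rng.normal(scale=0.5, size=1000)

ols = LinearRegression().fit(X, y)
residuals = y - ols.predict(X)

rf = RandomForestRegressor(n_estimators=300, random_state=0)
r2 = cross_val_score(rf, X, residuals, cv=5).mean()
print(f"RF R^2 on OLS residuals: {r2:.2f} (> 0 hints at missed structure)")
# Fitting rf on the residuals and inspecting rf.feature_importances_ then
# points at which predictors are involved in the missed terms.
```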