Posts

Statistical computing on a shoestring: Stan in the Azure cloud using Cloud-init

Automating deployment of an inexpensive Linux R/Stan development environment in the Azure public cloud.

Optimal performance with Random Forests: does feature selection beat tuning?

This blog post demonstrates that the presence of irrelevant variables can reduce the performance of the Random Forest algorithm (as implemented in R by ranger()). The solution is either to tune one of the algorithm’s parameters or to remove irrelevant features using a procedure called Recursive Feature Elimination (RFE).

OpenJDK and IcedTea: Java Web Start Forensics on Ubuntu

To play Blood Bowl online on FUMBBL.com, you use a Java client that runs through Java Web Start. On my Ubuntu Linux system, open-source versions of Java and Java Web Start (OpenJDK and IcedTea) take care of this. This post describes my suffering after the client stopped working following an Ubuntu software update, and might be helpful for others encountering the same issues.

Using R to analyse the Roche Antigen Rapid Test: How accurate is it?

This blog post is about the Roche Rapid Antigen Test Nasal. How accurate is it? I track down the data mentioned in the kit’s leaflet, discuss the whole measurement process, and use R to reproduce the sensitivity and specificity of the test.

Writing scientific papers using RStudio and Zotero

This blog post describes a sequence of nine steps to set up a reproducible workflow for scientific writing based on open-source tooling. It boils down to writing the manuscript in R Markdown and using a set of auxiliary tools to manage citations, to output to Word for sharing with collaborators, and to prepare the final document for submission to the journal.

OpenAI Gym's FrozenLake: Converging on the true Q-values

This blog post concerns a famous toy problem in Reinforcement Learning, the FrozenLake environment. We compare two ways of solving an environment with RL: reaching maximum performance versus obtaining the true state-action values $Q_{s,a}$.

Jack's Car Rental as a Gym Environment

In this blog post, we solve a famous sequential decision problem called Jack's Car Rental by first turning it into a Gym environment and then using an RL algorithm called Policy Iteration (a form of Dynamic Programming) to solve for the optimal decisions to take in this environment.

Using posterior predictive distributions to get the Average Treatment Effect (ATE) with uncertainty

Here we show how to use Stan and the brms R package to calculate the posterior predictive distribution of a covariate-adjusted average treatment effect (ATE).

Building TensorFlow 2.2 on an old PC

With the commoditization of deep learning in the form of Keras, I felt it was about time that I jumped on the deep learning bandwagon.

Simulating Fake Data in R

This blog post is about simulating fake data using the R package simstudy. The motivation comes from my interest in converting real datasets into synthetic ones.