Posts

Jan 15, 2024 25 min read Linux, Statistics, Cloud Computing

Statistical computing on a shoestring: Stan in the Azure cloud using Cloud-init

Automating deployment of an inexpensive Linux R/Stan development environment in the Azure public cloud.

Jul 9, 2022 21 min read Statistics, Machine Learning

Optimal performance with Random Forests: does feature selection beat tuning?

This blog post demonstrates that the presence of irrelevant variables can reduce the performance of the Random Forest algorithm (as implemented in R by ranger()). The solution is either to tune one of the algorithm’s parameters or to remove irrelevant features using a procedure called Recursive Feature Elimination (RFE).

Jan 4, 2022 9 min read Linux, Blood Bowl

OpenJDK and IcedTea: Java Web Start Forensics on Ubuntu

To play Blood Bowl online on FUMBBL.com, you need a Java client that runs via Java Web Start. On my Ubuntu Linux system, open-source versions of Java and Java Web Start (OpenJDK and IcedTea) take care of this. This post describes my suffering when the client stopped working after an Ubuntu software update, and might be helpful for others encountering the same issues.

Jun 6, 2021 21 min read Statistics, Data science, Measurement

Using R to analyse the Roche Antigen Rapid Test: How accurate is it?

This blog post is about the Roche Rapid Antigen Test Nasal. How accurate is it? I tracked down the data mentioned in the kit’s leaflet, discussed the whole measurement process, and used R to reproduce the sensitivity and specificity of the test.

May 2, 2021 7 min read Scientific writing

Writing scientific papers using Rstudio and Zotero

This blog post describes a sequence of nine steps to set up a reproducible workflow for scientific writing based on open-source tooling. It boils down to writing the manuscript in R Markdown and using a set of auxiliary tools to manage citations, export to Word for sharing with collaborators, and prepare the final document for submission to the journal.

Mar 7, 2021 21 min read Machine Learning

OpenAI Gym's FrozenLake: Converging on the true Q-values

This blog post concerns a famous toy problem in Reinforcement Learning (RL), the FrozenLake environment. We compare two notions of solving an environment with RL: reaching maximum performance versus obtaining the true state-action values $Q_{s,a}$.

Dec 30, 2020 8 min read Machine Learning

Jacks Car Rental as a Gym Environment

In this blog post, we solve a famous sequential decision problem called Jack's Car Rental by first turning it into a Gym environment and then using an RL algorithm called Policy Iteration (a form of Dynamic Programming) to find the optimal decisions to take in this environment.

Sep 4, 2020 10 min read Statistics