Writing scientific papers using Rstudio and Zotero

(This blog post is mainly for my future self, and for people that ask me about my workflow for scientific papers. Please contact me if you spot ways to improve this!)

This blog post describes a sequence of 9 steps to set up a reproducible workflow for scientific writing with a focus on getting the journal citation hell right. It boils down to writing the manuscript in Rmarkdown, and using a set of auxiliary tools to manage citations and output to Word to share with collaborators and to prepare the final document for submission to the journal.

Step 1: Choose the target journal

First step is to check what the constraints of the journal are where we want to submit. Do they have a particular word count? Do they have a particular format for the abstract? etc. Go to the target journal, and download the author instructions. Because author instructions are typically REALLY boring to read, a quick visual way to find out what is needed is to download a few recent (because change is the only constant), open access (no fuss with paywalls etc), and highly cited (this must mean they did something right, right?) example papers.

Step 2: Install reference manager Zotero

Zotero is THE open source reference manager. Source code on Github, check! Cross-platform, check! 4K stars on Github, check! Another cool feature is that your references library is stored online, and your local Zotero install synchronizes with it, so you no longer need to move around library files between work and home, laptop and desktop.

Go to the Zotero website to download, install and configure Zotero. (This includes creating a Zotero account for online storage of your references)

Step 3: Save citations to Zotero from your browser

The easiest way to fill your Zotero library is to use a browser plugin. I use Firefox on Linux, so I installed the Zotero Firefox connector. Once this is installed, I use Google scholar to look up papers I want to cite. To add a paper to the Zotero library, make sure to have Zotero open, then click on a paper in Google Scholar to go to that paper’s web site (Typically the Publishers website for the journal). Finally, click the icon in the browsers top right top corner (“Save to zotero”). Repeat ad nauseam.

Step 4: Create stable citation keys

While writing our papers, we want short but recognizable identifier keys for our citations. For this we use the “Better BibTex for Zotero” Add-on. Go to the Better Bibtex website and follow the installation instructions.

In the Zotero preferences (Better bibtex tab), I changed the “Citation Key Format” to create keys like verhoeven_etal20.

[auth.etal:lower:replace=.,_][>0][shortyear]|[veryshorttitle][shortyear]

(This snippet comes from Dewey Dunnington’s blog that was a big help in getting my workflow up and running)

This should automatically create / update all the citations keys in Zotero.

Step 5: Rstudio, Rmarkdown and all that

Rstudio is where we actually write the paper. We use the .Rmd Rmarkdown format. This format consists of a YAML header, followed by a body that consists of Markdown formatted text with optional code chunks, figures and tables interspersed. Version control is through Git.

Important thing to check: make sure that your .Rmd file uses UTF-8 encoding. In Rstudio, File --> Save with Encoding --> UTF-8 can set this straight if somehow you ended up with a file in the wrong encoding.

Check the Rstudio website for more info on Rmarkdown.

Note The latest version of Rstudio (1.4) contains a new editing mode for Rmarkdown, “the visual markdown editor”, that contains support for inserting citations from Zotero. I am not sure yet whether I like this, and noted that on my system, it was still buggy, and the editor, when invoked, makes CHANGES to the markdown code. Brrr. Therefore, this blog post does not make use of this new feature.

Step 6: Connecting Rstudio to Zotero

Hold on, almost there. We’re in Rstudio, and writing our paper. Now comes the moment where we want to cite something! Now we need a connection to Zotero. There are two Rstudio Addins that compete for this functionality, citr and rbbt. Both packages are not on CRAN and therefore need to be installed from Github. I tried them both out, and went for rbbt as citr does not support CSL-JSON and rbbt appears slightly leaner.

remotes::install_github("paleolimbot/rbbt")

After installing and restarting Rstdio, the rbbt addin can be found under Addins. Now since citing stuff is a common activity while writing a paper, we want a keyboard shortcut for this. I put it under CTRL + K where K stand for errr, Knowledge ?

To bind rbbt to a particular keyboard shortcut, do the following: First, in RStudio, choose Tools --> Modify Keyboard Shortcuts. Type zotero to filter out the Zotero plugin. Click on the ‘Shortcut’ field for the ‘Insert Zotero citation’ addin row, and type the desired shortcut keys.

In Rstudio, we can now press CTRL+K, type the name of the first author, select the citation, press enter, and have the citation key added to our .Rmd document.

Step 7: Creating the bibliography

Now that we have an Rmarkdown document filled with citation keys that references citations in Zotero, we still need one more thing, and that is to create the actual .bib or .json file containing the cited references.

Here I describe the simplest approach. We go to Zotero and export all the references using “Export Collection” , and choosing CSL JSON, save the file as references.json in the same folder as your Rmd paper. CSL-JSON is an emerging standard that is recommended by Yihui Xie, author of Rmarkdown.

In Rstudio, we add to following line to our YAML header:

bibliography: references.json

rbbt has the functionality to automatically create a bib/json file that ONLY contains the references that are cited in the Rmd document. I haven’t tested this yet, but this would be the icing on the cake. Instructions for this are on the rbbt Github page.

Update 03/2023 You can try adding the following entry to your YAML header:

bibliography: "`r rbbt::bbt_write_bib('bibliography.json', overwrite = TRUE)`" 

Step 8: Getting the references in the proper format

At this point, we can knit our Rmarkdown document, and it will contain the cited references appended at the end of the HTML/PDF/Word generated output document.

However, the references are (most likely) not yet properly formatted for the journal we want to send it to. For example, the journal “Health Services Research” wants the references in the main text to be numbered, and the reference list sorted in the order of appearance, and formatted according the APA format (whatever that is).

Luckily for us, enter the Citation Style Language project. They created a common standard, CSL, and a crowdsourced repository, that contains more than 10.000 free citation styles. All we need to do is grab the CSL file for our target journal!

Go to the Zotero Style Repository , search for the target journal name (in my case Health Services Research) and click on the result. This downloads a CSL file that we add to our Git repo containing the manuscript.

Step 9: Make Rstudio output to Word for our collaborators

Still here? Great. Now we are ready for the final step. This one is for our collaborators (who we feel sorry for, because they use Word and miss out on all the Rmarkdown fun), and for those journals, that force us to submit our manuscript as a Word file.

In Rstudio, we add the following code to the YAML header:

output:
  word_document:
    reference_docx: style_template.docx

This tells Rmarkdown to use the Word formatting styles contained in the style_template.docx file. For me, this contains at the moment three things: A4 page size, double line spacing, and numbered lines.

Follow the instructions by Rstudio to make this template. In short, you let Rstudio’s pandoc generate a Word document from a .Rmd file, and tweak the formatting styles of that document. Name the document style_template.docx and keep it with your .Rmd manuscript. I can confirm that this also works when you edit this document using LibreOffice / OpenOffice.

The great thing for me: now I have this blog post, I can forget about all this stuff, and finally get to the scientific writing part!

N.b. see also this repository from Kenneth Rioja with an operationalisation of the workflow described above: https://github.com/kennethrioja/rmdzoteroword_workflow

Avatar
Gertjan Verhoeven
Data Scientist / Policy Advisor

Gertjan Verhoeven is a research scientist currently at the Dutch Healthcare Authority, working on health policy and statistical methods. Follow me on Twitter or Mastodon to receive updates on new blog posts. Statistics posts using R are featured on R-Bloggers.