Two Key Drawbacks of Writing Your Academic Manuscript With R Markdown
A decade ago it was revealed that research practices in psychology were often suboptimal for producing high quality evidence (Nosek et al. 2016). Researchers frequently ran studies with low statistical power (Schmidt and Oh 2016) and engaged in questionable research practices that ultimately rendered p values useless for appraising an effect's presence (Schmidt and Oh 2016). Various practices, including preregistration (Nosek et al. 2018) and recruiting more participants (Blake and Gangestad 2020), were proposed to tackle this issue, and are thankfully now commonplace.
One valuable proposal for the improvement of psychological science is that researchers should leverage R Markdown to write their manuscripts (Baumer and Udwin 2015). R Markdown combines prose and data wrangling/analyses (via R code) in one place, thus reducing the potential for human error (researchers who copy statistics from statistical software into their manuscript often make mistakes; Nuijten et al. 2016). R Markdown can also facilitate reproducibility by making transparent the data-wrangling and analytic steps that precede the reported output (helping to answer the question ‘what did these researchers do with the raw data to arrive at that result?’), and it is often promoted as accelerating the research process because, following code updates (e.g., when outliers are removed), all dependent statistics in the manuscript update automatically.
Despite the stated benefits of using R Markdown to write an academic manuscript (more comprehensive overviews of these benefits are available here, here, and here), there are also drawbacks. The aim of the present article is to shed light on what I regard as the two greatest drawbacks of using R Markdown. I hope this article will help researchers better determine whether now is the right time to make the transition to R Markdown.
R Markdown may not save you time writing your manuscript
A commonly reported benefit of using R Markdown is that, once a researcher knows how to use it and harness its functionality, it can save them a great deal of time (Perkel 2022). For instance, if during peer review a reviewer asks a researcher to re-analyze their data with specified outliers removed, it may be claimed that R Markdown will save the researcher time because all dependent study statistics will update automatically, sparing them from laboriously going through the manuscript to update tables and figures, participant descriptives, inferential statistics, and so on.
However, this claim is misleading, as in many instances and for many researchers R Markdown will prolong the writing process considerably. For instance, if a researcher has a very long RMD script with many code chunks (generally the case when extensive exploratory analyses are included), it can take minutes each time to render the script into a more aesthetically pleasing output such as PDF. For researchers who do not need to render their RMD script frequently, waiting a few minutes every now and again is not a problem. However, for researchers who do need to render frequently, for instance to check how the rendered output looks and reads, or because they are engaged in trial and error with their code (such as tweaking code to perfect a figure's formatting and then rendering to see whether it worked) - arguably the large majority of cases - this can waste a lot of time. These delays will be particularly painful for researchers who only have short blocks of time available to make progress on their manuscript, something that is the rule rather than the exception.
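To make the cost concrete, here is a minimal sketch of what triggering a render looks like from the R console (the file name is hypothetical); each render rebuilds the entire document, so with many chunks the elapsed time can easily stretch to minutes:

```r
# Render the manuscript to PDF and time how long the full rebuild takes.
# "manuscript.Rmd" is a hypothetical file name.
library(rmarkdown)

system.time(
  render("manuscript.Rmd", output_format = "pdf_document")
)
```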
It is possible to cache code chunks to speed up rendering of large scripts (sketched below). However, this requires skill and is not always advisable. And while fine-tuning code (e.g., replacing repetitive chunks with loops) can also speed up rendering, researchers new to R and R Markdown are unlikely to possess the know-how to do so. An effective work-around can be to conduct trial-and-error coding in a separate script containing only the code strictly necessary to render, and then copy the finalized code back into the main script. However, this is itself slightly cumbersome, and sometimes it will require more work than simply persisting with re-rendering the original script.
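As a rough sketch of the caching option, using a deliberately slow bootstrap chunk for illustration, knitr can store a chunk's results on disk and skip re-running it on subsequent renders so long as the chunk's code is unchanged:

```{r slow-bootstrap, cache=TRUE}
# With cache=TRUE, knitr saves this chunk's objects to disk and reloads them
# on the next render instead of re-running the code (unless the chunk changes).
boot_slopes <- replicate(5000, {
  idx <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ wt, data = mtcars[idx, ]))["wt"]
})
quantile(boot_slopes, c(.025, .975))

# Caveat: cached chunks can silently serve stale results if objects created
# in earlier chunks change, which is why caching requires some care.
```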
Another reason this saving-time claim is misleading is that it will not always be easy for researchers to create the R objects needed to use inline code functionality. It may be easy for a researcher well versed in R Markdown basics to extract coefficients and p values after running a simple linear regression and save them as R objects for inline code. However, extracting all the necessary statistics from more complex analyses (e.g., structural equation models, multilevel/cross-classified models, network analyses) can take time. Sometimes researchers will need to find and load additional R packages to provide these statistics (e.g., the mixedup package can provide variance-components output from a multilevel model), while in other instances they will need a firm grasp of indexing so they know how to pull relevant statistics out of a table, matrix, list, etc., and save them (see here for an example). Compare this with using a word processor, where a researcher need only type in what is reported in their statistical output.
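A minimal sketch of what this looks like in the simple case, using R's built-in mtcars data (object names here are purely illustrative):

```{r inline-objects}
# Fit a simple regression, then index into the coefficient table to save the
# statistics that will be reported via inline code.
fit   <- lm(mpg ~ wt, data = mtcars)
coefs <- summary(fit)$coefficients

wt_b  <- round(coefs["wt", "Estimate"], 2)
wt_se <- round(coefs["wt", "Std. Error"], 2)
wt_p  <- coefs["wt", "Pr(>|t|)"]
```

In the prose the researcher can then write, for example, b = `r wt_b`, SE = `r wt_se`, and the values update whenever the model changes. Even this simple case requires knowing that the coefficient table lives in summary(fit)$coefficients and how to index it (and p values typically need an extra formatting step); for structural equation or multilevel models the equivalent digging is considerably more involved.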
Another challenge to this claim is that the R Markdown benefit stated earlier - that statistics are automatically updated following code changes - will not save a researcher much time when the majority of their statistics reside in tables or figures. Tables and figures generated automatically during rendering often fall short of journals' formatting requirements. Thus, prior to submission a researcher will often need to create their tables and figures manually, inputting each individual statistic just as they would have done had they simply opted for a word processor. I have personally experienced this problem on a few occasions. After painstakingly trialling various R packages to automatically generate path and mediation diagrams, I threw in the towel once it was clear that their functionality and aesthetics fell short of journal requirements. In the end I fell back on Google Drawings to create these figures and copied them into the rendered Word version of my manuscript.
A final challenge to the stated claim is perhaps the most frustrating one: the frequency with which code errors occur when running the RMD script. Code errors are part and parcel of using R, and thus R Markdown, and they will slow down even the most seasoned coders. However, researchers who are not R/R Markdown experts, or who are unfamiliar with how to debug code errors, use version control platforms like GitHub, or write code in a way that limits coding errors, are likely to lose hours or even days to this issue. For researchers who already experience these problems using R, transitioning to R Markdown to write their manuscript will not be so problematic. However, for a researcher transitioning from statistical software like SPSS, which they are comfortable using and which rarely throws errors, a manuscript they would otherwise have put together in a month may easily take two or three months in R Markdown.
Writing Collaboration Will Be More Challenging
Collaborating effectively when writing a manuscript is essential. Thankfully, word processors like Microsoft Word and Google Docs make collaboration painless. They allow collaborators to insert comments, update prose, see what was changed and by whom, and view the manuscript in different modes depending on the purpose at hand.
R Markdown, on the other hand, is not currently collaboration-friendly. If a researcher sends their RMD script to collaborators to update/improve in R Markdown or a text editor, those collaborators cannot leverage any tracked-changes functionality. Thus, the researcher will not be able to see how their manuscript changed from one version to the next without the painstaking process of comparing versions side by side. Collaborators also lack an easy, user-friendly way to insert comments into the manuscript. A common approach seems to be to insert the comment directly into the text, surrounded by agreed-upon symbols (e.g., #). This works, but it falls well short of the user-friendliness of its word processor competitors.
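To illustrate, a hypothetical snippet of what this work-around looks like in the RMD source (the initials and symbols are whatever the team agrees on; HTML-style comments are not shown in the rendered output, but collaborators still have to hunt for them in the raw text):

```
Participants completed the measures online. ## JS: should we also report the
median completion time here? ##

<!-- AB: I reworded the paragraph above; please check it still matches the
preregistration. -->
```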
While options are available to overcome these issues - for instance, all collaborators can leverage GitHub version control to gain tracked-changes-like functionality, or they can utilize the newly created trackdown package so that manuscript collaboration takes place in Google Docs - these are not perfect solutions. For instance, if even one collaborator cannot get up to speed with GitHub or Google Docs, or, perhaps more likely, resists moving away from common alternatives like Microsoft Word, you may have to consider other options.
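As a rough sketch of the trackdown route, assuming the package's upload_file()/download_file() interface and a hypothetical file name, the RMD source is round-tripped through Google Docs, where collaborators comment and suggest changes as they normally would:

```r
# Hedged sketch of a trackdown round trip; "manuscript.Rmd" is hypothetical.
library(trackdown)

# Push the local RMD source to Google Docs so collaborators can comment and
# edit there; hide_code keeps code chunks out of their way.
upload_file(file = "manuscript.Rmd", hide_code = TRUE)

# ...collaborators edit and comment in Google Docs...

# Pull the edited text back into the local RMD file before re-rendering.
download_file(file = "manuscript.Rmd")
```

Even then, every collaborator needs a Google account and a willingness to work in Google Docs, which is precisely the adoption hurdle described above.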