Writing Better R Functions

As I’ve worked on writing more robust, informative functions in R, I keep forgetting minor technical details from problems that I had already solved. Instead of keeping these solutions in a script somewhere, I’m posting them here so I can find them more easily.

Find installed packages

One of the “meta-problems” that applies to all solutions on this page is figuring out which packages users have installed. I don’t want to make people install every package that I have, but I also don’t want to program everything in base R. I generally assume that anyone using my code will have the tidyverse installed, but there are many other packages that I have installed and other people may not (especially if we’re using different versions of R) [1].

The way to get around this for more trivial packages is to change an argument (e.g. verbose = TRUE) based on the availability of packages. For example, I like using the crayon package to print colorful messages to the console when a function is verbose.

f <- function(x, verbose=TRUE){
  # only run if verbose
  if(verbose){
    ## crayon_installed inherited from global env 
    if(crayon_installed) message("Here's a ", crayon::green$bold("fun bolded"), " message")
    else                 message("Here is the boring message")
  }
  
  mtcars
}

crayon_installed <- TRUE
f(mtcars)

crayon_installed <- FALSE
f(mtcars)
Output of above code


However, this will throw an error if crayon isn’t installed, so we need some way to flag environments where crayon isn’t installed. The old way I was doing it, based on StackOverflow, was (a) clunky and (b) prone to error. Using require(pkg) attaches the package to the search path (potentially causing masking issues), and installed.packages() is slow and has false negatives [2].

One way around this is to use rlang::is_installed(). Instead of writing two separate messaging systems (one with crayon and one without) like in the code above, I’ve moved to silencing verbose with a warning if the dependent packages aren’t installed:

f <- function(x, verbose=TRUE, dependency = "crayon"){
  
  # only run if verbose and crayon not installed
  if(verbose && !rlang::is_installed(dependency)){
    warning("The package `", dependency, "` is not installed. Defaulting to verbose = FALSE")
    verbose <- FALSE
  }
  
  # Now give message if verbose (and crayon is installed)
  if(verbose) message("Here's a ", crayon::green$bold("fun bolded"), " message")
  
  nrow(mtcars)
}

f(mtcars) # This works (if crayon is installed)
## Here's a fun bolded message
## [1] 32

f(mtcars, dependency = "some_other_package") # shows warning
## Warning in f(mtcars, dependency = "some_other_package"): The package
## `some_other_package` is not installed. Defaulting to verbose = FALSE
## [1] 32

f(mtcars, verbose = FALSE, dependency = "ggplot4") # skips the package check if not verbose
## [1] 32
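
If adding rlang as a dependency is itself a concern, the same idea can be sketched in base R. The helper names below (pkg_available, fun_bold, f2) are hypothetical and not part of the original functions. find.package(quiet = TRUE) and requireNamespace(quietly = TRUE) both check for a package without attaching it, so nothing gets masked.

```r
# Hypothetical helper: TRUE if `pkg` can be found, without attaching it
pkg_available <- function(pkg) {
  length(find.package(pkg, quiet = TRUE)) > 0
}

# Hypothetical helper: style the text only when crayon is available,
# otherwise return it unchanged
fun_bold <- function(txt) {
  if (requireNamespace("crayon", quietly = TRUE)) {
    crayon::green$bold(txt)
  } else {
    txt
  }
}

# Same shape as f() above, but the message degrades gracefully
f2 <- function(x, verbose = TRUE) {
  if (verbose) message("Here's a ", fun_bold("fun bolded"), " message")
  nrow(x)
}

pkg_available("stats")               # TRUE; stats ships with R
pkg_available("some_other_package")  # FALSE unless a package by that name exists
f2(mtcars)
```

The trade-off is that the user never learns a nicer message was possible, so the warning-based approach above may still be preferable when the dependency matters.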

Passing the named argument back

Sometimes it’s useful to include the name of an argument passed to a function in a message. For example, consider the function below that takes two data.frames (smaller and larger) and returns the difference in the number of rows between them. If more rows exist in smaller than larger, it gives a message informing the user that absolute values are being used:

row_diff <- function(smaller, larger){
  nrow_smaller <- nrow(smaller)
  nrow_larger  <- nrow(larger)
  
  if(nrow_smaller > nrow_larger) 
    message("There are more rows in `smaller` than in `larger`. Using absolute values")
  abs(nrow_smaller - nrow_larger)
}

row_diff(smaller = iris, larger = mtcars)
## There are more rows in `smaller` than in `larger`. Using absolute values
## [1] 118

It would be nice if the message said “There are more rows in iris than in mtcars” because that’s more informative. The quasiquotation here gets kind of weird [3], as shown by the lobstr output:

lobstr::ast(row_diff(smaller = iris, larger = mtcars))
## █─row_diff 
## ├─smaller = iris 
## └─larger = mtcars

Quoting these arguments with rlang::enquo() on its own returns a quosure rather than a printable name, so one approach is to capture the expression with substitute() and format it with rlang::expr_label():

row_diff <- function(smaller, larger){
  nrow_smaller <- nrow(smaller)
  nrow_larger  <- nrow(larger)
  
  # substitute() recovers the expression the caller supplied for each argument
  msg <- paste0("There are more rows in ",
                rlang::expr_label(substitute(smaller)),
                " than in ",
                rlang::expr_label(substitute(larger)),
                ". Using absolute values")
  
  if(nrow_smaller > nrow_larger) message(msg)
    
  abs(nrow_smaller - nrow_larger)
}

row_diff(smaller = iris, larger = mtcars)
## There are more rows in `iris` than in `mtcars`. Using absolute values
## [1] 118
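
For comparison, here is a minimal base R sketch of the same idea, assuming you’d rather not depend on rlang for labelling. The name row_diff_base is hypothetical. deparse(substitute(x)) turns the caller’s expression into a string; for very long expressions deparse() can return more than one string, so this is best suited to simple arguments like a data frame name.

```r
row_diff_base <- function(smaller, larger) {
  # Capture what the caller typed for each argument, as text
  smaller_name <- deparse(substitute(smaller))
  larger_name  <- deparse(substitute(larger))

  diff <- nrow(smaller) - nrow(larger)

  if (diff > 0) {
    message("There are more rows in `", smaller_name,
            "` than in `", larger_name, "`. Using absolute values")
  }

  abs(diff)
}

row_diff_base(smaller = iris, larger = mtcars)
## There are more rows in `iris` than in `mtcars`. Using absolute values
## [1] 118
```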

Command Line Interfaces

I’m a big fan of using color in the console, and I already mentioned the crayon package in an earlier section. As much as I love crayon, I remained envious of the beautiful message that you get when you load the tidyverse:

Load message for tidyverse (on 2020-06-24)


So I spent almost an hour reading through all the tidyverse packages in hopes of figuring out how exactly they accomplished this magical feat, and came across the cli package. This package can be used for more robust applications, like printing better error or warning messages, but here’s a simple example that lets you preview the columns of each table in the National Trauma Data Bank:

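library(tidyverse)
library(dbplyr)
library(sqldf)

# Connect to the SQLite database file
db <- DBI::dbConnect(RSQLite::SQLite(), dbname = "ntdb.db")

# Print a cli header for each 2015 table, then preview its columns
DBI::dbListTables(db) %>% str_subset("2015_") %>% 
  walk(function(x){
    cli::cli_h1(x)
    glimpse(tbl(db, x))
  })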
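
Since the database above isn’t something everyone has on hand, here is a small self-contained sketch of the same pattern using built-in datasets. The function preview_tables is a hypothetical name, and the plain message() fallback is my own assumption for machines where cli isn’t installed.

```r
# Hypothetical helper: print a header per table, then a compact preview
preview_tables <- function(tables) {
  use_cli <- requireNamespace("cli", quietly = TRUE)

  for (nm in names(tables)) {
    # cli interpolates {nm} from this function's environment
    if (use_cli) cli::cli_h1("{nm}") else message("== ", nm, " ==")
    str(tables[[nm]], give.attr = FALSE, vec.len = 2)
  }

  # Return the table names invisibly so the call can be piped or tested
  invisible(names(tables))
}

preview_tables(list(mtcars = mtcars, iris = iris))
```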

Trying paged tables

Test to see if this works on the website!

rmarkdown::paged_table(mtcars)

It doesn’t :(

Opening files in RStudio

I always forget this, but if you want to open a file (e.g. a text file) within RStudio, you can use the rstudioapi::navigateToFile() command. For example, if you’re trying to troubleshoot some text output, you can do the following:

library(easyPubMed)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ dplyr   1.0.5
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

# Scrape Pubmed's metadata from my first publication (PMID: 33070947)
results <- fetch_pubmed_data(get_pubmed_ids("33070947"), format = "xml")

# Write the XML to a temp file
out_file <- str_glue("{tempdir()}/pubmed.xml")
results %>% readr::write_file(out_file)

# If running in RStudio, open up the XML file
if(rstudioapi::isAvailable()){
  rstudioapi::navigateToFile(out_file)
}

# Bonus: here's how to explore XML structure (not run)
## out_file %>%
##   xml2::read_xml() %>%
##   xml2::xml_find_first("PubmedArticle") %>%
##   xml2::xml_find_first("MedlineCitation") %>%
##   xml2::xml_structure()
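
If this ends up inside a function that other people run, a guarded wrapper may be safer, since rstudioapi might not be installed at all. This is only a sketch: open_for_inspection is a hypothetical name, and falling back to printing the path is my own choice.

```r
# Hypothetical helper: open `path` in RStudio when possible,
# otherwise just report where the file lives
open_for_inspection <- function(path) {
  stopifnot(file.exists(path))

  in_rstudio <- requireNamespace("rstudioapi", quietly = TRUE) &&
    rstudioapi::isAvailable()

  if (in_rstudio) {
    rstudioapi::navigateToFile(path)
  } else {
    message("Not running in RStudio; file is at ", path)
  }

  invisible(path)
}

# Usage with a throwaway text file
tmp <- file.path(tempdir(), "example.txt")
writeLines(c("first line", "second line"), tmp)
open_for_inspection(tmp)
```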

  1. This is also not always possible. For example, some packages can be installed on my MacBook but not on my cloud server, so I also have to resolve discrepancies in the available packages for work that I do for myself (that isn’t shared with others).

  2. See ?find.package

  3. Because functions use pairlists; see the Advanced R chapter

Hunter Ratliff, MD, MPH
Infectious Diseases Fellow

My research interests include epidemiology, social determinants of health, and reproducible research.