Fast, dependency free string literals

So what do we mean when we say that glue is fast? It does not mean glue is the fastest option in every case, but for the features it provides we can confidently say it is fast.

A good way to determine this is to compare its speed of execution to some alternatives.

• base::paste0(), base::sprintf() - Functions in base R implemented in C that provide variable insertion (but not interpolation).
• R.utils::gstring(), stringr::str_interp() - Provide a similar interface to glue, but use ${} to delimit blocks to interpolate.
• pystr::pystr_format()[1], rprintf::rprintf() - Provide interfaces similar to Python string formatters with variable replacement, but not arbitrary interpolation.

## Simple concatenation

bar <- "baz"

simple <-
  microbenchmark::microbenchmark(
    glue = glue::glue("foo{bar}"),
    gstring = R.utils::gstring("foo${bar}"),
    paste0 = paste0("foo", bar),
    sprintf = sprintf("foo%s", bar),
    str_interp = stringr::str_interp("foo${bar}"),
    rprintf = rprintf::rprintf("foo$bar", bar = bar)
  )

print(unit = "eps", order = "median", signif = 4, simple)
#> Unit: evaluations per second
#>        expr      min     lq   mean median     uq     max neval
#>     rprintf   259.30   1833   1897   1899   2001    2208   100
#>     gstring    18.74   2112   2212   2219   2359    2782   100
#>  str_interp   172.60   2673   2855   2914   3071    3732   100
#>        glue  1071.00   4990   5661   5540   6030    8709   100
#>      paste0 51750.00 241200 444100 443500 578900  914100   100
#>     sprintf 43740.00 346000 547800 485100 589600 1429000   100

plot_comparison(simple)

While glue() is slower than paste0() and sprintf(), it is roughly two to three times as fast as str_interp(), gstring() and rprintf() in the results above.

paste0() and sprintf() don’t do string interpolation and will likely always be significantly faster than glue. glue was never meant to be a direct replacement for them.

rprintf() interpolates only variables, not arbitrary expressions. Supporting arbitrary expressions was one of the explicit goals of writing glue.

So glue is ~2x as fast as the two functions (str_interp(), gstring()) which do have roughly equivalent functionality.

It is also still quite fast in absolute terms, with a median of over 5,500 evaluations per second on this machine.
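To make the distinction between variable insertion and arbitrary interpolation concrete, here is a minimal sketch, assuming the glue package is installed. The toupper() and nchar() calls are illustrative choices only and are not part of the benchmark above.

```r
library(glue)

bar <- "baz"

# Variable insertion: the value of `bar` is substituted as-is.
inserted <- sprintf("foo%s", bar)

# Arbitrary interpolation: glue evaluates any R expression inside {}.
# (The expressions here are made up for illustration.)
interpolated <- glue("foo{toupper(bar)} has {nchar(bar) + 3} characters")

print(inserted)
print(interpolated)
```

With sprintf() the same result would require computing toupper(bar) and nchar(bar) + 3 beforehand and passing them as arguments, which is the extra work glue trades some speed to avoid.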

## Vectorized performance

Taking advantage of glue’s vectorization is the best way to improve performance. For instance, the vectorized form of the previous benchmark generates 100,000 strings in about 18 ms, with performance much closer to that of paste0() and sprintf(). NB: str_interp() does not support vectorization, so it was removed.

bar <- rep("bar", 1e5)

vectorized <-
  microbenchmark::microbenchmark(
    glue = glue::glue("foo{bar}"),
    gstring = R.utils::gstring("foo${bar}"),
    paste0 = paste0("foo", bar),
    sprintf = sprintf("foo%s", bar),
    rprintf = rprintf::rprintf("foo$bar", bar = bar)
  )

print(unit = "ms", order = "median", signif = 4, vectorized)
#> Unit: milliseconds
#>     expr   min    lq  mean median    uq   max neval
#>  sprintf 13.43 13.53 13.87  13.58 13.90 18.20   100
#>   paste0 15.81 16.13 16.53  16.38 16.78 20.23   100
#>     glue 17.48 17.94 18.63  18.30 18.89 23.20   100
#>  gstring 34.20 34.66 35.43  34.98 35.65 41.80   100
#>  rprintf 53.34 53.66 54.54  54.04 54.53 60.94   100

plot_comparison(vectorized, log = FALSE)
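On a smaller scale, the vectorization measured above looks like the following. This is a minimal sketch, assuming the glue package is installed. The three-element vector and the looped variant are made up for illustration and were not part of the benchmark.

```r
library(glue)

bar <- c("a", "b", "c")

# One call to glue() produces one string per element of `bar`.
vectorized_result <- glue("foo{bar}")

# Calling glue() once per element yields the same strings, but pays the
# per-call overhead seen in the simple benchmark on every iteration.
looped_result <- vapply(
  bar,
  function(x) as.character(glue("foo{x}")),
  character(1),
  USE.NAMES = FALSE
)

print(vectorized_result)
stopifnot(identical(as.character(vectorized_result), looped_result))
```

Passing the whole vector in a single call is what lets glue approach paste0() and sprintf() in the timings above.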

1. pystr is no longer available from CRAN due to failure to correct installation errors and was therefore removed from further testing.