library(glue)
library(ggplot2)
library(bench)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
Glue is advertised as “Fast, dependency free string literals”.
So what do we mean when we say that glue is fast? It does not mean glue is the fastest option in every case; rather, for the features it provides, we can confidently say it is fast.
A good way to determine this is to compare its speed of execution to some alternatives.
- base::paste0(), base::sprintf(): Functions in base R implemented in C that provide variable insertion (but not interpolation).
- R.utils::gstring(): Provides a similar interface as glue, but uses ${} to delimit blocks to interpolate.
- pystr::pystr_format()[1], rprintf::rprintf(): Provide an interface similar to Python string formatters with variable replacement, but not arbitrary interpolation.
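The difference between variable insertion and interpolation can be sketched with a small example. This is only an illustration, not part of the benchmark: the vector `x` and the result names are hypothetical, and it assumes the glue package is installed.

```r
library(glue)

x <- c(1.5, 2.5, 4)

# Variable insertion: values are computed outside the template and
# passed in as separate arguments that fill the placeholders.
inserted <- sprintf("mean of %d values: %.2f", length(x), mean(x))

# Interpolation: arbitrary R expressions are evaluated inside the braces.
interpolated <- glue("mean of {length(x)} values: {round(mean(x), 2)}")

# Both approaches produce the same string here.
identical(as.character(interpolated), inserted)
```

With insertion the template and its inputs live in different places; with interpolation the expression sits where its value will appear, which is the feature the comparison below is about.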
Note: stringr::str_interp() was previously included in this benchmark, but is now formally marked as “superseded”, in favor of stringr::str_glue(), which just calls glue::glue().
Simple concatenation
bar <- "baz"
simple <- bench::mark(
  glue = as.character(glue::glue("foo{bar}")),
  gstring = R.utils::gstring("foo${bar}"),
  paste0 = paste0("foo", bar),
  sprintf = sprintf("foo%s", bar),
  rprintf = rprintf::rprintf("foo$bar", bar = bar)
)
simple %>%
  select(expression:total_time) %>%
  arrange(median)
#> # A tibble: 5 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 sprintf     750.9ns 861.01ns  1113764.        0B      0
#> 2 paste0        1.5µs   1.63µs   581551.        0B     58.2
#> 3 glue         97.7µs 103.77µs     9419.   139.5KB     32.0
#> 4 gstring     227.1µs 237.15µs     4122.    2.45MB     19.1
#> 5 rprintf     278.4µs  285.8µs     3442.   78.14KB     10.3
# plotting function defined in a hidden chunk
plot_comparison(simple)
While glue() is slower than paste0() and sprintf(), it is more than twice as fast as gstring() and rprintf().
paste0() and sprintf() don’t do string interpolation and will likely always be significantly faster than glue, but glue was never meant to be a direct replacement for them.
rprintf::rprintf() does only variable interpolation, not arbitrary expressions; supporting arbitrary expressions was one of the explicit goals of writing glue.
So glue is ~2x as fast as gstring(), the one alternative with roughly equivalent functionality.
It is also still quite fast in absolute terms, at over 9000 evaluations per second on this machine (see the itr/sec column above).
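The "~2x" figure can be checked directly from the median column of the benchmark table above. The sketch below simply copies those medians by hand (converted to microseconds) into a hypothetical vector named `median_us`; it uses base R only and does not rerun the benchmark.

```r
# Median timings copied from the benchmark table above, in microseconds.
median_us <- c(
  sprintf = 0.86101,
  paste0  = 1.63,
  glue    = 103.77,
  gstring = 237.15,
  rprintf = 285.8
)

# Each alternative's median time relative to glue's
# (values above 1 are slower than glue, values below 1 are faster).
relative_to_glue <- round(median_us / median_us[["glue"]], 2)
relative_to_glue
```

On these numbers gstring() takes about 2.29 times as long as glue() and rprintf() about 2.75 times as long, while sprintf() and paste0() take only a small fraction of glue's time.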
Vectorized performance
Taking advantage of glue’s vectorization is the best way to improve performance. In a vectorized form of the previous benchmark, glue’s performance is much closer to that of paste0() and sprintf().
bar <- rep("bar", 1e5)
vectorized <- bench::mark(
  glue = as.character(glue::glue("foo{bar}")),
  gstring = R.utils::gstring("foo${bar}"),
  paste0 = paste0("foo", bar),
  sprintf = sprintf("foo%s", bar),
  rprintf = rprintf::rprintf("foo$bar", bar = bar)
)
vectorized %>%
  select(expression:total_time) %>%
  arrange(median)
#> # A tibble: 5 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 paste0       8.32ms   8.34ms     119.    781.3KB     6.40
#> 2 sprintf      9.62ms   9.88ms     101.    781.3KB     4.21
#> 3 gstring     11.19ms  11.27ms      88.7    1.53MB     6.49
#> 4 glue        12.39ms  12.78ms      78.9    2.29MB     9.02
#> 5 rprintf     28.56ms  28.93ms      33.4    3.05MB     2.09
# plotting function defined in a hidden chunk
plot_comparison(vectorized)
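What "taking advantage of vectorization" means in practice can be sketched as follows. This is an illustrative comparison rather than part of the benchmark: the shorter input vector and the names `vectorized_result` and `looped_result` are assumptions made for the example, and no timings are claimed for it.

```r
library(glue)

bar <- rep("bar", 1000)

# Vectorized: a single glue() call processes the whole vector,
# so the fixed per-call overhead is paid once.
vectorized_result <- as.character(glue("foo{bar}"))

# Looped: one glue() call per element, so the per-call overhead
# is paid once for every string.
looped_result <- vapply(
  bar,
  function(b) as.character(glue("foo{b}")),
  character(1),
  USE.NAMES = FALSE
)

# Both approaches build the same strings.
identical(vectorized_result, looped_result)
```

Using the medians from the two tables above, the vectorized glue call costs about 12.78 ms / 1e5 ≈ 128 ns per string, against roughly 104 µs for a single scalar call, which is why a loop of scalar calls is the pattern to avoid.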