Tidylog 1.0.0
Before I became a heavy user of R, I mainly used Stata. There are a few things that I miss from Stata, but one issue, specifically, bothered me immensely: the lack of feedback for data wrangling operations in R. Have a look, for instance, at this Stata output:
The merge operation tells us about the number of matched cases, and the drop command tells us how many cases we lost. This feedback is great at preventing simple errors, especially when working with data interactively. This functionality does not exist in base R, the tidyverse, or the data.table package. Hence, my code often looked like this:
print(nrow(data))
data <- filter(data, length > 5)
print(nrow(data))
This gets ugly pretty quickly, and does not work for many other common problems, such as joins.
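To see why comparing row counts is not enough for joins, here is a small sketch in base R. The data frames `x` and `y` and their columns are made up for illustration. A left join keeps every row of `x`, so the row count does not change even though one case found no match:

```r
# Hypothetical toy data: id 1 exists only in x, id 4 exists only in y.
x <- data.frame(id = c(1, 2, 3), a = c("p", "q", "r"))
y <- data.frame(id = c(2, 3, 4), b = c("s", "t", "u"))

# Left join in base R: keep all rows of x, add matching columns from y.
joined <- merge(x, y, by = "id", all.x = TRUE)

# The row count is unchanged, so printing nrow() before and after reveals nothing.
nrow(x) == nrow(joined)   # TRUE

# The unmatched case only shows up if we look for missing values explicitly.
sum(is.na(joined$b))      # 1
```

Checking this by hand for every join means writing a different diagnostic each time, which is exactly the kind of boilerplate that is easy to skip.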
This is why I wrote the tidylog package, which is built on top of the tidyverse’s dplyr and tidyr packages. Tidylog provides the missing feedback:
library("tidyverse")
library("tidylog", warn.conflicts = FALSE)
filtered <- filter(mtcars, cyl == 4)
#> filter: removed 21 rows (66%), 11 rows remaining
joined <- left_join(nycflights13::flights, nycflights13::weather,
    by = c("year", "month", "day", "origin", "hour", "time_hour"))
#> left_join: added 9 columns (temp, dewp, humid, wind_dir, wind_speed, …)
#> > rows only in x 1,556
#> > rows only in y ( 6,737)
#> > matched rows 335,220
#> > =========
#> > rows total 336,776
Tidylog simply overwrites (masks) the tidyverse functions for which it provides feedback. This is not very elegant, but it means that tidylog is a drop-in solution: just load it after the tidyverse (or dplyr and/or tidyr), and it will provide feedback.
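The general idea of masking a function with a logging wrapper can be sketched as follows. This is a simplified illustration, not tidylog's actual implementation: the helper `with_row_log()` is a hypothetical name, and the sketch assumes that dplyr is installed and only handles functions that remove rows.

```r
# Hypothetical helper (not part of tidylog): wrap a row-removing function
# so that it reports how many rows were dropped before returning the result.
with_row_log <- function(fun, name) {
  function(.data, ...) {
    result <- fun(.data, ...)
    removed <- nrow(.data) - nrow(result)
    message(sprintf("%s: removed %d rows (%.0f%%), %d rows remaining",
                    name, removed, 100 * removed / nrow(.data), nrow(result)))
    result
  }
}

# Defining `filter` in the global environment masks dplyr::filter,
# so existing code that calls filter() picks up the logging version.
filter <- with_row_log(dplyr::filter, "filter")

four_cyl <- filter(mtcars, cyl == 4)
#> filter: removed 21 rows (66%), 11 rows remaining
```

Because the wrapper has the same name and passes all arguments through, no calling code has to change. The original function stays reachable as `dplyr::filter()` whenever the feedback is not wanted.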
Since its first version about a year ago, the package has grown to include most dplyr and many tidyr functions. (Thanks to all the contributors!) I might consider other functions, but for rarer and more complex functions the feedback seems to become less useful, because one will usually inspect the output manually anyway. Because tidylog seems pretty much feature-complete to me, I am now releasing version 1.0.0. The goal for the future is to keep the package updated with developments occurring in dplyr and tidyr.
For more information about tidylog, check out the GitHub page.