Subsetting using the tidyverse
You can also subset tibbles using tidyverse functions from the dplyr package. dplyr verbs are inspired by SQL vocabulary and designed to be more intuitive.
- The tidyverse includes dplyr, tidyr, and ggplot2, which are among the most popular R packages. These, along with other core packages such as forcats and stringr, are attached by library(tidyverse). Others, such as readxl, are installed with the tidyverse package but are not attached automatically, so you have to load them explicitly.
The first argument of the main dplyr functions is a tibble (or data.frame).
Filtering rows with filter()
filter() allows us to subset observations (rows) based on their values. The first argument is the name of the data frame. The second and subsequent arguments are the expressions that filter the data frame.
dplyr executes the filtering operation by generating a logical vector and returns a new tibble containing the rows that match the filtering conditions. You can therefore use any of the logical operators we learnt when subsetting with [.
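As a minimal sketch of the above, the example below filters a tibble and reproduces the same result with [. It assumes the built-in trees data set (columns Girth, Height, Volume) as a stand-in for the tutorial's data; the object names are purely illustrative.

```r
library(dplyr)

# Assumption: the built-in `trees` data set stands in for the tutorial's data
trees_tbl <- tibble::as_tibble(trees)

# Keep only the rows where Height is greater than 80
tall <- filter(trees_tbl, Height > 80)

# Several conditions separated by commas are combined with AND
tall_wide <- filter(trees_tbl, Height > 80, Girth > 15)

# Logical operators work as they do with [
tall_or_wide <- filter(trees_tbl, Height > 80 | Girth > 15)

# The equivalent base R subsetting using the logical vector directly
tall_base <- trees_tbl[trees_tbl$Height > 80, ]
```

Note that the original trees_tbl is left unchanged; each call returns a new tibble.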
Slicing rows with slice()
Using slice() is similar to subsetting with [ and element indices, in that we provide the positions of the rows we want to select.
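A short sketch of row selection by position, again assuming the built-in trees data set as illustrative data:

```r
library(dplyr)

# Assumption: the built-in `trees` data set stands in for the tutorial's data
trees_tbl <- tibble::as_tibble(trees)

# Select the first three rows by position
first_three <- slice(trees_tbl, 1:3)

# Select an arbitrary set of rows
picked <- slice(trees_tbl, c(2, 5, 10))

# Negative indices drop rows, as with [
without_first <- slice(trees_tbl, -1)

# The equivalent base R subsetting
first_three_base <- trees_tbl[1:3, ]
```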
Selecting columns with select()
select() allows us to subset columns in tibbles using operations based on the names of the variables.
In dplyr we use unquoted column names (i.e. Volume rather than 'Volume').
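For illustration, the sketch below selects columns by unquoted name. It assumes the built-in trees data set, which happens to contain a Volume column; the object names are hypothetical.

```r
library(dplyr)

# Assumption: the built-in `trees` data set stands in for the tutorial's data
trees_tbl <- tibble::as_tibble(trees)

# Select a single column by its unquoted name
vol <- select(trees_tbl, Volume)

# Select several columns; the result follows the order given
height_vol <- select(trees_tbl, Volume, Height)

# A minus sign drops a column
no_girth <- select(trees_tbl, -Girth)
```

Unlike $ or [[, select() always returns a tibble, even when only one column is selected.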
Behind the scenes, select() matches any variable arguments to column names, creating a vector of column indices. This is then used to subset the tibble. As such, we can create ranges of variables using their names and the : operator.
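Because names are translated to column positions, a range of names behaves like a range of indices. A minimal sketch, assuming the built-in trees data set, whose columns are Girth, Height, Volume in that order:

```r
library(dplyr)

# Assumption: the built-in `trees` data set stands in for the tutorial's data
trees_tbl <- tibble::as_tibble(trees)

# Select every column from Girth through Height by name
by_name <- select(trees_tbl, Girth:Height)

# The same selection expressed as column positions
by_position <- select(trees_tbl, 1:2)
```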
There are also a number of helper functions to make selections easier. For example, we can use one_of() to provide a character vector of column names to select. In recent versions of dplyr, one_of() is superseded by all_of() and any_of().
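The sketch below shows a few selection helpers in use. It assumes the built-in trees data set, and the wanted vector is a hypothetical example; all_of() and any_of() require a reasonably recent dplyr (1.0.0 or later).

```r
library(dplyr)

# Assumption: the built-in `trees` data set stands in for the tutorial's data
trees_tbl <- tibble::as_tibble(trees)

# A character vector of column names, e.g. built programmatically
wanted <- c("Girth", "Volume")

# one_of() selects the columns named in the vector
with_one_of <- select(trees_tbl, one_of(wanted))

# all_of() is the stricter replacement: it errors if a name is missing
with_all_of <- select(trees_tbl, all_of(wanted))

# any_of() silently ignores names that are not present
with_any_of <- select(trees_tbl, any_of(c(wanted, "NotAColumn")))

# Helpers can also match on patterns in the names
starts_h <- select(trees_tbl, starts_with("H"))
```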