# installing
install.packages("tidyverse")
# or install specific packages
install.packages("dplyr")
install.packages("ggplot2")
# now import them into your session
library(tidyverse)
library(dplyr)
library(ggplot2)
Introduction to the tidyverse
Data Science flavoured R
Tidyverse
Tidyverse is a collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
Installation: this can be done using 1) install.packages("tidyverse"); 2) library(tidyverse) to install the whole collection and attach its core packages.
Installing tidyverse usually installs many packages your code does not use, which is inefficient. It is best practice (and helps your learning) to install packages individually. See below for popular tidyverse packages.
Popular Tidyverse packages
dplyr
- Solves the most common data manipulation challenges. (NB dbplyr allows you to use remote database tables by converting dplyr code to SQL.)
readr
- Reads and writes tabular data such as csv and tsv formats. (NB there are options like readxl, writexl for working with Excel files and googlesheets4 for Google Sheets.)
stringr
- A set of functions designed to make working with strings as easy as possible. It also incorporates regular expression (regex) patterns into its syntax. Many common data cleaning and preparation tasks involve string cleaning such as detecting matches, subsetting strings, mutating strings, ordering, …
tidyr
- A set of functions to help tidy data (each column is a variable, each row an observation, and each cell a single value), such as separate_wider_delim(), hoist(), pivot_longer(), …
ggplot2
- A declarative package for making graphics. See also R Graphics Cookbook.
purrr
- Provides a complete set of tools for working with functions and vectors. (The map() family can efficiently replace for loops.) A good place to start learning is here.
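As a rough illustration of the packages listed above, the sketch below applies one function each from dplyr, tidyr, stringr and purrr. The small `scores` table and its column names are invented for this example and do not come from any package; the code assumes the four packages are installed.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(purrr)

# hypothetical toy data, invented for illustration only
scores <- tibble(
  name  = c("ann", "bob", "cy"),
  maths = c(70, 55, 90),
  art   = c(65, 80, 75)
)

# dplyr: keep rows matching a condition and add a derived column
high_maths <- scores %>%
  filter(maths > 60) %>%
  mutate(total = maths + art)

# tidyr: reshape from one column per subject to one row per name and subject
long_scores <- scores %>%
  pivot_longer(cols = c(maths, art), names_to = "subject", values_to = "score")

# stringr: detect a regular expression match (names starting with "a" or "b")
starts_ab <- str_detect(scores$name, "^[ab]")

# purrr: apply a function to each element instead of writing a for loop
name_lengths <- map_int(scores$name, str_length)
```

Each result is stored in its own variable so it can be printed and inspected separately.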
Note: There are more packages than this. Some other helpful ones to know about include: httr, lubridate, glue, modelr, forcats.
Installing & importing packages
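One way to avoid reinstalling packages that are already present is to check for them first. The following is a minimal sketch rather than a required workflow: `missing_packages()` is a hypothetical helper name chosen for this example, and restricting installation to interactive sessions is an assumption made here so that scripts do not trigger downloads unexpectedly.

```r
# hypothetical helper: return the packages in `pkgs` that are not installed
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

wanted <- c("dplyr", "ggplot2")
to_install <- missing_packages(wanted)

# assumption: only install when running interactively
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

After this runs, library(dplyr) and library(ggplot2) can be called as usual to attach the packages.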
Inspecting a dataset
An essential first step in any data analysis task is inspecting your data visually. Some packages come with datasets you can work with, so you’ll want to see what they look like; you can also inspect your own data.
Toy datasets
It is useful to use toy datasets which come included when you install and import the relevant package. Some examples are:
- mpg from ggplot2
- starwars from dplyr
- storms from dplyr
- band_members from dplyr (a small dataset; it is one of three related tables useful for demonstrating joins)
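To see which datasets an installed package provides, base R's data() function can list them. This is a small sketch assuming dplyr is installed; the variable names `dplyr_data` and `sw` are illustrative.

```r
# list the names of the datasets bundled with an installed package
dplyr_data <- data(package = "dplyr")$results[, "Item"]
print(dplyr_data)

# load one of them into a variable and check its dimensions
sw <- dplyr::starwars
dim(sw)
```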
library(dplyr)
# open ggplot2's data dictionary for this package's built-in dataset
# help("mpg")
# load the dataset into a variable
df <- ggplot2::mpg
# see information-rich summary
# glimpse(df)
# see dimensions of object
# (number of rows and columns)
# dim(df)
# see top n rows
df %>% head(n = 5)
- library(dplyr): we need this package so we can access the %>% ‘pipe’ operator.
- ggplot2::mpg: this notation tells R to look in the ggplot2 package for the dataset mpg.
- head(n = 5): you can also use tail() to see the bottom n rows. Use head() without n to default to the top 6 rows.
# A tibble: 5 × 11
  manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class
  <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>
1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
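The other inspection calls mentioned above (glimpse(), dim() and tail()) can be tried in the same way. This is a minimal sketch assuming dplyr and ggplot2 are installed; the variable names are illustrative.

```r
library(dplyr)

df <- ggplot2::mpg

# number of rows and columns
shape <- dim(df)

# compact summary: one line per column with its type and first values
glimpse(df)

# bottom 3 rows
last_rows <- df %>% tail(n = 3)

# head() without n returns the first 6 rows
first_rows <- head(df)
```

Storing each result in a variable makes it easy to print or reuse the pieces individually.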