Visualizing data with ggplot2

# 02 - Visualizing data with ggplot2

## Data Science with R · Summer 2021

### Uli Niemann · Knowledge Management & Discovery Lab

#### [https://brain.cs.uni-magdeburg.de/kmd/DataSciR/](https://brain.cs.uni-magdeburg.de/kmd/DataSciR/)

---

## Datasets

In `R`, most datasets come in the form of data frames:

- Each row is an **observation**.
- Each column is a **variable**.

```r
library(gapminder)
gapminder
```

```
## # A tibble: 1,704 x 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # ... with 1,694 more rows
```

???

The Gapminder dataset contains global health and economic data for 142 countries
from 1952 to 2007 in increments of 5 years.

---

## Example: Germany in 2007

![](figures/02-flag-germany.svg)

- `country = "Germany"`
- `continent = "Europe"`
- `year = 2007`
- `lifeExp = 79.4` years
- `pop = 82400996` inhabitants
- `gdpPercap = 32170` USD

```
## # A tibble: 1 x 6
##   country continent  year lifeExp      pop gdpPercap
##   <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Germany Europe     2007    79.4 82400996    32170.
```

---

## What's in the Gapminder data?

- How many rows and columns does this dataset contain?  
- What does each row represent?  
- What does each column represent?

Take a `glimpse()` at the data:

```r
library(dplyr)
glimpse(gapminder)
```

```
## Rows: 1,704
## Columns: 6
## $ country   <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanist~
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia~
## $ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, 2002, 2007~
## $ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.822, 41.674~
## $ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12881816, 13~
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, 978.0114, ~
```

---

## Consulting the dataset documentation

```r
?gapminder
# alternative: place cursor within `gapminder` and press F1
```

---

```r
nrow(gapminder) # number of rows
```

```
## [1] 1704
```

```r
ncol(gapminder) # number of columns
```

```
## [1] 6
```

```r
dim(gapminder) # dimensions (rows, columns)
```

```
## [1] 1704    6
```

---

## Why visualize data?

Visualization is part of...

- **Exploratory data analysis**: understand distributions, identify outliers & missing data
- **Feature engineering**: discover relationships between two or more predictors and extract a new predictor to increase model performance
- **Model presentation**: show clusters, dimension reductions, etc.
- **Model evaluation**: graphically describe the performance of one or 
more inferential or predictive models
- **Storytelling**: convincingly communicate a data-driven finding

.font80[
Figure source: Kieran Healy. ["Data Visualization. A practical introduction"](http://socviz.co/). Princeton University Press, 2018.
]

---

## Anscombe's quartet

```
##    group  x     y
## 1      1 10  8.04
## 2      1  8  6.95
## 3      1 13  7.58
## 4      1  9  8.81
## 5      1 11  8.33
## 6      1 14  9.96
## 7      1  6  7.24
## 8      1  4  4.26
## 9      1 12 10.84
## 10     1  7  4.82
## 11     1  5  5.68
## 12     2 10  9.14
## 13     2  8  8.14
## 14     2 13  8.74
## 15     2  9  8.77
## 16     2 11  9.26
## 17     2 14  8.10
## 18     2  6  6.13
## 19     2  4  3.10
## 20     2 12  9.13
## 21     2  7  7.26
## 22     2  5  4.74
```

```
##    group  x     y
## 23     3 10  7.46
## 24     3  8  6.77
## 25     3 13 12.74
## 26     3  9  7.11
## 27     3 11  7.81
## 28     3 14  8.84
## 29     3  6  6.08
## 30     3  4  5.39
## 31     3 12  8.15
## 32     3  7  6.42
## 33     3  5  5.73
## 34     4  8  6.58
## 35     4  8  5.76
## 36     4  8  7.71
## 37     4  8  8.84
## 38     4  8  8.47
## 39     4  8  7.04
## 40     4  8  5.25
## 41     4 19 12.50
## 42     4  8  5.56
## 43     4  8  7.91
## 44     4  8  6.89
```

???

- discover things we don't easily see when we just look at the raw data

---

## Summarizing Anscombe's quartet

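```r
library(dplyr)
library(tidyr)

# Assumption: the slides do not show how `ans` was built; one way is to reshape
# the built-in `anscombe` data (columns x1-x4, y1-y4) into long format with
# columns `group` ("1"-"4"), `x`, and `y`
ans <- anscombe %>%
  pivot_longer(everything(), names_to = c(".value", "group"), names_pattern = "(.)(.)") %>%
  arrange(group)

ans %>%
  group_by(group) %>%
  summarize(
    n = n(),
    mean_x = mean(x),
    mean_y = mean(y),
    sd_x = sd(x),
    sd_y = sd(y),
    r = cor(x, y)
  )
```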

```
## # A tibble: 4 x 7
##   group     n mean_x mean_y  sd_x  sd_y     r
##   <chr> <int>  <dbl>  <dbl> <dbl> <dbl> <dbl>
## 1 1        11      9   7.50  3.32  2.03 0.816
## 2 2        11      9   7.50  3.32  2.03 0.816
## 3 3        11      9   7.5   3.32  2.03 0.816
## 4 4        11      9   7.50  3.32  2.03 0.817
```

---

## Visualizing Anscombe's quartet

---

## Life expectancy vs. GDP

* How would you describe the relationship between life expectancy and GDP per capita in 1952? 
* What other variables could have an influence on the shown trend?
* Which country has a moderate life expectancy but an extremely high GDP per capita?

???

- general: the higher the GDP per capita, the higher the life expectancy
- however, other factors might explain the variation across countries: lifestyle (e.g., tobacco and alcohol consumption, lack of exercise), the healthcare system
- difficult to see the trend because of the outlier

---

???

- In the mid-twentieth century, Kuwait experienced a period of prosperity known as the "Golden Era" of Kuwait, in which the country became the largest oil exporter in the Persian Gulf region by 1952.
- visualization helps us to understand our data better and to raise new questions

---

# Data visualization

- There are many tools for visualizing data: D3, Microsoft Excel, Python, `R`, ...
- There are many systems within `R` for creating data visualizations: base, lattice, ggplot2

---

## `ggplot2` &#x1F4C8;

[`ggplot2`](https://ggplot2.tidyverse.org/index.html) is a package for **data visualization** and part of the tidyverse.

- `ggplot2` is inspired by the **Grammar of Graphics**<sup>1</sup>
- idea: **break the graph into components** and **handle each component individually** &rarr; ensures versatility and control
- a `ggplot2` chart is built by stacking a **series of layers**
- advantage: build a **variety of different charts** with the same vocabulary &rarr; code that is easier to read and write

???

- basic idea of gg: no matter whether you would like to draw a pie chart, a line chart,
a bar chart or a scatterplot, what you always do is create a **graphic**
- but what is a graphic: a graphic can be decomposed into multiple layers
- instead of having different "super"-functions for every possible chart type
like in base R, the idea of gg is to describe a large variety of different charts
with the same vocabulary
- ggplot is a specific implementation of gg
  - goal: create informative and elegant graphs with relatively simple and readable code
  - part of the tidyverse -> works exclusively with data frames
  - requires tidy data frames

versatility: flexibility, adaptability

comprehensive, intuitive, and flexible

---

## Components of a graphic

.footnote[.font90[Figure: Thomas de Beus. ["Think About the Grammar of Graphics When Improving Your Graphs"](https://medium.com/tdebeus/think-about-the-grammar-of-graphics-when-improving-your-graphs-18e3744d8d18). Medium, 2017.]]

---

## `ggplot2` vocabulary

- **data**: the data to be plotted, as a _tidy_ data frame
- **aesthetics/mapping**: **map variables to visual properties**
  - x- and y-coordinates, color, shapes, transparency, line type
- **geoms** - geometric objects
  - points, bars, lines, histograms, etc.
- **stats** - data transformations (often implicit)
  - counts of categories for bar charts, summary statistics for a boxplot, regression parameters, etc.
- **scales** - translate between variable ranges and visual properties
  - which color should represent which category?, should the y-axis be log-transformed?
- **facets** - spread data onto multiple subplots/panels
- **coordinates** - change and adjust the coordinate system
  - cartesian, polar or cartographic coordinate system
- **themes** - additional visual settings not related to the data
  - font size or background color
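
To see the vocabulary in action, the sketch below (our own example, not taken from the slides) uses each component once in a single plot. The particular geoms, scale, facet, coordinate system, and theme are arbitrary choices. Rather than drawing the plot, it inspects how the resulting `ggplot` object stores some of the components.

```r
library(gapminder)
library(ggplot2)

p <- ggplot(
  data = gapminder,                               # data
  mapping = aes(x = gdpPercap, y = lifeExp)       # aesthetics/mapping
) +
  geom_point(alpha = 0.2) +                       # geom (default stat: "identity")
  geom_smooth(method = "lm", formula = y ~ x) +   # geom whose stat fits a regression
  scale_x_log10() +                               # scale: log-transformed x-axis
  facet_wrap(~ continent) +                       # facets: one panel per continent
  coord_cartesian(ylim = c(20, 90)) +             # coordinates: Cartesian, zoomed y-range
  theme_minimal()                                 # theme: non-data appearance

# The ggplot object keeps track of the components
length(p$layers)         # 2 layers: points and regression line
class(p$facet)[1]        # "FacetWrap"
class(p$coordinates)[1]  # "CoordCartesian"
```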

???

- stats: convert raw data into new data which gets plotted
- scales: translate between data values and properties of the plot
- coordinates: physical position of the points, lines, etc. on the paper

---

## First ggplot2 visualization

- Which data subset is being plotted?
- What does each part of the code do?
- Which variables map to which **aes**thetic features of the plot?

```r
library(gapminder)
library(dplyr)   # filter()
library(ggplot2)

ggplot(
  data = filter(gapminder, year == 1952), 
  mapping = aes(x = lifeExp, y = gdpPercap)
) +
  geom_point() +
  labs(
    x = "Life expectancy (years)",
    y = "GDP per capita (USD)",
    title = "Relationship between life expectancy and GDP in 1952"
  )
```

---

## First ggplot2 visualization

The first step in creating a `ggplot2` graph is to define a `ggplot` object with the `ggplot()` function.
The main arguments are:

- `data`: the data frame associated with the graph
- `mapping`: the **aes**thetic mapping, i.e., which variables from the data are mapped
to the x- or y-position, color, shape, transparency, etc.

After initializing the graph, we successively stack **layers** on top of it (like LEGO blocks) with the `+` operator.

For example, to create a graph of the Gapminder data that shows `lifeExp` against `gdpPercap`, we add a scatterplot (point) **geom**etry:

```r
library(tidyverse) # also loads ggplot2
ggplot(
  data = gapminder,
  mapping = aes(x = gdpPercap, y = lifeExp)
) +
  geom_point()
```
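
Because `+` returns a new `ggplot` object, a partially built plot can be stored in a variable and extended later. Below is a minimal sketch; the names `p_base` and `p` are our own.

```r
library(gapminder)
library(ggplot2)

# Initialize: data and aesthetic mapping only, no layers yet
p_base <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
length(p_base$layers)  # 0, so printing p_base would only draw empty axes

# Stack two layers on top of the stored base plot
p <- p_base +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
length(p$layers)       # 2: a point layer and a smoothing layer
```

`+` leaves `p_base` unchanged, so it can be reused with other geoms. Typing `p` at the console draws the plot.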

---

???

- which dataset to plot
- which columns to use for x and y
- how to draw the plot
- + to combine ggplot2 elements

---

```r
ggplot(data = gapminder)
```

```r
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
```

```r
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() # adds a scatterplot layer
```

```r
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_smooth(method = "lm") # adds a trend line (lm = linear regression fit)
```

```r
# Add both scatterplot layer and trend line layer
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_smooth(method = "lm")
```

---

![ggplot2 cheatsheet, page 1](figures/02-ggplot2-cheatsheet_1.png)

???

https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf

---

![ggplot2 cheatsheet, page 2](figures/02-ggplot2-cheatsheet_2.png)

---

## Further aesthetics

???

- so far, we have made two mappings: one variable represents x-position and one variable represents y-position

---

## Further aesthetics

```r
ggplot(
  gapminder, 
  aes(
    x = gdpPercap, y = lifeExp,
    color = continent
  )
) + geom_point()
```

By default, `ggplot2` creates a legend for every aesthetic mapping other than the x- and y-position.

---

## Global vs. local aesthetics

We can specify the aesthetic mapping either **globally** within the `ggplot()` function or **locally** for a specific layer within a `geom_*()` function.  
If set globally, the aesthetic mapping applies to **all** geom layers.

**Global:**

```r
# continent is mapped to color for 
# all underlying layers
ggplot(
  data = gapminder, 
  mapping = aes(
    x = gdpPercap, 
    y = lifeExp, 
    color = continent
  )) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_x_log10()
```

---

## Global vs. local aesthetics

**Local:**

```r
# continent is mapped to color only 
# for scatterplot layer
ggplot(
  data = gapminder,
  mapping = aes(x = gdpPercap, y = lifeExp)
) +
  geom_point(mapping = aes(color = continent)) +
  geom_smooth(method = "lm") +
  scale_x_log10()
```

Note that the legend keys have also changed. Since the color mapping does not
apply to the regression line anymore, the legend keys only show a point instead
of a point and a line.
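
`layer_data()` returns the data ggplot2 computes for one layer, which makes the practical difference visible. In this sketch (our own check, not from the slides), the regression layer (layer 2) is fitted once per continent when `color` is mapped globally, because the discrete color mapping also defines the groups. When `color` is mapped locally to the points, the regression is fitted only once overall. We pass `formula = y ~ x` only to silence the informational message of `geom_smooth()`.

```r
library(gapminder)
library(ggplot2)

p_global <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_x_log10()

p_local <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = continent)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_x_log10()

# Number of separate regression fits in the smoothing layer (layer 2)
n_fits_global <- length(unique(layer_data(p_global, i = 2)$group))  # one per continent
n_fits_local  <- length(unique(layer_data(p_local, i = 2)$group))   # one for all data
c(global = n_fits_global, local = n_fits_local)
```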

---

## Setting layer arguments

Layer arguments that are **independent of the underlying data frame** are
set outside of `aes()`.

For example, we can make some cosmetic adjustments by setting the points' **color**
and transparency (**alpha**), the line's color and **size**. Further, we remove the
confidence interval (**se**) of the linear regression fit.

```r
ggplot(
  data = gapminder,
  mapping = aes(x = gdpPercap, y = lifeExp)
) +
  geom_point(
    alpha = 0.3, 
    color = "cornflowerblue"
  ) + 
  geom_smooth(
    method = "lm",
    color = "firebrick", 
    se = FALSE, 
    size = 2 # line width; ggplot2 >= 3.4.0 calls this argument `linewidth`
  ) +
  scale_x_log10()
```

---

The following example shows an **incorrect** use of geom arguments.
We would like to show multiple boxplots depicting the distribution of GDP per capita for each continent and
adjust the boxplots' line size to `0.75`.

```r
ggplot(gapminder) + geom_boxplot(
  aes(x = "continent", y = gdpPercap,
      size = 0.75)
)
```

&#x1F914; _What went wrong here?_

To fix the two problems in this code, we have to adhere to the following
principles:

1. Within `aes()`, variable names must be passed as **expressions**, i.e., **without quotes**.
1. **Non-aesthetic arguments** must be set **outside** of `aes()`.

???

- show a box for each continent
- size is larger than 0.75
- do not need a label for size

---

```r
# Incorrect: variable name in quotes, constant size inside aes()
ggplot(gapminder) + geom_boxplot(
  aes(x = "continent", y = gdpPercap,
      size = 0.75)
)
```

```r
# Correct: variable without quotes inside aes(), constant size outside of aes()
ggplot(gapminder) + 
  geom_boxplot(
    aes(x = continent, y = gdpPercap), 
    size = 0.75 
  )
```

---

## Quiz

Why are the bars of the histogram colored in red although we have specified
a blue color?

```r
ggplot(gapminder) +
  geom_histogram(aes(x = gdpPercap, fill = "steelblue"))
```

---

## Stats

**Stats** are linked to geometries. Every geom has a default stat.

```r
gapminder %>%
  ggplot(aes(x = continent)) +
  geom_bar(stat = "count") # default
```

`stat = "count"` automatically computes the number of observations in each category of the variable mapped to the x-aesthetic.

```r
gapminder %>% count(continent) %>%
  ggplot(aes(x = continent, y = n)) +
  geom_bar(stat = "identity")
```

`stat = "identity"` uses the data values as they are, so we must map a variable to `y` (the bar height).

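In practice, `geom_col()` is a shortcut for `geom_bar(stat = "identity")`. As a quick check (our own sketch), `layer_data()` confirms that the bar heights computed by the default `stat = "count"` equal the counts precomputed with `dplyr::count()`:

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Heights computed by ggplot2 via the default stat_count()
p_count <- ggplot(gapminder, aes(x = continent)) + geom_bar()

# Heights precomputed with dplyr and used as they are by geom_col()
p_identity <- gapminder %>%
  count(continent) %>%
  ggplot(aes(x = continent, y = n)) +
  geom_col()

heights_count <- layer_data(p_count)$y
heights_identity <- layer_data(p_identity)$y
heights_count  # bar heights: 624, 300, 396, 360, 24 (Africa ... Oceania)
```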

???

You can add `stat_*()` layers to the graph, but this is not required most of the time.

---

## Position adjustment

```r
selected_c <- c("Germany", "France", "Italy", "United States", "Canada")
s07 <- filter(gapminder, year == 2007, country %in% selected_c)
```

```r
ggplot(s07, aes(continent, fill = country)) +
  geom_bar() # default: position_stack()
```

```r
ggplot(s07, aes(continent, fill = country)) +
  geom_bar(position = position_stack())
```

```r
ggplot(s07, aes(continent, fill = country)) +
  geom_bar(position = position_dodge())
```

```r
ggplot(s07, aes(continent, fill = country)) +
  geom_bar(position = position_fill())
```
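
`position_fill()` stacks the bars like `position_stack()` but rescales each stack to a total height of 1, so the segments show proportions within each continent. A small check with `layer_data()` follows (our own sketch). `s07` is recreated so that the block runs on its own.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

selected_c <- c("Germany", "France", "Italy", "United States", "Canada")
s07 <- filter(gapminder, year == 2007, country %in% selected_c)

p_fill <- ggplot(s07, aes(continent, fill = country)) +
  geom_bar(position = position_fill())

# Each bar segment spans [ymin, ymax]
fill_data <- layer_data(p_fill)
segment_heights <- fill_data$ymax - fill_data$ymin
segment_heights      # 1/2 for each of 2 American countries, 1/3 for each of 3 European ones
max(fill_data$ymax)  # every stack is rescaled to the full height of 1
```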

---

## Position adjustment

```r
g07 <- filter(gapminder, year == 2007)
```

```r
ggplot(g07, aes(gdpPercap, lifeExp)) +
  geom_point()
```

```r
ggplot(g07, aes(gdpPercap, lifeExp)) +
  geom_point(
    position = position_jitter(width = 3000, height = 30)
  )
```
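
Jittering adds random noise, so the picture changes slightly each time it is rendered. The sketch below (our own) uses the `seed` argument of `position_jitter()` to make the noise reproducible. It then uses `layer_data()` to confirm that each point moves horizontally by at most `width`, since noise is added in both directions. `geom_jitter()` is a shortcut for `geom_point(position = "jitter")`. The helper `jittered_positions()` is hypothetical and introduced only for this example.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

g07 <- filter(gapminder, year == 2007)

# Hypothetical helper: build the jittered plot and return the positions ggplot2 draws
jittered_positions <- function(seed) {
  p <- ggplot(g07, aes(gdpPercap, lifeExp)) +
    geom_point(position = position_jitter(width = 3000, height = 30, seed = seed))
  layer_data(p)
}

run1 <- jittered_positions(seed = 1)
run2 <- jittered_positions(seed = 1)

identical(run1$x, run2$x)                 # same seed, same jitter
max(abs(run1$x - g07$gdpPercap)) <= 3000  # horizontal shift is bounded by width
```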

---

## Scales

- Every aesthetic mapping given by `aes()` has a scale
- If no **scale layer** is explicitly provided, a default scale will be used 
- Scale function names follow an intuitive scheme: .font120[**`scale_<AES>_<TYPE>()`**]

Examples:

- continuous scale: `scale_<AES>_continuous()`
- discrete scale: `scale_<AES>_discrete()`
- scale with custom values: `scale_<AES>_manual()`
- scale with colors from the RColorBrewer package: `scale_{color,fill}_brewer()`
- scale with a color gradient: `scale_{color,fill}_gradient()`
- ...

Except for x-/y-axis-scales, every scale will have its own **legend**.
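
Here is the naming scheme in a minimal sketch (our own example; the colors are arbitrary). `scale_x_log10()` is a scale for the `x` aesthetic of type `log10`, and `scale_color_manual()` is a scale for `color` with custom values. `layer_data()` shows that scales are applied before drawing: x is stored on the log10 scale, and each point carries its manually chosen color.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

g07 <- filter(gapminder, year == 2007)

# Arbitrary custom colors, named by continent (the factor levels)
continent_colors <- c(
  Africa = "darkorange", Americas = "steelblue", Asia = "firebrick",
  Europe = "forestgreen", Oceania = "purple"
)

p <- ggplot(g07, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  scale_x_log10() +                              # scale_<AES = x>_<TYPE = log10>()
  scale_color_manual(values = continent_colors)  # scale_<AES = color>_<TYPE = manual>()

# layer_data() returns the data after the scales have been applied
drawn <- layer_data(p)
head(drawn$x)       # log10 of GDP per capita
head(drawn$colour)  # the manually chosen color names
```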

---

## Axis scales

```r
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()
```

The x- and y-axis scales default to `scale_x_continuous()` and
`scale_y_continuous()`, respectively. 
We do not need to explicitly add these layers to the graph.

```r
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_continuous()
```

```r
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_continuous(limits = c(200, 50000))
```

```
## Warning: Removed 6 rows containing missing values (geom_point).
```

Note that we receive a warning. 
There are 6 observations that are outside the specified x-axis range.  
Since the graph suggests a logarithmic relationship between GDP per capita and life expectancy,
we may improve it by log-transforming the x-axis.

Tip: In RStudio, type `scale_x_` and press **Tab ↹** or **Ctrl + Space ␣**
to get autocomplete suggestions
of the available x-axis scale functions.

```r
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10()
```

The x-axis labels in scientific notation don't look particularly pretty.  
We would like to make the following changes:

- make the x-axis labels more intuitive
- set custom axis breaks at 500, 5000 and 50000

```r
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10(
    breaks = c(500, 5000, 50000), # tick positions
    labels = scales::comma # labeling function that adds a thousands
    # separator (e.g., 50,000 instead of 5e+04)
  )
```

---

## Further examples of scale adjustments

```r
ge7 <- gapminder %>% filter(year == 2002, continent == "Europe")
```

```r
ggplot(ge7, aes(gdpPercap, lifeExp)) +
  geom_point()
```

```r
ggplot(ge7, aes(gdpPercap, lifeExp)) +
  geom_point() +
  scale_x_continuous(
    breaks = seq(8000, 40000, 8000)
  )
```

```r
ggplot(ge7, aes(gdpPercap, lifeExp)) +
  geom_point() +
  scale_y_continuous(limits = c(65, 85))
```

```r
ggplot(ge7, aes(gdpPercap, lifeExp)) +
  geom_point() +
  scale_y_continuous(breaks = c(72, 80), labels = c("72", "80 yrs"))
```

---

## Color scales

```r
ggplot(gapminder, aes(x = continent, fill = continent)) +
  geom_bar() # default fill scale: scale_fill_discrete()
```

---

## Color scales

We can replace this default color scale by adding a different `scale_fill_*`
layer.

```r
ggplot(gapminder, aes(x = continent, fill = continent)) +
  geom_bar() +
  scale_fill_brewer(palette = "Dark2")
```

```r
RColorBrewer::display.brewer.all()
```

[colorbrewer2.org](http://colorbrewer2.org/)

---

```r
colorspace::hcl_palettes(plot = TRUE)
```

&rarr; .font140[`colorspace::scale_<AES>_<TYPE>_<COLORSCALE>(palette = <PALETTE-NAME>)`]

---

```r
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = lifeExp)) +
  geom_point() +
  colorspace::scale_color_continuous_sequential("Magenta")
```

The type of the variable mapped to the color aesthetic determines the kind of color scale: a numeric variable yields a continuous color gradient, whereas a factor yields a discrete color mapping.

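To check this rule, the sketch below (our own) maps the numeric `year` to color twice: once as it is and once converted with `factor()`. It then asks which default color scale ggplot2 added while building each plot. Reading the scale relies on ggplot2 internals (`ggplot_build(p)$plot$scales$get_scales()`), so treat it as a sketch that may change between versions. The helper `color_scale()` is hypothetical.

```r
library(gapminder)
library(ggplot2)

p_numeric <- ggplot(gapminder, aes(gdpPercap, lifeExp, color = year)) + geom_point()
p_factor  <- ggplot(gapminder, aes(gdpPercap, lifeExp, color = factor(year))) + geom_point()

# Hypothetical helper: the color scale ggplot2 chose while building the plot
color_scale <- function(p) ggplot_build(p)$plot$scales$get_scales("colour")

inherits(color_scale(p_numeric), "ScaleContinuous")  # numeric variable: gradient
inherits(color_scale(p_factor), "ScaleDiscrete")     # factor: discrete colors
```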

---

```r
scales::show_col(colors()) # colors() returns the built-in color names
```

???

R understands 657 color names.

---

## Facetting

One of the highlights of `ggplot2` is the possibility to easily **facet** a plot,
i.e., to split the data onto multiple panels. Faceting presents a lot of information compactly by **stratifying by a third variable**. Faceting is also often a remedy for **overplotting**.

The `facet_wrap()` function creates subpanels. Notation: `~` (tilde) followed by the faceting variable(s), combined with `+` if there are several, e.g., `~ continent`.

```r
ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap)) +
  geom_line(aes(group = country)) +
  scale_y_log10() +
  facet_wrap(~ continent)
```

???

When we show the relationship between two variables, another variable may obscure it (confounding). For example, the continent could influence how GDP develops.

Africa: absolute economic growth is lower than in Europe

---

## Facetting

```r
p <- ggplot(
  data = gapminder %>% filter(continent != "Oceania"),
  mapping = aes(x = year, y = gdpPercap)
)
p + geom_line(aes(group = country)) +
  geom_smooth(method = "loess", se = FALSE) +
  scale_x_continuous(breaks = seq(1960, 2000, 20)) +
  scale_y_log10(labels = scales::dollar) +
  facet_wrap(~ continent, nrow = 1) +
  labs(x = NULL, y = "GDP per capita")
```

.font80[.content-box-green[Instead of the formula notation (`~`), you can alternatively
specify faceting variables with `vars()`, e.g. `facet_wrap(vars(continent))`]]
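
Both notations produce the same faceting. In this minimal check (our own sketch), the `PANEL` column of `layer_data()` records which panel each observation is drawn in, and it is identical for `~ continent` and `vars(continent)`.

```r
library(gapminder)
library(ggplot2)

p_base <- ggplot(gapminder, aes(x = year, y = gdpPercap)) +
  geom_line(aes(group = country)) +
  scale_y_log10()

p_formula <- p_base + facet_wrap(~ continent)
p_vars    <- p_base + facet_wrap(vars(continent))

# PANEL records which panel each observation is drawn in
panels_formula <- layer_data(p_formula)$PANEL
panels_vars    <- layer_data(p_vars)$PANEL

identical(panels_formula, panels_vars)  # both notations give the same panels
nlevels(panels_formula)                 # one panel per continent
```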

???

- loess: non-linear, non-parametric regression
- facetting variables are specified in formula notation http://r4ds.had.co.nz/model-basics.html

---

## `facet_wrap()` vs. `facet_grid()`

- `facet_wrap()`: sequence of panels
- `facet_grid()`: matrix of panels

Show GDP development, stratified by life expectancy groups:

```r
my_gapminder <- gapminder %>%
  filter(continent %in% c("Africa", "Asia", "Europe")) %>%
  group_by(country) %>%
  # replace lifeExp by a group label based on each country's maximum life expectancy
  mutate(lifeExp = if_else(max(lifeExp) < 75, "lifeExp < 75", "lifeExp >= 75"))
p <- ggplot(data = my_gapminder, mapping = aes(x = year, y = gdpPercap)) +
  geom_line(aes(group = country)) +
  scale_x_continuous(breaks = seq(1960, 2000, 20)) +
  scale_y_log10(labels = scales::dollar)
```

```r
p + facet_wrap(~ continent + lifeExp)
```

```r
p + facet_grid(continent ~ lifeExp)
```
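
`facet_grid()` creates one row of panels per value of the variable left of `~` and one column per value of the variable on the right. The sketch below (our own) rebuilds `my_gapminder` and `p` from above so that it runs on its own. It then reads the panel layout from `ggplot_build()`.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

my_gapminder <- gapminder %>%
  filter(continent %in% c("Africa", "Asia", "Europe")) %>%
  group_by(country) %>%
  mutate(lifeExp = if_else(max(lifeExp) < 75, "lifeExp < 75", "lifeExp >= 75"))

p <- ggplot(my_gapminder, aes(x = year, y = gdpPercap)) +
  geom_line(aes(group = country)) +
  scale_y_log10()

# The layout table lists each panel with its ROW and COL position
grid_layout <- ggplot_build(p + facet_grid(continent ~ lifeExp))$layout$layout
nrow(grid_layout)     # 6 panels
max(grid_layout$ROW)  # 3 rows: Africa, Asia, Europe
max(grid_layout$COL)  # 2 columns: the two life expectancy groups
```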

---

## Labels

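```r
# Assumption: `p` is the scatterplot of GDP per capita vs. life expectancy colored by continent
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) + geom_point()
p +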
  labs(
    x = "GDP per capita",
    y = "Life expectancy",
    color = "Continent",
    title = "Relationship between GDP per capita\nand life expectancy",
    subtitle = "<subtitle>",
    caption = "<caption>",
    tag = "A"
  )
```

---

## Coordinates

```r
# Relative growth (1952 to 2007) of the average
# country population per continent
(gc <- gapminder %>%
  filter(year %in% c(1952, 2007)) %>%
  group_by(continent, year) %>%
  summarize(avg_pop = mean(pop)) %>%
  group_by(continent) %>%
  summarize(rel_pop_growth =
              (avg_pop[2]-avg_pop[1]) /
              avg_pop[1]))
```

```
## # A tibble: 5 x 2
##   continent rel_pop_growth
##   <fct>              <dbl>
## 1 Africa             2.91 
## 2 Americas           1.60 
## 3 Asia               1.73 
## 4 Europe             0.402
## 5 Oceania            1.30
```

```r
ggplot(gc, aes(x = continent, y = rel_pop_growth)) +
  geom_col() +
  coord_polar() +
  scale_y_continuous(labels = scales::percent)
```

---

## Coordinates

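For **zooming**, use `coord_*` layers instead of `scale_*` layers: scale limits remove observations outside the range (see the warning below), whereas `coord_cartesian()` only changes the visible region and keeps all data.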

```r
ggplot(gc, aes(x = continent, y = rel_pop_growth)) +
  geom_col() +
  scale_y_continuous(limits = c(0, 1.7))
```

```
## Warning: Removed 2 rows containing missing values
## (position_stack).
```

```r
ggplot(gc, aes(x = continent, y = rel_pop_growth)) +
  geom_col() +
  coord_cartesian(ylim = c(0, 1.7))
```
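
Why do two bars disappear in the scale version? Scale limits turn out-of-range values into `NA` before any statistic is computed, whereas `coord_cartesian()` only zooms the view. For stats that summarize data, this changes the result. In the following sketch (our own example; the limits 50 to 80 are arbitrary), boxplot medians computed under scale limits differ from the true medians, while the coordinate zoom leaves them intact.

```r
library(gapminder)
library(ggplot2)

p <- ggplot(gapminder, aes(x = continent, y = lifeExp)) + geom_boxplot()

# Zoom via the scale: values outside 50-80 are dropped first (with a warning)
median_scale <- layer_data(p + scale_y_continuous(limits = c(50, 80)))$middle
# Zoom via the coordinate system: all values enter the computation
median_coord <- layer_data(p + coord_cartesian(ylim = c(50, 80)))$middle

# True per-continent medians for comparison
median_true <- as.numeric(tapply(gapminder$lifeExp, gapminder$continent, median))
rbind(scale = median_scale, coord = median_coord, true = median_true)
```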

---

`coord_flip()` flips cartesian coordinates. It is very useful when you have a
lot of categories on the x-axis or want to display
very long labels.

```r
gf <- filter(gapminder, year == 2007, continent == "Americas")
```

```r
ggplot(gf, aes(country, pop)) +
  geom_col()
```

```r
ggplot(gf, aes(country, pop)) +
  geom_col() +
  coord_flip()
```

By swapping horizontal and vertical axes, the country names become readable.

---

## Themes

Use a theme layer to change style aspects of the plot that are not related to
the data.

Apply a built-in theme with `theme_<NAME>()` to quickly change the overall appearance:

```r
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) + geom_point()
p + theme_gray() # default theme
```

---

## Alternative themes

> The signature ggplot2 theme with a grey background and white gridlines, designed to put the data forward yet make comparisons easy. &mdash; `?theme_grey`

```r
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) + geom_point()
p + theme_gray() # default theme
```

> The classic dark-on-light ggplot2 theme. May work better for presentations displayed with a projector. &mdash; `?theme_bw`

```r
p + theme_bw()
```

> A minimalistic theme with no background annotations. &mdash; `?theme_minimal`

```r
p + theme_minimal()
```

> A completely empty theme. &mdash; `?theme_void`

```r
p + theme_void()
```

---

## `ggthemes`

<https://yutannihilation.github.io/allYourFigureAreBelongToUs/ggthemes/>

> Theme similar to the default settings of the 'base' R graphics. &mdash; `?theme_base`

```r
p + ggthemes::theme_base()
```

> Theme for ggplot2 that is similar to the default style of charts in current versions of Microsoft Excel. &mdash; `?theme_excel_new`

```r
p + ggthemes::theme_excel_new()
```

---

## Modifying base theme properties

```r
p + theme_minimal()
```

```r
p + theme_minimal(base_size = 24, base_family = "serif", base_line_size = 4)
```

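Every complete theme function, such as `theme_minimal()`, has four base properties (shown with their defaults):

- `base_size = 11`: base font size (in pt)
- `base_family = ""`: base font family (`""` uses the graphics device's default font, usually sans serif)
- `base_line_size = base_size / 22`: base width of line elements (in mm)
- `base_rect_size = base_size / 22`: base width of borders and backgrounds (in mm)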

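A complete theme is a list-like object, so its base settings can be inspected directly. This is a minimal sketch of our own. It only reads the top-level `text` element, which every complete theme defines.

```r
library(ggplot2)

th_default <- theme_minimal()
th_custom  <- theme_minimal(base_size = 24, base_family = "serif")

th_default$text$size   # 11, the default base_size
th_custom$text$size    # 24
th_custom$text$family  # "serif"
```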
---

## Modify individual theme elements: `theme(<ELEMENT> = ...)`

Make axis titles red and right-aligned.

```r
p + theme_minimal(base_size = 24) +
  theme(axis.title = element_text(color = "red", hjust = 1))
```

Add a y-axis line with arrow.

```r
p + theme_minimal(base_size = 24) +
  theme(axis.line.y = element_line(arrow = arrow(type = "closed")))
```

Make the legend box yellow and its border blue.

```r
p + theme_minimal(base_size = 24) +
  theme(legend.background = element_rect(fill = "yellow", color = "blue"))
```

Remove all grid lines.

```r
p + theme_minimal(base_size = 24) +
  theme(panel.grid = element_blank())
```

Put the legend above the plot.

```r
p + theme_minimal(base_size = 24) +
  theme(legend.position = "top")
```

`?theme` is your friend. &#x1F60E;

---

## Save plots &#x1F4BE;

```r
ggsave(
  filename = "filename.png", # format inferred from the extension, e.g., pdf, svg, jpeg, eps, tiff
  plot = p, # if omitted, the last plot displayed is saved (last_plot())
  width = 8, height = 6,
  units = "cm", 
  dpi = 300 # specifies resolution (dots per inch) 
)
```

---

# Visualizing numerical data

---

## Number of variables involved

- **Univariate** data analysis: distribution of single variable
- **Bivariate** data analysis: relationship between two variables
- **Multivariate** data analysis: relationship between many variables at once, usually focusing on the relationship between two while conditioning for others

---

## Types of variables

In this course, we use the terms _variable_, _attribute_, and _feature_ synonymously.

---

## IBM HR employee attrition & performance dataset

Artificial dataset from the [IBM Watson Analytics Lab](https://www.ibm.com/communities/analytics/watson-analytics-blog/hr-employee-attrition/) about factors that lead to employee attrition.

```r
data(attrition, package = "modeldata")
attrition <- as_tibble(attrition)
glimpse(attrition)
```

```
## Rows: 1,470
## Columns: 31
## $ Age                      <int> 41, 49, 37, 33, 27, 32, 59, 30, 38, 36, 35, 29, 31, 34,~
## $ Attrition                <fct> Yes, No, Yes, No, No, No, No, No, No, No, No, No, No, N~
## $ BusinessTravel           <fct> Travel_Rarely, Travel_Frequently, Travel_Rarely, Travel~
## $ DailyRate                <int> 1102, 279, 1373, 1392, 591, 1005, 1324, 1358, 216, 1299~
## $ Department               <fct> Sales, Research_Development, Research_Development, Rese~
## $ DistanceFromHome         <int> 1, 8, 2, 3, 2, 2, 3, 24, 23, 27, 16, 15, 26, 19, 24, 21~
## $ Education                <ord> College, Below_College, College, Master, Below_College,~
## $ EducationField           <fct> Life_Sciences, Life_Sciences, Other, Life_Sciences, Med~
## $ EnvironmentSatisfaction  <ord> Medium, High, Very_High, Very_High, Low, Very_High, Hig~
## $ Gender                   <fct> Female, Male, Male, Female, Male, Male, Female, Male, M~
## $ HourlyRate               <int> 94, 61, 92, 56, 40, 79, 81, 67, 44, 94, 84, 49, 31, 93,~
## $ JobInvolvement           <ord> High, Medium, Medium, High, High, High, Very_High, High~
## $ JobLevel                 <int> 2, 2, 1, 1, 1, 1, 1, 1, 3, 2, 1, 2, 1, 1, 1, 3, 1, 1, 4~
## $ JobRole                  <fct> Sales_Executive, Research_Scientist, Laboratory_Technic~
## $ JobSatisfaction          <ord> Very_High, Medium, High, High, Medium, Very_High, Low, ~
## $ MaritalStatus            <fct> Single, Married, Single, Married, Married, Single, Marr~
## $ MonthlyIncome            <int> 5993, 5130, 2090, 2909, 3468, 3068, 2670, 2693, 9526, 5~
## $ MonthlyRate              <int> 19479, 24907, 2396, 23159, 16632, 11864, 9964, 13335, 8~
## $ NumCompaniesWorked       <int> 8, 1, 6, 1, 9, 0, 4, 1, 0, 6, 0, 0, 1, 0, 5, 1, 0, 1, 2~
## $ OverTime                 <fct> Yes, No, Yes, Yes, No, No, Yes, No, No, No, No, Yes, No~
## $ PercentSalaryHike        <int> 11, 23, 15, 11, 12, 13, 20, 22, 21, 13, 13, 12, 17, 11,~
## $ PerformanceRating        <ord> Excellent, Outstanding, Excellent, Excellent, Excellent~
## $ RelationshipSatisfaction <ord> Low, Very_High, Medium, High, Very_High, High, Low, Med~
## $ StockOptionLevel         <int> 0, 1, 0, 0, 1, 0, 3, 1, 0, 2, 1, 0, 1, 1, 0, 1, 2, 2, 0~
## $ TotalWorkingYears        <int> 8, 10, 7, 8, 6, 8, 12, 1, 10, 17, 6, 10, 5, 3, 6, 10, 7~
## $ TrainingTimesLastYear    <int> 0, 3, 3, 3, 3, 2, 3, 2, 2, 3, 5, 3, 1, 2, 4, 1, 5, 2, 3~
## $ WorkLifeBalance          <ord> Bad, Better, Better, Better, Better, Good, Good, Better~
## $ YearsAtCompany           <int> 6, 10, 0, 8, 2, 7, 1, 1, 9, 7, 5, 9, 5, 2, 4, 10, 6, 1,~
## $ YearsInCurrentRole       <int> 4, 7, 0, 7, 2, 7, 0, 0, 7, 7, 4, 5, 2, 2, 2, 9, 2, 0, 8~
## $ YearsSinceLastPromotion  <int> 0, 1, 0, 3, 2, 3, 0, 0, 1, 7, 0, 0, 4, 1, 0, 8, 0, 0, 3~
## $ YearsWithCurrManager     <int> 5, 7, 0, 0, 2, 6, 0, 0, 8, 7, 3, 8, 3, 2, 3, 8, 5, 0, 7~
```

---

## Variable types...

...for a selection of columns:

```r
attrition %>%
  select(Age, Attrition, Gender, BusinessTravel, EducationField, JobLevel) %>%
  glimpse()
```

```
## Rows: 1,470
## Columns: 6
## $ Age            <int> 41, 49, 37, 33, 27, 32, 59, 30, 38, 36, 35, 29, 31, 34, 28, 29, 3~
## $ Attrition      <fct> Yes, No, Yes, No, No, No, No, No, No, No, No, No, No, No, Yes, No~
## $ Gender         <fct> Female, Male, Male, Female, Male, Male, Female, Male, Male, Male,~
## $ BusinessTravel <fct> Travel_Rarely, Travel_Frequently, Travel_Rarely, Travel_Frequentl~
## $ EducationField <fct> Life_Sciences, Life_Sciences, Other, Life_Sciences, Medical, Life~
## $ JobLevel       <int> 2, 2, 1, 1, 1, 1, 1, 1, 3, 2, 1, 2, 1, 1, 1, 3, 1, 1, 4, 1, 2, 1,~
```

---

## Describing shapes of numerical distributions

- **shape**:
  * **skewness**: left-skewed, right-skewed, symmetric
  * **modality**: unimodal, bimodal, multimodal, uniform
- **center**: mean (`mean()`), median (`median()`), mode (more useful for categorical data)
- **spread**: range (`range()`), standard deviation (`sd()`), interquartile range (`IQR()`)
- unusual observations, i.e., **outliers**

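The descriptors above map directly onto base R functions. In this sketch we chose GDP per capita in 2007 from the Gapminder data as the variable, so that the block only needs the gapminder package.

```r
library(gapminder)
library(dplyr)

gdp07 <- filter(gapminder, year == 2007)$gdpPercap

shape_summary <- c(
  mean   = mean(gdp07),
  median = median(gdp07),
  min    = range(gdp07)[1],
  max    = range(gdp07)[2],
  sd     = sd(gdp07),
  iqr    = IQR(gdp07)
)
round(shape_summary)

# A mean well above the median hints at a long right tail (right skew)
shape_summary[["mean"]] > shape_summary[["median"]]
```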
---

## Histogram

```r
ggplot(attrition, aes(x = MonthlyIncome)) + 
  geom_histogram()
```

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

---

## Binwidth of histograms

```r
ggplot(attrition, aes(x = MonthlyIncome)) + 
  geom_histogram(binwidth = 100)
```

```r
ggplot(attrition, aes(x = MonthlyIncome)) + 
  geom_histogram(binwidth = 1000)
```

```r
ggplot(attrition, aes(x = MonthlyIncome)) + 
  geom_histogram(binwidth = 5000)
```

```r
ggplot(attrition, aes(x = MonthlyIncome)) + 
  geom_histogram(bins = 15)
```
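
One way to see how `binwidth` and `bins` trade off is to count the bars that `ggplot2` actually computes, using `layer_data()` (introduced later in this deck). This sketch assumes the built-in `faithful` data (waiting times between eruptions, in minutes) as a stand-in for `attrition`, so it runs without extra packages; `count_bins()` is a hypothetical helper.

```r
library(ggplot2)

# Hypothetical helper: number of bars computed for a given binwidth
count_bins <- function(bw) {
  p <- ggplot(faithful, aes(x = waiting)) +
    geom_histogram(binwidth = bw)
  nrow(layer_data(p))  # one row per bar
}

# Wider bins -> fewer, coarser bars
n_bars <- sapply(c(1, 5, 10), count_bins)
n_bars

# bins = 15 fixes the number of bars instead of their width;
# either way, every observation is counted exactly once
p <- ggplot(faithful, aes(x = waiting)) + geom_histogram(bins = 15)
nrow(layer_data(p))
sum(layer_data(p)$count) == nrow(faithful)
```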

---

## Customizing histograms

```r
ggplot(attrition, aes(x = MonthlyIncome)) + 
  geom_histogram(binwidth = 1000) +
  labs(
    x = "Monthly income (USD)",
    y = "Frequency",
    title = "Frequency of employee income"
  )
```

---

## Fill with a categorical variable

```r
ggplot(
  attrition, 
  aes(
    x = MonthlyIncome, 
*   fill = Department
  )
) + 
  geom_histogram(
    binwidth = 1000, 
*   alpha = 0.7
  ) +
  labs(
    x = "Monthly income (USD)",
    y = "Frequency",
    title = "Frequency of employee income"
  )
```
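
With `fill` mapped, `geom_histogram()` stacks the groups by default (`position = "stack"`), so the height of each bar is the total count across all groups. `position = "identity"` overlays the groups instead, which is where a semi-transparent `alpha` becomes essential. A minimal check with `layer_data()`, using the built-in `iris` data as a stand-in for `attrition`:

```r
library(ggplot2)

base <- ggplot(iris, aes(x = Petal.Length, fill = Species))

# Default: bars of different species are stacked on top of each other
stacked <- layer_data(base + geom_histogram(binwidth = 0.5))

# Overlaid: every bar starts at zero, so bars of different species overlap
overlaid <- layer_data(
  base + geom_histogram(binwidth = 0.5, position = "identity", alpha = 0.5)
)

any(stacked$ymin > 0)    # some bars sit on top of others
all(overlaid$ymin == 0)  # all bars start at the x-axis
```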

---

## Facet with a categorical variable

```r
ggplot(
  attrition, 
  aes(
    x = MonthlyIncome
  )
) + 
  geom_histogram(
    binwidth = 1000
  ) +
  labs(
    x = "Monthly income (USD)",
    y = "Frequency",
    title = "Frequency of employee income"
  ) +
* facet_wrap(~ Department, nrow = 3)
```

---

## Density plot

```r
ggplot(attrition, aes(x = MonthlyIncome)) + 
  geom_density()
```

A density curve is a smoothed counterpart of a histogram: instead of counting observations in bins, it estimates a continuous probability density.
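
`geom_density()` computes this curve as a kernel density estimate via `stat_density()`, which calls `stats::density()`, so the area under the curve is (approximately) 1. A base-R sketch on the built-in `faithful` data; the Riemann sum is only an illustration of that property, not how `ggplot2` works internally.

```r
# Kernel density estimate of waiting times (minutes) between eruptions
d <- density(faithful$waiting)

# d$x is an evenly spaced grid (512 points by default),
# d$y the estimated density at those points
step <- diff(d$x[1:2])
area <- sum(d$y) * step  # Riemann-sum approximation of the integral
area                     # very close to 1
```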

---

## Adjusting the bandwidth to control smoothness

```r
ggplot(attrition, aes(x = MonthlyIncome)) + 
  geom_density(adjust = 0.2)
```

```r
ggplot(attrition, aes(x = MonthlyIncome)) + 
  geom_density(adjust = 1)
```

```r
ggplot(attrition, aes(x = MonthlyIncome)) + 
  geom_density(adjust = 2)
```
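
`adjust` does not set the bandwidth directly: it multiplies the default bandwidth, which `stat_density()` (like `stats::density()`) chooses with Silverman's rule of thumb, `bw = "nrd0"`. Values below 1 therefore produce a wigglier curve and values above 1 a smoother one. A base-R check on the built-in `faithful` data, assuming `stats::density()` as a stand-in for the `ggplot2` computation:

```r
x <- faithful$waiting

# Rule-of-thumb bandwidth used when no bandwidth is specified
default_bw <- bw.nrd0(x)

# The bandwidth actually used is adjust * default bandwidth
ratios <- sapply(c(0.2, 1, 2), function(a) density(x, adjust = a)$bw / default_bw)
ratios
```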

---

## Boxplot

```r
ggplot(attrition, aes(x = MonthlyIncome)) + 
  geom_boxplot()
```

The y-axis text carries no information here, since there is only a single box. Let's remove it.

---

## Customizing boxplots

```r
ggplot(attrition, aes(x = MonthlyIncome)) + 
  geom_boxplot() +
  labs(
    x = "Monthly income (USD)",
*   y = NULL,
    title = "Employee income"
  ) +
* theme(axis.text.y = element_blank()) +
* theme(axis.ticks.y = element_blank())
```

---

## Adding a categorical variable

```r
ggplot(attrition, aes(
  x = MonthlyIncome,
* y = Education
)) +
  geom_boxplot() +
  labs(
    x = "Monthly income (USD)",
    y = "Education",
    title = "Employee income",
    subtitle = "By education level"
  )
```

---

## Violin plots

```r
ggplot(attrition, aes(
  x = MonthlyIncome,
  y = Education
)) +
* geom_violin() +
  labs(
    x = "Monthly income (USD)",
    y = "Education",
    title = "Employee income",
    subtitle = "By education level"
  )
```

---

## Ridgeline plots

```r
ggplot(attrition, aes(
  x = MonthlyIncome,
  y = Education
)) +
* ggridges::geom_density_ridges() +
  labs(
    x = "Monthly income (USD)",
    y = "Education",
    title = "Employee income",
    subtitle = "By education level"
  )
```

```
## Picking joint bandwidth of 1240
```

---

## Scatterplot

```r
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + 
  geom_point()
```

Many points overlap (overplotting), which makes it hard to judge where the data are dense.

---

## Hex plot

```r
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + 
  geom_hex()
```

---

## Hex plot

```r
ggplot(gapminder %>% filter(gdpPercap < 50000), 
       aes(x = gdpPercap, y = lifeExp)) + 
  geom_hex()
```

---

# Visualizing categorical data

---

## Bar plot

```r
ggplot(attrition, aes(x = Education)) +
  geom_bar()
```

---

## Segmented bar plot (absolute frequencies)

```r
ggplot(attrition, aes(x = Education,
*                     fill = JobSatisfaction)) +
  geom_bar()
```

---

## Segmented bar plot (relative frequencies)

```r
ggplot(attrition, aes(x = Education, fill = JobSatisfaction)) + 
  geom_bar(position = "fill")
```
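
`position = "fill"` stacks the bars and rescales each one to a height of 1, so every segment shows a conditional proportion (here: the share of each job-satisfaction level *within* an education level). The same numbers can be computed with base R's `prop.table()`. A sketch with the built-in `mtcars` data standing in for `attrition`:

```r
library(ggplot2)

# Share of each gear count *within* each cylinder count (rows sum to 1)
props <- prop.table(table(cyl = mtcars$cyl, gear = mtcars$gear), margin = 1)
round(props, 2)

# Filled bar plot of the same two variables
p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear))) +
  geom_bar(position = "fill")
bars <- layer_data(p)

# Every bar is rescaled so that its top is at 1 ...
tapply(bars$ymax, bars$x, max)

# ... and the segment heights are the row-wise proportions
# (empty combinations, such as 8 cylinders with 4 gears, have no segment)
heights <- bars$ymax - bars$ymin
all.equal(sort(heights), sort(as.vector(props[props > 0])))
```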

---

Which of the two bar plot variants is a more effective visualization for representing the relationship between education and job satisfaction?


---

## Customizing bar plots

```r
ggplot(attrition, aes(y = Education, fill = JobSatisfaction)) + 
  scale_x_continuous(labels = scales::percent) +
  geom_bar(position = "fill") +
  labs(
    x = "Proportion",
    y = "Education",
    title = "Relationship between education level and job satisfaction"
  ) 
```

---

## Advanced ggplot2 & extensions

---

## Extracting plot details

One of the main reasons why `ggplot2` is easy to use is that many geoms perform the required computations themselves.
For example, the boxplot geom automatically calculates the five-number summary and identifies potential outliers.

```r
p <- ggplot(iris, aes(Species, Sepal.Length)) +
   geom_boxplot()
p
```

```r
class(p)
```

```
## [1] "gg"     "ggplot"
```

&nbsp;

.content-box-yellow[&#x1F914; _"I need to depict the summary statistics of the boxplot for
my final project report. How can I extract them?"_]

???

- ggplot automatically calculates the five-number summary for a boxplot
  - median, first and third quartiles, whisker ends and outliers

---

## Extracting plot details

Whenever a ggplot object is "printed" to the screen,
the function `ggplot_build()` is invoked internally to render the plot.

```r
gr <- ggplot_build(p) # run all computations needed to render the plot
class(gr)
```

```
## [1] "ggplot_built"
```

```r
names(gr)
```

```
## [1] "data"   "layout" "plot"
```

The three components of a `ggplot_built` object are:

- `data`: one data frame per layer with the computed values, e.g. the five-number summary of a boxplot
- `layout`: panel and axis information, e.g. breaks, ranges and labels
- `plot`: the `ggplot` object that was built

---

## Extracting plot details

We are interested in the five-number summaries of the three boxplots,
which `ggplot2` calculated automatically.
We extract them from the `data` element of `gr`.

```r
gr$data[[1]] # or: layer_data(p, i = 1)
```

```
##   ymin lower middle upper ymax outliers notchupper notchlower x flipped_aes PANEL group
## 1  4.3 4.800    5.0   5.2  5.8            5.089378   4.910622 1       FALSE     1     1
## 2  4.9 5.600    5.9   6.3  7.0            6.056412   5.743588 2       FALSE     1     2
## 3  5.6 6.225    6.5   6.9  7.9      4.9   6.650826   6.349174 3       FALSE     1     3
##   ymin_final ymax_final  xmin  xmax xid newx new_width weight colour  fill size alpha
## 1        4.3        5.8 0.625 1.375   1    1      0.75      1 grey20 white  0.5    NA
## 2        4.9        7.0 1.625 2.375   2    2      0.75      1 grey20 white  0.5    NA
## 3        4.9        7.9 2.625 3.375   3    3      0.75      1 grey20 white  0.5    NA
##   shape linetype
## 1    19    solid
## 2    19    solid
## 3    19    solid
```

Here, each row contains data for one of the three boxes. The first five columns are:

- `ymin`: lower end of lower whisker (= smallest observation >= first quartile - 1.5 * IQR)
- `lower`: lower end of box (= first quartile)
- `middle`: horizontal line within box (= median)
- `upper`: upper end of box (= third quartile)
- `ymax`: upper end of upper whisker (= largest observation <= third quartile + 1.5 * IQR)
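
Note that the whiskers do not simply extend 1.5 * IQR from the box: they end at the most extreme observations that still lie within the fences (first quartile - 1.5 * IQR, third quartile + 1.5 * IQR); anything beyond is an outlier. A sketch that recomputes the third box (*virginica*) by hand, assuming `stat_boxplot()`'s default type-7 quantiles from `stats::quantile()`; it should reproduce the third row of the output above.

```r
library(ggplot2)

p <- ggplot(iris, aes(Species, Sepal.Length)) +
  geom_boxplot()
# Keep only the five-number summary and the outliers
box_stats <- layer_data(p, i = 1)[, c("ymin", "lower", "middle", "upper", "ymax", "outliers")]

# Recompute the third box (virginica) by hand
x <- iris$Sepal.Length[iris$Species == "virginica"]
q <- quantile(x, c(0.25, 0.5, 0.75))  # first quartile, median, third quartile
iqr <- q[[3]] - q[[1]]

# Whiskers end at the most extreme observations inside the fences
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr
whisker_low  <- min(x[x >= lower_fence])
whisker_high <- max(x[x <= upper_fence])
outliers <- x[x < lower_fence | x > upper_fence]

c(whisker_low, q, whisker_high)  # compare with box_stats[3, 1:5]
outliers                         # compare with box_stats$outliers[[3]]
```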

---

## Maps &#x1F5FA;

Example **choropleth maps** showing the results of the 2016 United States presidential election:

.footnote[Figure source: Kieran Healy. ["Data Visualization. A practical introduction"](http://socviz.co/).
Princeton University Press, 2018.]

---

## Maps &#x1F5FA;

Draw a map of the USA (`map_data()` requires the `maps` package, `coord_map()` the `mapproj` package):

```r
usa <- map_data("state")
str(usa)
```

```
## 'data.frame':	15537 obs. of  6 variables:
##  $ long     : num  -87.5 -87.5 -87.5 -87.5 -87.6 ...
##  $ lat      : num  30.4 30.4 30.4 30.3 30.3 ...
##  $ group    : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ order    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ region   : chr  "alabama" "alabama" "alabama" "alabama" ...
##  $ subregion: chr  NA NA NA NA ...
```

```r
ggplot(usa, aes(x = long, y = lat,
                group = group)) +
* geom_polygon(color = "black", fill = NA) +
  coord_map() +
  theme_void()
```

---

## Interactive graphs with ggiraph

An interactive map of the 2016 US presidential election results.
Hovering over a state shows a tooltip with the vote shares of the
Democratic and Republican candidates.

```r
library(ggiraph)
p <- usa %>%
  # match the state names used in socviz::election
  rename(state = region) %>%
  mutate(state = stringr::str_to_title(state)) %>%
  mutate(state = if_else(state == "District Of Columbia", "District of Columbia", state)) %>%
  # add the election results to the map polygons
  left_join(socviz::election %>% select(state, winner, pct_clinton, pct_trump), by = "state") %>%
  # text shown when hovering over a state
  mutate(tooltip = paste0(winner, " won ", state, "\nClinton: ", pct_clinton, "%\nTrump: ", pct_trump, "%")) %>%
  ggplot(aes(long, lat, group = group)) +
  # interactive polygons: data_id enables hover effects, tooltip sets the text
  geom_polygon_interactive(aes(fill = winner, data_id = state, tooltip = tooltip), color = "gray90") +
  scale_fill_manual(values = c("royalblue3", "firebrick2")) +
  labs(fill = "Winning\ncandidate") +
  coord_map() +
  theme_void(base_family = "Fira Sans", base_size = 18)
# render the plot as an interactive htmlwidget
girafe(ggobj = p)
```

???

maybe useful: https://github.com/davidgohel/budapestbi2017/blob/master/docs/ggiraph/slides.Rmd

---

## Draw maps from shape files

The `maps` package contains map data for only a handful of countries,
including the USA, France, Italy and New Zealand, as well as two world maps.

Generally, **shapefiles** offer more flexible access to geographic and political boundaries
than built-in maps. The shapefile is a **geospatial vector data format** for geographic
information system (GIS) software.

A shapefile actually consists of several files (at least `.shp`, `.shx` and `.dbf`), see e.g. [Wikipedia](https://en.wikipedia.org/wiki/Shapefile).
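
One common way to use shapefiles with `ggplot2` (a suggestion, not part of this deck's setup) is the `sf` package: `sf::st_read()` reads a shapefile into a data frame with a geometry column, and `geom_sf()` draws it. The sketch assumes `sf` is installed and uses the North Carolina counties shapefile that ships with it.

```r
library(sf)
library(ggplot2)

# Path to the example shapefile bundled with sf (nc.shp plus its sidecar files)
nc_path <- system.file("shape/nc.shp", package = "sf")

# One row per county; the geometry is stored in a list column
nc <- st_read(nc_path, quiet = TRUE)

# geom_sf() picks up the geometry column automatically
ggplot(nc) +
  geom_sf(aes(fill = AREA)) +
  theme_void()
```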

---

# Extensions & Alternatives

---

## [https://exts.ggplot2.tidyverse.org/gallery](https://exts.ggplot2.tidyverse.org/gallery)

---

## ggiraph

[`ggiraph`](https://davidgohel.github.io/ggiraph/): htmlwidget that extends `ggplot2` with `D3.js`
to generate **interactive** graphs

---

## plotly

[`plotly`](https://plot.ly/r): interface to the JavaScript library of the same name
for creating interactive graphs.
The convenience function `ggplotly()` converts a `ggplot2` plot into a `plotly`
graph.
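
A minimal sketch of `ggplotly()`, assuming the `plotly` package is installed: build an ordinary `ggplot2` object and pass it to `ggplotly()`. The result is an htmlwidget that shows tooltips on hover when displayed in RStudio, a browser, or an R Markdown document.

```r
library(ggplot2)
library(plotly)

p <- ggplot(iris, aes(Sepal.Length, Petal.Length, colour = Species)) +
  geom_point()

# Convert the static ggplot into an interactive plotly widget
w <- ggplotly(p)
w
```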

---

## gganimate

[`gganimate`](https://github.com/thomasp85/gganimate): animated `ggplot2` plots

![](figures//02-gapminder.gif)

---

## ggforce

[`ggforce`](https://ggforce.data-imaginist.com/): various additional extensions to `ggplot2`

```r
library(ggforce)
ggplot(iris, aes(Sepal.Length, Petal.Width)) +
  coord_cartesian(xlim = c(3.5, 8.5), ylim = c(-0.25, 2.75),
                  expand = FALSE) +
  geom_point() +
  geom_mark_hull(aes(fill = Species, label = Species),
                 concavity = 3)
```

---

## patchwork

[`patchwork`](https://patchwork.data-imaginist.com/): combine multiple `ggplot2` graphs into a composite plot
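
A minimal sketch, assuming `patchwork` is installed: once the package is loaded, `ggplot2` objects can be combined with arithmetic-like operators, where `|` places plots side by side and `/` stacks them.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()
p3 <- ggplot(mtcars, aes(mpg)) + geom_histogram(binwidth = 2)

# p1 and p2 side by side, p3 below spanning the full width
combined <- (p1 | p2) / p3 +
  plot_annotation(title = "Three views of mtcars")
combined
```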

---

## ggraph

[`ggraph`](https://cran.r-project.org/web/packages/ggraph/index.html): graph and network visualizations<sup>1</sup>

.pull-left[
<img src="figures//02-ggraph_1.gif" width="300px" /><img src="figures//02-ggraph_2.png" width="300px" />
]

.pull-right[
<img src="figures//02-ggraph_3.png" width="300px" /><img src="figures//02-ggraph_4.png" width="300px" />

]

---

## ggalluvial

`ggalluvial`: graphs for multi-dimensional categorical count data or repeated
categorical measurement data<sup>1</sup> (Sankey diagrams)

???

---

## Interactive graphs: [gallery.htmlwidgets.org](http://gallery.htmlwidgets.org/)

---

## r2d3

[`r2d3`](https://rstudio.github.io/r2d3/): `R` interface to [D3](https://d3js.org/) Javascript library.

---

background-image: url("figures/02-ggplot2-abstraction-level.png")
background-size: contain

.footnote[[Kieran Healy. A Practical Introduction to Data Visualization with ggplot2 Workshop. rstudio::conf 2020.](https://github.com/rstudio-conf-2020/dataviz)]

---

## Further materials

- **Hadley Wickham. ["ggplot2 - Elegant Graphics for Data Analysis"](https://link.springer.com/book/10.1007%2F978-0-387-98141-3). Springer, 2016.**
- Hadley Wickham and Garrett Grolemund. ["R for Data Science"](http://r4ds.had.co.nz/). O'Reilly, 2017. Chapters:
  - [Data Visualization](http://r4ds.had.co.nz/data-visualisation.html)
  - [Graphics for communication](http://r4ds.had.co.nz/graphics-for-communication.html)
- **Claus O. Wilke. ["Fundamentals of Data Visualization"](http://serialmentor.com/dataviz/). O'Reilly Media, 2018.**
- **Kieran Healy. ["Data Visualization. A practical introduction"](http://socviz.co/). Princeton University Press, 2018.**
- RStudio's [`ggplot2` cheat sheet](https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf)
- ["awesome" ggplot2](https://github.com/erikgahner/awesome-ggplot2): curated list of various ggplot2 resources

???

awesome ggplot2:

- new geoms: ridgeline plots, wordclouds
- additional themes
- books and online courses
- tutorials

---

## Session info

```
##  setting  value                       
##  version  R version 4.0.4 (2021-02-15)
##  os       Windows 10 x64              
##  system   x86_64, mingw32             
##  ui       RTerm                       
##  language (EN)                        
##  collate  English_United States.1252  
##  ctype    English_United States.1252  
##  tz       Europe/Berlin               
##  date     2021-03-31
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> package </th>
   <th style="text-align:left;"> version </th>
   <th style="text-align:left;"> date </th>
   <th style="text-align:left;"> source </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> dplyr </td>
   <td style="text-align:left;"> 1.0.5 </td>
   <td style="text-align:left;"> 2021-03-05 </td>
   <td style="text-align:left;"> CRAN (R 4.0.4) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> forcats </td>
   <td style="text-align:left;"> 0.5.1 </td>
   <td style="text-align:left;"> 2021-01-27 </td>
   <td style="text-align:left;"> CRAN (R 4.0.3) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> gapminder </td>
   <td style="text-align:left;"> 0.3.0 </td>
   <td style="text-align:left;"> 2017-10-31 </td>
   <td style="text-align:left;"> CRAN (R 4.0.3) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ggforce </td>
   <td style="text-align:left;"> 0.3.2 </td>
   <td style="text-align:left;"> 2020-06-23 </td>
   <td style="text-align:left;"> CRAN (R 4.0.2) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ggplot2 </td>
   <td style="text-align:left;"> 3.3.3 </td>
   <td style="text-align:left;"> 2020-12-30 </td>
   <td style="text-align:left;"> CRAN (R 4.0.3) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> patchwork </td>
   <td style="text-align:left;"> 1.1.1 </td>
   <td style="text-align:left;"> 2020-12-17 </td>
   <td style="text-align:left;"> CRAN (R 4.0.3) </td>
  </tr>
</tbody>
</table>

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> package </th>
   <th style="text-align:left;"> version </th>
   <th style="text-align:left;"> date </th>
   <th style="text-align:left;"> source </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> purrr </td>
   <td style="text-align:left;"> 0.3.4 </td>
   <td style="text-align:left;"> 2020-04-17 </td>
   <td style="text-align:left;"> CRAN (R 4.0.2) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> readr </td>
   <td style="text-align:left;"> 1.4.0 </td>
   <td style="text-align:left;"> 2020-10-05 </td>
   <td style="text-align:left;"> CRAN (R 4.0.3) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> stringr </td>
   <td style="text-align:left;"> 1.4.0 </td>
   <td style="text-align:left;"> 2019-02-10 </td>
   <td style="text-align:left;"> CRAN (R 4.0.2) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> tibble </td>
   <td style="text-align:left;"> 3.1.0 </td>
   <td style="text-align:left;"> 2021-02-25 </td>
   <td style="text-align:left;"> CRAN (R 4.0.3) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> tidyr </td>
   <td style="text-align:left;"> 1.1.3 </td>
   <td style="text-align:left;"> 2021-03-03 </td>
   <td style="text-align:left;"> CRAN (R 4.0.4) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> tidyverse </td>
   <td style="text-align:left;"> 1.3.0 </td>
   <td style="text-align:left;"> 2019-11-21 </td>
   <td style="text-align:left;"> CRAN (R 4.0.2) </td>
  </tr>
</tbody>
</table>

</div>

---

# Thank you! Questions?

&nbsp;

-->