library(tidyverse)
theme_set(theme_bw(base_size = 12))
set.seed(1)
The ggplot2 reference at https://ggplot2.tidyverse.org/reference/ is incredibly useful. We’ll focus on the most basic building blocks. Let’s build a couple of data frames to use:
scores <-
tibble(
name = c("mike", "carol", "greg", "marcia", "peter", "jan", "bobby", "cindy", "alice"),
school = c("south", "south", "south", "south", "north", "north", "north", "south", "south"),
teacher = c("johnson", "johnson", "johnson", "johnson", "smith", "smith", "smith", "perry", "perry"),
math_score = c(4, 3, 2, 4, 3, 4, 5, 4, 5),
reading_score = c(1, 5, 2, 4, 5, 4, 1, 5, 4)
)
random_numbers <-
tibble(
binom = rbinom(1000, 10, 0.5),
pois = rpois(1000, 5)
) %>%
mutate(
mix = rnorm(1000, binom + pois, 5),
high_low = ifelse(mix > 10, "high", "low")
)
To get the basics down, I’ll show an example using the scores dataset and then ask you to make a similar plot using the random_numbers dataset.
If your variable is continuous, you’ll want to use geom_histogram:
scores %>%
ggplot(aes(x = math_score)) +
geom_histogram(binwidth = 2)
?geom_histogram
Make a histogram with the random numbers and play around with geom_histogram options:
random_numbers %>%
ggplot(aes(x = binom)) +
geom_histogram(bins = 20)
If your variable is discrete, you’ll want to use geom_bar:
scores %>%
ggplot(aes(x = teacher)) +
geom_bar()
Your turn with geom_bar:
random_numbers %>%
ggplot(aes(x = high_low)) +
geom_bar() +
coord_flip()
With two variables, the most common choice is a scatter plot with geom_point:
scores %>%
ggplot(aes(x = math_score, y = reading_score)) +
geom_point()
You don’t have to use continuous variables!
scores %>%
ggplot(aes(x = math_score, y = name)) +
geom_point()
Your turn:
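One possible answer, as a sketch rather than the only right one. Plotting pois against mix and adding alpha are my choices, not part of the original exercise, and the setup lines simply repeat the random_numbers recipe from above so the block runs on its own.

```r
library(tidyverse)

# Same recipe as the random_numbers setup above, repeated so this runs alone
set.seed(1)
random_numbers <-
  tibble(binom = rbinom(1000, 10, 0.5), pois = rpois(1000, 5)) %>%
  mutate(
    mix = rnorm(1000, binom + pois, 5),
    high_low = ifelse(mix > 10, "high", "low")
  )

# alpha (an illustrative choice) makes overlapping points easier to see
random_numbers %>%
  ggplot(aes(x = pois, y = mix)) +
  geom_point(alpha = 0.3)
```

With 1,000 rows many points land on top of each other, so some transparency helps.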
Making a column for each “point” with geom_col can also be useful:
scores %>%
ggplot(aes(x = name, y = reading_score)) +
geom_col() +
geom_point(color = "red")
Your turn:
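One possible answer. random_numbers has 1,000 rows, so a column per row would be unreadable. This sketch assumes it is fine to summarize first and draw one column per value of binom. The mean_mix column name is made up for illustration.

```r
library(tidyverse)

# Same recipe as the random_numbers setup above, repeated so this runs alone
set.seed(1)
random_numbers <-
  tibble(binom = rbinom(1000, 10, 0.5), pois = rpois(1000, 5)) %>%
  mutate(
    mix = rnorm(1000, binom + pois, 5),
    high_low = ifelse(mix > 10, "high", "low")
  )

# One row per binom value, then one column per row
# (mean_mix is a hypothetical name chosen for this sketch)
random_numbers %>%
  group_by(binom) %>%
  summarize(mean_mix = mean(mix)) %>%
  ggplot(aes(x = binom, y = mean_mix)) +
  geom_col()
```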
geom_boxplot is useful for plotting a continuous variable against a discrete one:
scores %>%
ggplot(aes(x = school, y = math_score)) +
geom_boxplot()
Your turn:
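One possible answer, assuming high_low as the discrete variable and pois as the continuous one. Any other pairing of a grouping column with a numeric column would work the same way.

```r
library(tidyverse)

# Same recipe as the random_numbers setup above, repeated so this runs alone
set.seed(1)
random_numbers <-
  tibble(binom = rbinom(1000, 10, 0.5), pois = rpois(1000, 5)) %>%
  mutate(
    mix = rnorm(1000, binom + pois, 5),
    high_low = ifelse(mix > 10, "high", "low")
  )

# Discrete variable on x, numeric variable on y
random_numbers %>%
  ggplot(aes(x = high_low, y = pois)) +
  geom_boxplot()
```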
geom_smooth is a fancy geom but still just uses two variables:
scores %>%
ggplot(aes(x = reading_score, y = math_score)) +
geom_smooth(method = "lm", se = FALSE) +
geom_point()
## `geom_smooth()` using formula 'y ~ x'
You can have multiple geoms. In this case, it’s common to put both points and a smoother on one plot.
Your turn:
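One possible answer: points plus a straight-line fit of mix on binom. The variable pairing and the alpha value are my choices for illustration.

```r
library(tidyverse)

# Same recipe as the random_numbers setup above, repeated so this runs alone
set.seed(1)
random_numbers <-
  tibble(binom = rbinom(1000, 10, 0.5), pois = rpois(1000, 5)) %>%
  mutate(
    mix = rnorm(1000, binom + pois, 5),
    high_low = ifelse(mix > 10, "high", "low")
  )

# Points first, then the linear smoother drawn on top of them
random_numbers %>%
  ggplot(aes(x = binom, y = mix)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", se = FALSE)
```

Geoms are drawn in the order they are added, so putting geom_smooth last keeps the line visible over the points.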
There are a few ways to add a third variable to a plot:
# color
scores %>%
ggplot(aes(x = reading_score, y = math_score, color = school)) +
geom_point()
# shape
scores %>%
ggplot(aes(x = reading_score, y = math_score, shape = school)) +
geom_point()
# with a facet
scores %>%
ggplot(aes(x = reading_score, y = math_score)) +
geom_point() +
facet_wrap(~ school)
Your turn:
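One possible answer, using high_low as the third variable through color. Because binom and pois are both whole numbers, this sketch swaps geom_point for geom_jitter so that stacked points spread out. That swap and the jitter amounts are my assumptions, not part of the exercise.

```r
library(tidyverse)

# Same recipe as the random_numbers setup above, repeated so this runs alone
set.seed(1)
random_numbers <-
  tibble(binom = rbinom(1000, 10, 0.5), pois = rpois(1000, 5)) %>%
  mutate(
    mix = rnorm(1000, binom + pois, 5),
    high_low = ifelse(mix > 10, "high", "low")
  )

# color carries the third variable; jitter spreads out overlapping integers
random_numbers %>%
  ggplot(aes(x = binom, y = pois, color = high_low)) +
  geom_jitter(width = 0.2, height = 0.2, alpha = 0.4)
```

Replacing the color mapping with `facet_wrap(~ high_low)` would show the same split as two panels instead.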
Here’s a crazy plot that uses many, many options that might be useful for reference:
scores %>%
ggplot(mapping =
aes(x = math_score, y = reading_score, color = school)) +
geom_point(alpha = 0.4) +
scale_x_continuous(limits = c(0, 10)) +
scale_y_log10() +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~ school, ncol = 1) +
labs(
x = "Math",
y = "Reading",
title = "Math and reading test scores by student",
subtitle = "Just for 9 students"
) +
theme_gray(base_size = 18)
## `geom_smooth()` using formula 'y ~ x'
# this is useful for a session: theme_set(theme_bw(base_size = 12))
Your turn to make a crazy plot:
# go for it