As always, the first thing we will do is load the tidyverse.
Note: If you haven’t yet installed the tidyverse, you’ll first have to run the code install.packages(“tidyverse”).
library(tidyverse)
Here’s the dataframe that we’ll analyze in this exercise:
scores <-
data_frame(
name = c("mike", "carol", "greg", "marcia", "peter", "jan", "bobby", "cindy", "alice"),
school = c("south", "south", "south", "south", "north", "north", "north", "south", "south"),
teacher = c("johnson", "johnson", "johnson", "johnson", "smith", "smith", "smith", "perry", "perry"),
gender = c("male", "female", "male", "female", "male", "female", "male", "female", "female"),
math_score = c(4, 3, 2, 4, 3, 4, 5, 4, 5),
reading_score = c(1, 5, 2, 4, 5, 4, 1, 5, 4)
)
Let’s first take a look at it:
scores
## # A tibble: 9 x 6
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 mike south johnson male 4 1
## 2 carol south johnson female 3 5
## 3 greg south johnson male 2 2
## 4 marcia south johnson female 4 4
## 5 peter north smith male 3 5
## 6 jan north smith female 4 4
## 7 bobby north smith male 5 1
## 8 cindy south perry female 4 5
## 9 alice south perry female 5 4
Before we get started, I want to make sure you understand the difference between doing something and assigning it to a new name and just doing it without naming it. For example, make sure you understand the following:
# get the first 3 rows
scores %>%
slice(1:3)
## # A tibble: 3 x 6
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 mike south johnson male 4 1
## 2 carol south johnson female 3 5
## 3 greg south johnson male 2 2
# get the first 3 rows, and assign it to a new name "scores_small"
scores_small <- scores %>%
slice(1:3)
# see what's in "scores_small"
scores_small
## # A tibble: 3 x 6
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 mike south johnson male 4 1
## 2 carol south johnson female 3 5
## 3 greg south johnson male 2 2
In this exercise we’ll typically just print the results and not save them, but it’s an option if you want it!
Now we can get to the exercise. Most sections will begin with an example for you to look at. When questions require a written answer, there will be an “Answer” line for you to complete.
Question: Sort the data by math_score from high to low. Who had the best math score?
scores %>%
arrange(desc(math_score))
## # A tibble: 9 x 6
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 bobby north smith male 5 1
## 2 alice south perry female 5 4
## 3 mike south johnson male 4 1
## 4 marcia south johnson female 4 4
## 5 jan north smith female 4 4
## 6 cindy south perry female 4 5
## 7 carol south johnson female 3 5
## 8 peter north smith male 3 5
## 9 greg south johnson male 2 2
Answer: Bobby and alice both tied for the highest math score
Question: Sort the data by name from first to last in the alphabet.
scores %>%
arrange(name)
## # A tibble: 9 x 6
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 alice south perry female 5 4
## 2 bobby north smith male 5 1
## 3 carol south johnson female 3 5
## 4 cindy south perry female 4 5
## 5 greg south johnson male 2 2
## 6 jan north smith female 4 4
## 7 marcia south johnson female 4 4
## 8 mike south johnson male 4 1
## 9 peter north smith male 3 5
Question: Sort the data by gender so females show up first. Which gender appears to have better reading scores?
scores %>%
arrange(gender)
## # A tibble: 9 x 6
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 carol south johnson female 3 5
## 2 marcia south johnson female 4 4
## 3 jan north smith female 4 4
## 4 cindy south perry female 4 5
## 5 alice south perry female 5 4
## 6 mike south johnson male 4 1
## 7 greg south johnson male 2 2
## 8 peter north smith male 3 5
## 9 bobby north smith male 5 1
Answer: Females seem to have better reading scores – they all got either 4 or 5.
Question: Sort the data by school, then teacher, then gender, then math_score, and finally by reading_score.
scores %>%
arrange(school, teacher, gender, math_score, reading_score)
## # A tibble: 9 x 6
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 jan north smith female 4 4
## 2 peter north smith male 3 5
## 3 bobby north smith male 5 1
## 4 carol south johnson female 3 5
## 5 marcia south johnson female 4 4
## 6 greg south johnson male 2 2
## 7 mike south johnson male 4 1
## 8 cindy south perry female 4 5
## 9 alice south perry female 5 4
Question: Select only the name, math_score, and reading_score columns.
scores %>%
select(name, math_score, reading_score)
## # A tibble: 9 x 3
## name math_score reading_score
## <chr> <dbl> <dbl>
## 1 mike 4 1
## 2 carol 3 5
## 3 greg 2 2
## 4 marcia 4 4
## 5 peter 3 5
## 6 jan 4 4
## 7 bobby 5 1
## 8 cindy 4 5
## 9 alice 5 4
Question: Select all of the columns except the gender column.
scores %>%
select(-gender)
## # A tibble: 9 x 5
## name school teacher math_score reading_score
## <chr> <chr> <chr> <dbl> <dbl>
## 1 mike south johnson 4 1
## 2 carol south johnson 3 5
## 3 greg south johnson 2 2
## 4 marcia south johnson 4 4
## 5 peter north smith 3 5
## 6 jan north smith 4 4
## 7 bobby north smith 5 1
## 8 cindy south perry 4 5
## 9 alice south perry 5 4
Question: Select all of the columns except the math_score and reading_score columns.
scores %>%
select(-math_score, -reading_score)
## # A tibble: 9 x 4
## name school teacher gender
## <chr> <chr> <chr> <chr>
## 1 mike south johnson male
## 2 carol south johnson female
## 3 greg south johnson male
## 4 marcia south johnson female
## 5 peter north smith male
## 6 jan north smith female
## 7 bobby north smith male
## 8 cindy south perry female
## 9 alice south perry female
Question: Keep all of the columns but rearrange them so gender is the first column.
scores %>%
select(gender, everything())
## # A tibble: 9 x 6
## gender name school teacher math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 male mike south johnson 4 1
## 2 female carol south johnson 3 5
## 3 male greg south johnson 2 2
## 4 female marcia south johnson 4 4
## 5 male peter north smith 3 5
## 6 female jan north smith 4 4
## 7 male bobby north smith 5 1
## 8 female cindy south perry 4 5
## 9 female alice south perry 5 4
Question: Filter to students who are male and went to south.
# Option 1
scores %>%
filter(gender == "male" & school == "south")
## # A tibble: 2 x 6
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 mike south johnson male 4 1
## 2 greg south johnson male 2 2
# Option 2
scores %>%
filter(gender == "male", school == "south")
## # A tibble: 2 x 6
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 mike south johnson male 4 1
## 2 greg south johnson male 2 2
Question: Filter to students who did well in math (you decide what “well” means).
scores %>%
filter(math_score >= 4)
## # A tibble: 6 x 6
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 mike south johnson male 4 1
## 2 marcia south johnson female 4 4
## 3 jan north smith female 4 4
## 4 bobby north smith male 5 1
## 5 cindy south perry female 4 5
## 6 alice south perry female 5 4
Question: Use filter to figure out how many students had a math score of 4 or more and a reading score of 3 or more.
scores %>%
filter(math_score >= 4, reading_score >= 3)
## # A tibble: 4 x 6
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 marcia south johnson female 4 4
## 2 jan north smith female 4 4
## 3 cindy south perry female 4 5
## 4 alice south perry female 5 4
Answer There are 4 such students.
Question: Explain the errors in each of the following code blocks, then fix it to make it right!
# code block 1
scores %>%
filter(school == south)
## Error in filter_impl(.data, quo): Evaluation error: object 'south' not found.
# code block 2
scores %>%
filter(school = "south")
## Error: `school` (`school = "south"`) must not be named, do you need `==`?
# fix it!
scores %>%
filter(school == "south")
## # A tibble: 6 x 6
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 mike south johnson male 4 1
## 2 carol south johnson female 3 5
## 3 greg south johnson male 2 2
## 4 marcia south johnson female 4 4
## 5 cindy south perry female 4 5
## 6 alice south perry female 5 4
Answer: Code block 1 doesn’t use quotation marks so R thinks it is looking for an object not a word. Code block 2 uses only one equals sign “=” when we need two equals signs “==” for comparison.
Question: You are creating a remediation program. Filter to students who got a 3 or worse in either math or reading.
scores %>%
filter(math_score <= 3 | reading_score <= 3)
## # A tibble: 5 x 6
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 mike south johnson male 4 1
## 2 carol south johnson female 3 5
## 3 greg south johnson male 2 2
## 4 peter north smith male 3 5
## 5 bobby north smith male 5 1
Question: Filter to students who got a reading score of 2, 3, or 4.
# option 1
scores %>%
filter(reading_score %in% c(2, 3, 4))
## # A tibble: 4 x 6
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 greg south johnson male 2 2
## 2 marcia south johnson female 4 4
## 3 jan north smith female 4 4
## 4 alice south perry female 5 4
# option 2 (if you know that 1 and 5 are the only other 2 options)
scores %>%
filter(reading_score != 1, reading_score != 5)
## # A tibble: 4 x 6
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 greg south johnson male 2 2
## 2 marcia south johnson female 4 4
## 3 jan north smith female 4 4
## 4 alice south perry female 5 4
Question: Filter to students who have a name that starts with an “m”. Hint: type “?substr” in the console and then scroll to the bottom of the help file to see useful examples.
scores %>%
filter(substr(name, 1, 1) == "m")
## # A tibble: 2 x 6
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 mike south johnson male 4 1
## 2 marcia south johnson female 4 4
Question: Filter to teachers whose best math student got a score of 5.
scores %>%
group_by(teacher) %>%
filter(max(math_score) == 5)
## # A tibble: 5 x 6
## # Groups: teacher [2]
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 peter north smith male 3 5
## 2 jan north smith female 4 4
## 3 bobby north smith male 5 1
## 4 cindy south perry female 4 5
## 5 alice south perry female 5 4
Question: Filter to the gender with a mean math score of 4.
scores %>%
group_by(gender) %>%
filter(mean(math_score) == 4)
## # A tibble: 5 x 6
## # Groups: gender [1]
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 carol south johnson female 3 5
## 2 marcia south johnson female 4 4
## 3 jan north smith female 4 4
## 4 cindy south perry female 4 5
## 5 alice south perry female 5 4
Question: Explain why the following code removes students who have perry as their teacher.
scores %>%
group_by(teacher) %>%
filter(n() >= 3)
## # A tibble: 7 x 6
## # Groups: teacher [2]
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 mike south johnson male 4 1
## 2 carol south johnson female 3 5
## 3 greg south johnson male 2 2
## 4 marcia south johnson female 4 4
## 5 peter north smith male 3 5
## 6 jan north smith female 4 4
## 7 bobby north smith male 5 1
Answer: The “n()” stands for the number of rows. This returns the teachers with 3 or more rows in the data frame. This excludes perry because there are only 2 students who have perry as their teacher.
Question: Both the math and reading scores were actually out of 50 – replace both variables to be 10 times their original values.
scores %>%
mutate(math_score = math_score * 10,
reading_score = reading_score * 10)
## # A tibble: 9 x 6
## name school teacher gender math_score reading_score
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 mike south johnson male 40 10
## 2 carol south johnson female 30 50
## 3 greg south johnson male 20 20
## 4 marcia south johnson female 40 40
## 5 peter north smith male 30 50
## 6 jan north smith female 40 40
## 7 bobby north smith male 50 10
## 8 cindy south perry female 40 50
## 9 alice south perry female 50 40
Question: Create a new column called “math_reading_avg” which is the average of a students math and reading scores.
scores %>%
mutate(math_reading_avg = (math_score + reading_score) / 2)
## # A tibble: 9 x 7
## name school teacher gender math_score reading_score math_reading_avg
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 mike south johnson male 4 1 2.5
## 2 carol south johnson female 3 5 4
## 3 greg south johnson male 2 2 2
## 4 marcia south johnson female 4 4 4
## 5 peter north smith male 3 5 4
## 6 jan north smith female 4 4 4
## 7 bobby north smith male 5 1 3
## 8 cindy south perry female 4 5 4.5
## 9 alice south perry female 5 4 4.5
Question: Create a new column “high_math_achiever” that is an indicator of if a student got a 4 or better on their math_score.
scores %>%
mutate(high_math_achiever = math_score >= 4)
## # A tibble: 9 x 7
## name school teacher gender math_score reading_score high_math_achiever
## <chr> <chr> <chr> <chr> <dbl> <dbl> <lgl>
## 1 mike south johnson male 4 1 TRUE
## 2 carol south johnson female 3 5 FALSE
## 3 greg south johnson male 2 2 FALSE
## 4 marcia south johnson female 4 4 TRUE
## 5 peter north smith male 3 5 FALSE
## 6 jan north smith female 4 4 TRUE
## 7 bobby north smith male 5 1 TRUE
## 8 cindy south perry female 4 5 TRUE
## 9 alice south perry female 5 4 TRUE
Question: Create a new column “reading_score_centered” that is a students reading score with the mean of all students’ reading scores subtracted from it.
scores %>%
mutate(reading_score_centered = reading_score - mean(reading_score))
## # A tibble: 9 x 7
## name school teacher gender math_score reading_score reading_score_cen…
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 mike south johnson male 4 1 -2.44
## 2 carol south johnson female 3 5 1.56
## 3 greg south johnson male 2 2 -1.44
## 4 marcia south johnson female 4 4 0.556
## 5 peter north smith male 3 5 1.56
## 6 jan north smith female 4 4 0.556
## 7 bobby north smith male 5 1 -2.44
## 8 cindy south perry female 4 5 1.56
## 9 alice south perry female 5 4 0.556
Question: Create a new column “science_score”. You can make up what the actual scores are!
scores %>%
mutate(science_score = c(1, 2, 3, 4, 5, 4, 3, 2, 1))
## # A tibble: 9 x 7
## name school teacher gender math_score reading_score science_score
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 mike south johnson male 4 1 1
## 2 carol south johnson female 3 5 2
## 3 greg south johnson male 2 2 3
## 4 marcia south johnson female 4 4 4
## 5 peter north smith male 3 5 5
## 6 jan north smith female 4 4 4
## 7 bobby north smith male 5 1 3
## 8 cindy south perry female 4 5 2
## 9 alice south perry female 5 4 1
Question: Mike and cindy both got a 4 for their math score. Explain why why Mike has a higher “math_score_centered_by_gender” score.
scores %>%
group_by(gender) %>%
mutate(math_score_centered_by_gender = math_score - mean(math_score))
## # A tibble: 9 x 7
## # Groups: gender [2]
## name school teacher gender math_score reading_score math_score_center…
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 mike south johnson male 4 1 0.5
## 2 carol south johnson female 3 5 -1
## 3 greg south johnson male 2 2 -1.5
## 4 marcia south johnson female 4 4 0
## 5 peter north smith male 3 5 -0.5
## 6 jan north smith female 4 4 0
## 7 bobby north smith male 5 1 1.5
## 8 cindy south perry female 4 5 0
## 9 alice south perry female 5 4 1
Answer: We are subtracting the mean math score for that person’s gender off of their score. Mike has a higher “math_score_centered_by_gender” because males did worse on average on math and so a small number is subtracted off of his score of 4.
Question: Create a “reading_score_centered_by_teacher” column. What can you learn from it?
scores %>%
group_by(teacher) %>%
mutate(reading_score_centered_by_teacher = reading_score - mean(reading_score))
## # A tibble: 9 x 7
## # Groups: teacher [3]
## name school teacher gender math_score reading_score reading_score_cen…
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 mike south johnson male 4 1 -2
## 2 carol south johnson female 3 5 2
## 3 greg south johnson male 2 2 -1
## 4 marcia south johnson female 4 4 1
## 5 peter north smith male 3 5 1.67
## 6 jan north smith female 4 4 0.667
## 7 bobby north smith male 5 1 -2.33
## 8 cindy south perry female 4 5 0.5
## 9 alice south perry female 5 4 -0.5
Answer: We can learn lots of things! For example, relative to other students in their class, bobby did worst on reading and carol did the best.
Question: Make a “number_of_students_in_class” column that is number of students in a student’s class. For example, it should be 4 for mike and 3 for peter.
scores %>%
group_by(teacher) %>%
mutate(number_of_students_in_class = n())
## # A tibble: 9 x 7
## # Groups: teacher [3]
## name school teacher gender math_score reading_score number_of_student…
## <chr> <chr> <chr> <chr> <dbl> <dbl> <int>
## 1 mike south johnson male 4 1 4
## 2 carol south johnson female 3 5 4
## 3 greg south johnson male 2 2 4
## 4 marcia south johnson female 4 4 4
## 5 peter north smith male 3 5 3
## 6 jan north smith female 4 4 3
## 7 bobby north smith male 5 1 3
## 8 cindy south perry female 4 5 2
## 9 alice south perry female 5 4 2
Question: Use the summarize command to find the mean math score for all students.
scores %>%
summarize(math_score_mean = mean(math_score))
## # A tibble: 1 x 1
## math_score_mean
## <dbl>
## 1 3.78
Question: Use the summarize command to find the mean reading score for all students.
scores %>%
summarize(reading_score_mean = mean(reading_score))
## # A tibble: 1 x 1
## reading_score_mean
## <dbl>
## 1 3.44
Question: Use the summarize command to find the median for both math scores and reading scores.
scores %>%
summarize(
math_score_median = median(math_score),
reading_score_median = median(reading_score)
)
## # A tibble: 1 x 2
## math_score_median reading_score_median
## <dbl> <dbl>
## 1 4 4
Question: Look closely at the following code. Why is it throwing an error? How can Rstudio help you see this error?
scores %>%
summarize(
math_score_max = max(math_score),
reading_score_min = min(reading_score)
## Error: <text>:5:0: unexpected end of input
## 3: math_score_max = max(math_score),
## 4: reading_score_min = min(reading_score)
## ^
Answer: We need another “)” at the end of the code. The first “)” is for the min function but we also need a “)” to end the summarize function. Rstudio helps because if you go to the right of a paranthese, it highlights the corresponding closing paranthese.
Question: Find the minimum math score for each school.
scores %>%
group_by(school) %>%
summarize(min_math_score = min(math_score))
## # A tibble: 2 x 2
## school min_math_score
## <chr> <dbl>
## 1 north 3
## 2 south 2
Question: Find the maximum math score for each teacher.
scores %>%
group_by(teacher) %>%
summarize(max_math_score = max(math_score))
## # A tibble: 3 x 2
## teacher max_math_score
## <chr> <dbl>
## 1 johnson 4
## 2 perry 5
## 3 smith 5
Question: If we grouped by gender, and then summarized with the minimum reading score, how many rows would the resulting data frame have?
Answer: There would be one for each group. Because there are two genders in the data frame, there would be two rows. See:
scores %>%
group_by(gender) %>%
summarize(min(reading_score)) # side note: notice what happens when we don't provide a variable name
## # A tibble: 2 x 2
## gender `min(reading_score)`
## <chr> <dbl>
## 1 female 4
## 2 male 1
Question: Remember that mutate always keeps the same number of rows but summarize usually reduces the number of rows. Why doesn’t the following use of summarize reduce the number of rows?
scores %>%
group_by(name) %>%
summarize(math_score_mean = mean(math_score))
## # A tibble: 9 x 2
## name math_score_mean
## <chr> <dbl>
## 1 alice 5
## 2 bobby 5
## 3 carol 3
## 4 cindy 4
## 5 greg 2
## 6 jan 4
## 7 marcia 4
## 8 mike 4
## 9 peter 3
Answer: Summarize with groups makes one row for each group. In this case, we’re grouping by name and everyone has a differente name!
Question: Create a data frame with the mean and median reading score by gender, as well as the number of students of that gender.
scores %>%
group_by(gender) %>%
summarize(
mean_reading_score = mean(reading_score),
median_reading_score = median(reading_score),
count = n()
)
## # A tibble: 2 x 4
## gender mean_reading_score median_reading_score count
## <chr> <dbl> <dbl> <int>
## 1 female 4.4 4 5
## 2 male 2.25 1.5 4
Question: Select just the name and math_score columns. Then create a new column “math_score_ec” that is a students math score plus 5 extra credit points. Finally, arrange the data frame by math_score_ec from low to high.
scores %>%
select(name, math_score) %>%
mutate(math_score_ec = math_score + 5) %>%
arrange(math_score_ec)
## # A tibble: 9 x 3
## name math_score math_score_ec
## <chr> <dbl> <dbl>
## 1 greg 2 7
## 2 carol 3 8
## 3 peter 3 8
## 4 mike 4 9
## 5 marcia 4 9
## 6 jan 4 9
## 7 cindy 4 9
## 8 bobby 5 10
## 9 alice 5 10
Question: Select every column except the teacher column. Create a new variabled called “mean_score” that is the mean of a student’s math and reading score. Finally, arrange the data frame by mean_score from low to high.
scores %>%
select(-teacher) %>%
mutate(mean_score = (math_score + reading_score) / 2) %>%
arrange(desc(mean_score))
## # A tibble: 9 x 6
## name school gender math_score reading_score mean_score
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 cindy south female 4 5 4.5
## 2 alice south female 5 4 4.5
## 3 carol south female 3 5 4
## 4 marcia south female 4 4 4
## 5 peter north male 3 5 4
## 6 jan north female 4 4 4
## 7 bobby north male 5 1 3
## 8 mike south male 4 1 2.5
## 9 greg south male 2 2 2
Question: Remove any students with smith as a teacher, then find the mean math_score by gender.
scores %>%
filter(teacher != "smith") %>%
group_by(gender) %>%
summarize(mean_math_score = mean(math_score))
## # A tibble: 2 x 2
## gender mean_math_score
## <chr> <dbl>
## 1 female 4
## 2 male 3
Question: Find the min, max, and median reading_score for female students at south school.
scores %>%
filter(gender == "female", school == "south") %>%
summarize(min_reading_score = min(reading_score),
max_reading_score = max(reading_score),
median_reading_score = median(reading_score))
## # A tibble: 1 x 3
## min_reading_score max_reading_score median_reading_score
## <dbl> <dbl> <dbl>
## 1 4 5 4.5
Question: Inspect each of the following code blocks. They both do about the same thing. Which one do you think is preffered from a computer efficiency standpoint?
# code block 1
scores %>%
group_by(school, teacher) %>%
summarize(max_math_score = max(math_score)) %>%
filter(school == "south")
## # A tibble: 2 x 3
## # Groups: school [1]
## school teacher max_math_score
## <chr> <chr> <dbl>
## 1 south johnson 4
## 2 south perry 5
# code block 2
scores %>%
filter(school == "south") %>%
group_by(teacher) %>%
summarize(max_math_score = max(math_score))
## # A tibble: 2 x 2
## teacher max_math_score
## <chr> <dbl>
## 1 johnson 4
## 2 perry 5
Answer: They both get the max math score by teacher for teachers at south school. The first block calculates the max_math_score for both north and south and then filters out north after that calculation. The second block filters out north right away. This is preferred because it prevents the computer from making unnecessary calculations.
Play around with these tools. Write a question or two that you think best exposes a misunderstanding you had or drills down on an important thing to remember. I’d love to add these questions in the future! Feel free to email what you came up with to stenhaug@stanford.edu.