Data Visualisation – Performance Visualisation in R
Lesson tasks:
- Create a scatterplot
- Create a shot map
If you are visiting this lesson independently, at the bottom of the page is the code in full with more detailed comments.
Get started by downloading and opening the R project file from the course files (opens in new tab).
As always, the first thing we need to do is load the packages we’ll be using. We’ve used the tidyverse before, but ggrepel and ggsoccer are new: ggrepel keeps text labels from overlapping, and ggsoccer draws football pitches.
# Load packages
if (require("tidyverse") == FALSE) {
install.packages("tidyverse")
}
library(tidyverse)
#--
if (require("ggrepel") == FALSE) {
install.packages("ggrepel")
}
library(ggrepel)
#--
if (require("ggsoccer") == FALSE) {
install.packages("ggsoccer")
}
library(ggsoccer)
Let’s store the data sets we processed in the last lesson into variables for us to use. The data is from the UEFA Women’s Euro 2022.
# Load data sets
shot_data <- read.csv("shot_data.csv", encoding = "latin1")
summary <- read.csv("summary.csv", encoding = "latin1")
Since this lesson is preplanned, I can tell you that the is_goal column needs to be a factor data type. We used the class() function in the first lesson to find out the data type and we can do that again!
# find out the data type of is_goal using class()
# The dollar $ sign is used to specify a column in a dataframe.
# shot_data$is_goal is accessing the is_goal column from the shot_data variable.
class(shot_data$is_goal)
# class() will return that is_goal is an integer.
# We can change the data type of is_goal to a factor
# with the as.factor() function.
shot_data$is_goal <- as.factor(shot_data$is_goal)
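If you’d like to see the conversion in isolation, here is a minimal sketch on a made-up data frame. The toy_shots variable and its values are assumptions for illustration only; they are not part of the course data.

```r
# Toy data standing in for shot_data; the values are made up for illustration.
toy_shots <- data.frame(
  player.name = c("Player A", "Player B", "Player C"),
  is_goal = c(0L, 1L, 0L)
)

class(toy_shots$is_goal)   # "integer"

# Convert the column to a factor, as done for shot_data above.
toy_shots$is_goal <- as.factor(toy_shots$is_goal)

class(toy_shots$is_goal)   # "factor"
levels(toy_shots$is_goal)  # "0" "1"

# Comparing a factor with a number still works, because R compares
# the number against the level labels.
toy_shots$is_goal == 1     # FALSE TRUE FALSE
```

That last comparison is why filters such as is_goal == 1 keep working after the column becomes a factor.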
Everything is *almost* ready for us to start creating a scatterplot. We could, of course, plot the whole data set, but that isn’t in the spirit of creating visualisations that can be easily interpreted. To keep things simple to begin with, let’s take the first 15 rows of the summary data set and plot them. Note how we’re saving those 15 rows into a new variable; this preserves the full summary data set for future manipulations.
scatterplot_summary <- head(summary, n = 15)
We have our data for the plot sorted, so now let’s create the plot! We’re going to start basic and build up. Below is the minimum you need for a very basic (and not very helpful) scatterplot. Think of ggplot() as a canvas on which we put the data; we then add geom_point(), which draws the dots that appear on the canvas.
# The minimum requirements you need to make a scatterplot
ggplot(data = scatterplot_summary, # Using the data we filtered
aes(x = shots, # Shots will be plotted on the x axis
y = goals)) + # Goals will be plotted on the y axis
geom_point()
# Save the plot
ggsave(
plot = last_plot(),
"bare_scatter_plot.png",
dpi = 320,
height = 10,
width = 12)
Note: In this example, we are plotting shots and goals as absolute values which, as mentioned in the previous lesson, isn’t ideal for comparing players because it doesn’t account for differences in the time each player has spent on the pitch. When you create your own scatterplot, please use the per 90 columns of data!
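As a reminder of what “per 90” means, here is a small sketch on invented numbers. The toy_summary data frame and its minutes column are assumptions for illustration; the summary file from the previous lesson already contains the per 90 columns.

```r
# Toy summary data; the numbers are invented for illustration.
toy_summary <- data.frame(
  player.name = c("Player A", "Player B"),
  minutes = c(450, 90),
  shots = c(10, 4),
  goals = c(2, 1)
)

# One "ninety" is 90 minutes played.
toy_summary$nineties <- toy_summary$minutes / 90

# Divide the absolute values by the number of nineties played.
toy_summary$shots_p90 <- toy_summary$shots / toy_summary$nineties
toy_summary$goals_p90 <- toy_summary$goals / toy_summary$nineties

toy_summary[, c("player.name", "shots", "shots_p90", "goals_p90")]
```

Player A has taken more shots in total, but Player B takes more shots per 90 minutes, which is the fairer comparison.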
The code above does produce a scatterplot, but not one that tells us anything useful. Let’s add some labels by adding the geom_text_repel() function.
ggplot(data = scatterplot_summary, #using same data, x, and y as above.
aes(x = shots,
y = goals)) +
geom_point() +
geom_text_repel(
aes(label = player.name), # Labels the data point with the player name.
max.overlaps = getOption("ggrepel.max.overlaps", default = 20), # Labels that
# overlap too many other things are dropped; this sets that limit.
box.padding = unit(0.35, "lines"), # Increasing the 0.35 value will create
# more space around the player name label.
point.padding = unit(0.3, "lines"), # Increasing the 0.3 value will create
# more space around the data points.
arrow = arrow(length = unit(0.01, 'npc')) # Adds an arrow if the label is
# far away from the data point.
)
# Save the plot
ggsave(
plot = last_plot(),
"labeled_scatter_plot.png",
dpi = 320,
height = 10,
width = 12)
One of the reasons R is so popular for creating visualisations is the amount of customisation it offers. We can change the colour of the points, add labels, add a regression line, and that only scratches the surface.
ggplot(data = scatterplot_summary, #using same data, x, and y as before.
aes(x = shots,
y = goals)) +
geom_point(color = 'blue') + # We can change the data points to be blue.
geom_smooth(method = lm, color = "red") + # This line of code adds a
# regression line.
geom_text_repel(
aes(label = player.name), # Labels the data point with the player name.
max.overlaps = getOption("ggrepel.max.overlaps", default = 20), # Labels that
# overlap too many other things are dropped; this sets that limit.
box.padding = unit(0.35, "lines"), # Increasing the 0.35 value will create
# more space around the player name label.
point.padding = unit(0.3, "lines"), # Increasing the 0.3 value will create
# more space around the data points.
arrow = arrow(length = unit(0.01, 'npc')) # Adds an arrow if the label is
# far away from the data point.
) +
labs( # We can add labels that give the plot a title and a caption.
title = "UEFA Women's Euro 2022",
caption = "Data: StatsBomb")
# Save the plot
ggsave(
plot = last_plot(),
"customised_scatter_plot.png",
dpi = 320,
height = 10,
width = 12)
Hopefully, you’re seeing that using ggplot() is more friendly than it may initially look. We will do one more example together before you create your own. Let’s say we want to compare the shots and goals of players on different teams. Let’s apply a different filter to the data being stored in the scatterplot_summary variable.
# Filter data that will be used in the scatterplot.
scatterplot_summary <- summary %>%
filter(shots > 0) %>% # Removing players who haven't taken any shots.
# Keeping players who are in Sweden's or England's team.
# The | operator is the "or" operator, so the code keeps
# players in the data set who belong to Sweden or England.
filter(team.name == "Sweden Women's" | team.name == "England Women's")
# Create the scatterplot
ggplot(data = scatterplot_summary, # Using the same data variable (although
# filtered differently this time!)
# Still using the same variables for x
aes(x = shots, # and y as before.
y = goals)) +
geom_point(aes(colour = team.name)) + # Setting the colour of the
# data points to correspond to team.name.
# scale_color_manual() stores what colour each team's data
# points will be.
scale_color_manual(values = c("Sweden Women's" = "yellow", "England Women's" = "red")) +
geom_smooth(method = lm) +
geom_text_repel(
aes(label = player.name),
max.overlaps = getOption("ggrepel.max.overlaps", default = 20),
box.padding = unit(0.8, "lines"), # really had to fiddle with this
# value in order to see the labels
point.padding = unit(0.3, "lines"),
arrow = arrow(length = unit(0.01, 'npc'))
) +
labs(title = "UEFA Women's Euro 2022",
caption = "Data: StatsBomb")
# Save the plot
ggsave(
plot = last_plot(),
"team_comparison_scatter_plot.png",
dpi = 320,
height = 10,
width = 12)
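The | condition used in the team filter above gets long once you want more than two teams. One possible alternative is the %in% operator, sketched below on a made-up vector. The team.name values other than the two used in the lesson are placeholders, and teams_to_keep is a hypothetical variable name.

```r
# Toy team names; "Some Other Team" is a placeholder for illustration.
team.name <- c("Sweden Women's", "England Women's",
               "Some Other Team", "England Women's")

# The condition used in the lesson: one comparison per team, joined by |.
keep_or <- team.name == "Sweden Women's" | team.name == "England Women's"

# The same condition written with %in% and a vector of teams.
teams_to_keep <- c("Sweden Women's", "England Women's")
keep_in <- team.name %in% teams_to_keep

identical(keep_or, keep_in)  # TRUE
```

Inside a pipeline this would read filter(team.name %in% teams_to_keep), and adding a third team only means adding one more name to the vector.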
Challenge!
Imagine you are a recruiter and, from the data you have, you want to assess which players are performing above what is expected of them. Produce a scatterplot that you would feel confident using to explain players’ performance.
Programming involves a lot of problem-solving, so please do look things up. Google is your friend!*
*Other search engines available.
Click to reveal ⬇
Nope, this one is all on you.
Shot maps
Now that we’ve suitably enjoyed scatterplots, let’s move on to making some shot maps! As we did for the scatterplot, we will filter our data down to a more targeted sample. In the optional lessons, you can make an interactive shot map, so we are going to program the shot map in a way that will translate nicely into being interactive. We’re going to ask the user to specify how many nineties they want a player to have played and how many of the top players (ranked by npxg_p90) they want to plot shot maps for. These inputs are used to filter the data before the shot maps are created.
# The user_input_nineties variable holds the number of nineties the
# user wants the player to have played. Since this isn't interactive
# yet, we are going to specify the value.
user_input_nineties <- 2
# Holds the value the user specifies for how many top ranked players to
# plot shot maps for.
user_input_top_players <- 6
# Filter summary dataset and save the output into the shot_map_summary variable
shot_map_summary <- summary %>%
filter(nineties >= user_input_nineties) %>%
# head() keeps the first rows, so this relies on summary already being
# organised by npxg_p90.
head(n = user_input_top_players)
# Note: >= is the greater than or equal to operator.
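The filter above leans on the summary data already being ordered by npxg_p90. If you are ever unsure of the row order, one option is to sort explicitly before taking the top rows. The sketch below uses invented data; toy_summary, min_nineties, and top_n_players are hypothetical names standing in for the lesson’s variables.

```r
library(dplyr)

# Toy summary data; values are invented for illustration.
toy_summary <- data.frame(
  player.name = c("Player A", "Player B", "Player C", "Player D"),
  nineties = c(3.5, 1.2, 2.0, 4.1),
  npxg_p90 = c(0.30, 0.90, 0.55, 0.10)
)

min_nineties <- 2   # stands in for user_input_nineties
top_n_players <- 2  # stands in for user_input_top_players

toy_top <- toy_summary %>%
  filter(nineties >= min_nineties) %>%  # drop players with too little time
  arrange(desc(npxg_p90)) %>%           # sort explicitly, highest first
  head(n = top_n_players)               # keep the top rows

toy_top$player.name  # "Player C" "Player A"
```

Player B has the highest npxg_p90 but is removed first because they have played fewer than two nineties.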
We apply the user input to filter the summary data and store the results in the shot_map_summary variable. The shot_map_summary is then used to pull the relevant player data from the shot_data and store it in the shot_map_plot_data variable.
shot_map_plot_data <- shot_data %>%
# Searches for player name in the shot_map_summary dataframe we
# filtered using the user input. The filter will only return players
# who appear in the shot_map_summary dataframe.
filter(player.name %in% unique(shot_map_summary$player.name)) %>%
# select() then specifies which columns to keep in shot_map_plot_data.
select(
player.name,
location.x,
location.y,
is_goal,
shot.outcome.name,
shot.statsbomb_xg,
team.name)
Our data is ready, let’s create the plot. Like we did last time, we’re going to build it up piece by piece. First, let’s draw a pitch.
ggplot() + # ggplot() is the canvas the shot map will be displayed on
annotate_pitch(dimensions = pitch_statsbomb) + # annotate_pitch() draws a pitch
theme_pitch() + # applies cosmetic changes so we can view the pitch better.
coord_flip(xlim = c(60, 120), # flips the axes and shows only one half
ylim = c(0, 80)) # of the pitch for better visualisation of shots.
# Save plot
ggsave(
plot = last_plot(),
"empty_pitch.png",
dpi = 320,
height = 10,
width = 12
)
Now the pitch is drawn let’s add some data. We’re going to use two geom_point() functions to plot our points, one will be for shots that resulted in a goal and the other for all the shots that didn’t result in a goal.
ggplot() +
annotate_pitch(dimensions = pitch_statsbomb) + # draw the pitch
theme_pitch() +
coord_flip(xlim = c(60, 120),
ylim = c(0, 80)) +
geom_point(data = shot_map_plot_data %>% # plot the location of the
filter(is_goal == 0), # shots that did not result in a goal.
aes(x = location.x,
y = location.y)) +
geom_point(data = shot_map_plot_data %>% # plot the location of the
filter(is_goal == 1), # shots that resulted in a goal.
aes(x = location.x,
y = location.y))
# Save plot
ggsave(
plot = last_plot(),
"all_data_shot_map.png",
dpi = 320,
height = 10,
width = 12
)
At this stage, the shot map is a shot map, but not a very informative one. Currently, it’s plotting all the data and not differentiating between players. We can use the facet_wrap() function to create a shot map for each player.
ggplot() +
annotate_pitch(dimensions = pitch_statsbomb) + # draw the pitch
theme_pitch() +
coord_flip(xlim = c(60, 120),
ylim = c(0, 80)) +
geom_point(data = shot_map_plot_data %>% # plot the location of the
filter(is_goal == 0), # shots that did not result in a goal.
aes(x = location.x,
y = location.y)) +
geom_point(data = shot_map_plot_data %>% # plot the location of the
filter(is_goal == 1), # shots that resulted in a goal.
aes(x = location.x,
y = location.y)) +
facet_wrap(~ player.name) # creates a shot map for each player
# Save plot
ggsave(
plot = last_plot(),
"per_player_shot_map.png",
dpi = 320,
height = 10,
width = 12
)
Now we have a shot map for each player, we can add some colour to the geom_point() functions so we can see the difference between shots that were a goal or not.
ggplot() +
annotate_pitch(dimensions = pitch_statsbomb) + # draw the pitch
theme_pitch() +
coord_flip(xlim = c(60, 120),
ylim = c(0, 80)) +
geom_point(
data = shot_map_plot_data %>%
filter(is_goal == 0),
aes(x = location.x,
y = location.y),
shape = 21,
fill = "#FC8D62", # Fills non goal data points orange.
stroke = 0.6,
colour = "black" # We will also add a nice outline
) +
geom_point(
data = shot_map_plot_data %>%
filter(is_goal == 1),
aes(x = location.x,
y = location.y),
shape = 21,
fill = "#66C2A5", # Fills goal data points green.
stroke = 0.6,
colour = "black" # We will also add a nice outline
) +
facet_wrap(~ player.name) # creates a shot map for each player
# Save plot
ggsave(
plot = last_plot(),
"per_player_cosmetic_shot_map.png",
dpi = 320,
height = 10,
width = 12
)
When you have so much data to use, you might as well make the most of it! We are going to scale the size of the data points on the shot map using the shot.statsbomb_xg column. The bigger the data point on the shot map, the more likely that shot was to result in a goal.
Before plotting, ask yourself: what would you expect to see on the shot map of a good player?
ggplot() +
annotate_pitch(dimensions = pitch_statsbomb) +
theme_pitch() +
coord_flip(xlim = c(60, 120),
ylim = c(0, 80)) +
geom_point(
data = shot_map_plot_data %>%
filter(is_goal == 0),
aes(x = location.x,
y = location.y,
size = shot.statsbomb_xg),
# Here is where we scale the data points
# from the shot.statsbomb_xg column
shape = 21,
fill = "#FC8D62",
# Fills non goal data points orange.
stroke = 0.6,
colour = "black" # We will also add a nice outline
) +
geom_point(
data = shot_map_plot_data %>%
filter(is_goal == 1),
aes(x = location.x,
y = location.y,
size = shot.statsbomb_xg),
# Here is where we scale the data points
# from the shot.statsbomb_xg column
shape = 21,
fill = "#66C2A5",
# Fills goal data points green.
stroke = 0.6,
colour = "black" # We will also add a nice outline
) +
facet_wrap(~ player.name) # creates a shot map for each player
# Save plot
ggsave(
plot = last_plot(),
"weighted_data_points_shot_map.png",
dpi = 320,
height = 10,
width = 12
)
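By default, ggplot2 fits the point sizes to whatever range of xG values happens to be in the data, so the same dot size can mean different things on different plots. Since xG is a probability between 0 and 1, one option is to fix the size scale yourself. This is only a sketch on invented data; toy_shots is a hypothetical data frame, and the range values are a matter of taste.

```r
library(ggplot2)

# Toy shot data; locations and xG values are invented for illustration.
toy_shots <- data.frame(
  location.x = c(105, 112, 98, 115),
  location.y = c(35, 42, 50, 38),
  shot.statsbomb_xg = c(0.05, 0.45, 0.02, 0.76)
)

p <- ggplot(toy_shots,
            aes(x = location.x,
                y = location.y,
                size = shot.statsbomb_xg)) +
  geom_point(shape = 21, fill = "#66C2A5", colour = "black") +
  # limits fixes the xG range at 0 to 1; range sets the smallest and
  # largest point sizes that this range is mapped onto.
  scale_size_continuous(limits = c(0, 1), range = c(1, 8))

p
```

The same scale_size_continuous() line could be added to the shot map code above, so that shot maps made from different filters stay comparable.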
Let’s use the labs() function to add some labels to the shot map. With the aim of adding interactivity later, the title uses the paste() function so that the user input can be displayed in the title.
ggplot() +
annotate_pitch(dimensions = pitch_statsbomb) +
theme_pitch() +
coord_flip(xlim = c(60, 120),
ylim = c(0, 80)) +
geom_point(
data = shot_map_plot_data %>%
filter(is_goal == 0),
aes(x = location.x,
y = location.y,
size = shot.statsbomb_xg),
shape = 21,
fill = "#FC8D62",
stroke = 0.6,
colour = "black"
) +
geom_point(
data = shot_map_plot_data %>%
filter(is_goal == 1),
aes(x = location.x,
y = location.y,
size = shot.statsbomb_xg),
shape = 21,
fill = "#66C2A5",
stroke = 0.6,
colour = "black"
) +
facet_wrap(~ player.name) +
labs(
title = paste(
"Top",
user_input_top_players,
"players rated on Non Penalty xG P90, which have played",
user_input_nineties,
"nineties."
),
subtitle = "UEFA Women's Euro 2022",
caption = "Data: StatsBomb",
size = "Expected goal value"
)
# Save plot
ggsave(
plot = last_plot(),
"labeled_shot_map.png",
dpi = 320,
height = 10,
width = 12
)
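To see what paste() is doing for the title, you can run it on its own. The sketch below reuses the same values as the lesson; plot_title is a hypothetical variable name used only for this illustration.

```r
user_input_nineties <- 2
user_input_top_players <- 6

# paste() joins its arguments into one string, separated by spaces.
plot_title <- paste(
  "Top",
  user_input_top_players,
  "players rated on Non Penalty xG P90, which have played",
  user_input_nineties,
  "nineties."
)
plot_title
# "Top 6 players rated on Non Penalty xG P90, which have played 2 nineties."

# paste0() joins without separators, if you need full control over spacing.
paste0("Top ", user_input_top_players, " players")
# "Top 6 players"
```

Because the numbers come from the variables, changing the user input updates the title without touching the plotting code.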
When you think there is nothing else you could possibly add to this shot map… What about adding the per nineties stats of the player on their shot map? We want our visualisation to pour out useful information! We can add the per nineties stats using geom_text() functions.
ggplot() +
annotate_pitch(dimensions = pitch_statsbomb) +
theme_pitch() +
coord_flip(xlim = c(60, 120),
ylim = c(0, 80)) +
geom_point(
data = shot_map_plot_data %>%
filter(is_goal == 0),
aes(x = location.x,
y = location.y,
size = shot.statsbomb_xg),
shape = 21,
fill = "#FC8D62",
stroke = 0.6,
colour = "black"
) +
geom_point(
data = shot_map_plot_data %>%
filter(is_goal == 1),
aes(x = location.x,
y = location.y,
size = shot.statsbomb_xg),
shape = 21,
fill = "#66C2A5",
stroke = 0.6,
colour = "black"
) +
facet_wrap( ~ player.name) +
labs(
title = paste(
"Top",
user_input_top_players,
"players rated on Non Penalty xG P90, which have played",
user_input_nineties,
"nineties."
),
subtitle = "UEFA Women's Euro 2022",
caption = "Data: StatsBomb",
size = "Expected goal value"
) +
geom_text(data = shot_map_summary,
aes(
x = 80,
y = 15,
label = paste("Shots P90:", round(shots_p90, digits = 2))
)) +
geom_text(data = shot_map_summary,
aes(
x = 74,
y = 15,
label = paste("Goals P90:", round(goals_p90, digits = 2))
)) +
geom_text(data = shot_map_summary,
aes(
x = 68,
y = 15,
label = paste("NPxG P90:", round(npxg_p90, digits = 2))
))
# Save plot
ggsave(
plot = last_plot(),
"a_shot_map_masterpiece.png",
dpi = 320,
height = 10,
width = 12
)
If you’re super eagle-eyed, you may have noticed that the shot maps are arranged in alphabetical order of the players’ first names rather than by npxg_p90. This is because facet_wrap() orders its panels alphabetically when the faceting column contains text. If you were presenting this data, you would want to resolve this to ensure effective communication. Do feel free to see this as a challenge to tackle!
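If you want a nudge (look away now if you’d rather solve it yourself), one possible approach is to turn player.name into a factor whose levels follow the ranking, because facet_wrap() follows the order of factor levels. The sketch below uses invented names and values; toy_summary and toy_shots are hypothetical stand-ins for shot_map_summary and shot_map_plot_data.

```r
# Toy summary, already ordered by npxg_p90 (highest first); values invented.
toy_summary <- data.frame(
  player.name = c("Zoe Example", "Amy Example", "Mia Example"),
  npxg_p90 = c(0.9, 0.6, 0.4)
)

# Toy shot data with the same players, in no particular order.
toy_shots <- data.frame(
  player.name = c("Amy Example", "Zoe Example", "Mia Example", "Amy Example")
)

# Text is sorted alphabetically when it is turned into a factor.
levels(factor(toy_shots$player.name))
# "Amy Example" "Mia Example" "Zoe Example"

# Setting the levels explicitly keeps the ranking order instead.
toy_shots$player.name <- factor(toy_shots$player.name,
                                levels = toy_summary$player.name)
levels(toy_shots$player.name)
# "Zoe Example" "Amy Example" "Mia Example"
```

In the real plot, the same conversion would probably need applying to every data frame the layers use (the shot data and the summary used by geom_text()), so that all layers agree on the panel order.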
Wow, would you look at that? How beauty takes such form in a shot map. Certainly, take some time to play around and create some more shot maps. Perhaps you could think about how you might want to filter data in order to target what you’d like to visualise. For example, you could filter the data based on team as we did for the scatterplot. The data is your oyster.
Want the code from this lesson in one chunk? ⬇
You can download the whole data processing script from the course files (opens in new tab).
Alternatively, you can copy and paste the below:
#----
#---------- Load packages
# An if statement is used to check whether the required package is
# already installed. If the package is not installed, the code will
# install it. The package is then loaded with the library() function.
if (require("tidyverse") == FALSE) {
install.packages("tidyverse")
}
library(tidyverse)
#--
if (require("ggrepel") == FALSE) {
install.packages("ggrepel")
}
library(ggrepel)
#--
if (require("ggsoccer") == FALSE) {
install.packages("ggsoccer")
}
library(ggsoccer)
##---------------------------
#----
#------- Load the UEFA Women's Euro 2022 data files created previously into variables.
#------- The encoding argument tells R how to interpret non-ASCII characters,
#------- such as accented letters in player names.
shot_data <- read.csv("shot_data.csv", encoding = "latin1")
summary <- read.csv("summary.csv", encoding = "latin1")
print("data loaded")
##---------------------------
#----
#------- An important aspect of programming is ensuring you're using the appropriate
#------- data type in a function. Later on we will be using the is_goal column which is
#------- required to be the factor data type. We can acquire the data type of the
#------- is_goal column using the class() function we used in the first lesson.
# The dollar $ sign is used to specify a column in a dataframe.
# shot_data$is_goal is accessing the is_goal column from the shot_data variable.
class(shot_data$is_goal)
#------- Currently is_goal is considered an integer; we can change the data type
#------- with the following code.
shot_data$is_goal <- as.factor(shot_data$is_goal)
##---------------------------
#------- First we are going to filter our data so we don't plot everything on the scatterplot.
#------- Using the head() function we can save the top 15 rows of the summary data set
#------- (which is organised by npxg_p90) into a new variable which we will use for the scatterplot.
# The reason we save to a new variable is so we can preserve the full summary dataset
# for future manipulations.
scatterplot_summary <- head(summary, n = 15)
##---------------------------
#----- Let's create a scatterplot
# First we call the ggplot() function; this creates a canvas for us to plot on.
# We specify the variable we want to plot from and which data columns go on the x and y axes.
# The x and y are specified in the aes() function nested inside ggplot().
# In this example shots are on the x axis (horizontal) and goals are on the y axis (vertical).
# Do remember that shots and goals are absolute values; when you make your own plots you can
# use the per 90 columns we calculated.
# We then add the geom_point() function to the canvas, which will mark our data points.
# The minimum requirements you need to make a scatterplot
ggplot(data = scatterplot_summary, # Using the data we filtered
aes(x = shots, # Shots will be plotted on the x axis
y = goals)) + # Goals will be plotted on the y axis
geom_point()
# Save the plot
ggsave(
plot = last_plot(),
"bare_scatter_plot.png", # Saves image of plot to R project folder.
dpi = 320,
height = 10,
width = 12)
#----------------
#---
#------- Create labeled scatterplot by adding geom_text_repel()
ggplot(data = scatterplot_summary, #using same data, x, and y as above.
aes(x = shots,
y = goals)) +
geom_point() +
geom_text_repel(
aes(label = player.name), # Labels the data point with the player name.
max.overlaps = getOption("ggrepel.max.overlaps", default = 20), # Labels that
# overlap too many other things are dropped; this sets that limit.
box.padding = unit(0.35, "lines"), # Increasing the 0.35 value will create
# more space around the player name label.
point.padding = unit(0.3, "lines"), # Increasing the 0.3 value will create
# more space around the data points.
arrow = arrow(length = unit(0.01, 'npc')) # Adds an arrow if the label is
# far away from the data point.
)
# Save the plot
ggsave(
plot = last_plot(),
"labeled_scatter_plot.png",
dpi = 320,
height = 10,
width = 12)
#---------------------------
#----
#------- Customise the scatterplot
# You can go ahead and change the colour of the geom_point(). We can use geom_smooth() to plot
# a regression line and colour it red. The labs() function allows us to label the plot. There are
# tons more ways to customise your plot. This is why R has a reputation for being great at producing
# visualisations.
ggplot(data = scatterplot_summary, #using same data, x, and y as before.
aes(x = shots,
y = goals)) +
geom_point(color = 'blue') + # We can change the data points to be blue.
geom_smooth(method = lm, color = "red") + # This line of code adds a
# regression line.
geom_text_repel(
aes(label = player.name), # Labels the data point with the player name.
max.overlaps = getOption("ggrepel.max.overlaps", default = 20), # Labels that
# overlap too many other things are dropped; this sets that limit.
box.padding = unit(0.35, "lines"), # Increasing the 0.35 value will create
# more space around the player name label.
point.padding = unit(0.3, "lines"), # Increasing the 0.3 value will create
# more space around the data points.
arrow = arrow(length = unit(0.01, 'npc')) # Adds an arrow if the label is
# far away from the data point.
) +
labs( # We can add labels that give the plot a title and a caption.
title = "UEFA Women's Euro 2022",
caption = "Data: StatsBomb")
# Save the plot
ggsave(
plot = last_plot(),
"customised_scatter_plot.png",
dpi = 320,
height = 10,
width = 12)
#---------------------------
#-----
#-------- Create a scatterplot to compare players in specified teams.
# Filter data that will be used in the scatterplot.
scatterplot_summary <- summary %>%
filter(shots > 0) %>% # Removing players who haven't taken any shots.
# Keeping players who are in Sweden's or England's team.
# The | operator is the "or" operator, so the code keeps
# players in the data set who belong to Sweden or England.
filter(team.name == "Sweden Women's" | team.name == "England Women's")
# Create the scatterplot.
ggplot(data = scatterplot_summary, # Using the same data variable (although
# filtered differently this time!)
# Still using the same variables for x
aes(x = shots, # and y as before.
y = goals)) +
geom_point(aes(colour = team.name)) + # Setting the colour of the
# data points to correspond to team.name.
# scale_color_manual() stores what colour each team's data
# points will be.
scale_color_manual(values = c("Sweden Women's" = "yellow", "England Women's" = "red")) +
geom_smooth(method = lm) +
geom_text_repel(
aes(label = player.name),
max.overlaps = getOption("ggrepel.max.overlaps", default = 20),
box.padding = unit(0.8, "lines"), # really had to fiddle with this
# value in order to see the labels
point.padding = unit(0.3, "lines"),
arrow = arrow(length = unit(0.01, 'npc'))
) +
labs(title = "UEFA Women's Euro 2022",
caption = "Data: StatsBomb")
# Save the plot
ggsave(
plot = last_plot(),
"team_comparison_scatter_plot.png",
dpi = 320,
height = 10,
width = 12)
#---------------
########
########
#----- Let's make some shot maps!
########
########
#----
#-------- Simulate taking user input and use that to filter the data.
user_input_nineties <- 2
user_input_top_players <- 6
#---------------
#----
#-------- Filter summary dataset and save the output into the shot_map_summary variable
shot_map_summary <- summary %>%
filter(nineties >= user_input_nineties) %>%
# head() keeps the first rows, so this relies on summary already being
# organised by npxg_p90.
head(n = user_input_top_players)
# Note: >= is the greater than or equal to operator.
#---------------
#----
#-------- Uses the filtered shot_map_summary data to pull the relevant data.
shot_map_plot_data <- shot_data %>%
# Searches for player name in the shot_map_summary dataframe we
# filtered using the user input. The filter will only return players
# who appear in the shot_map_summary dataframe.
filter(player.name %in% unique(shot_map_summary$player.name)) %>%
# select() then specifies which columns to keep in shot_map_plot_data.
select(
player.name,
location.x,
location.y,
is_goal,
shot.outcome.name,
shot.statsbomb_xg,
team.name
)
#---------------
#----
#-------- Draw a pitch for the shot map
ggplot() + # ggplot() is the canvas the shot map will be displayed on
annotate_pitch(dimensions = pitch_statsbomb) + # annotate_pitch() draws a pitch
theme_pitch() + # applies cosmetic changes so we can view the pitch better.
coord_flip(xlim = c(60, 120), # flips the axes and shows only one half
ylim = c(0, 80)) # of the pitch for better visualisation of shots.
# Save plot
ggsave(
plot = last_plot(),
"empty_pitch.png",
dpi = 320,
height = 10,
width = 12
)
#-----------------
#----
#--------- Shot map, all data
ggplot() +
annotate_pitch(dimensions = pitch_statsbomb) + # draw the pitch
theme_pitch() +
coord_flip(xlim = c(60, 120),
ylim = c(0, 80)) +
geom_point(data = shot_map_plot_data %>% # plot the location of the
filter(is_goal == 0), # shots that did not result in a goal.
aes(x = location.x,
y = location.y)) +
geom_point(data = shot_map_plot_data %>% # plot the location of the
filter(is_goal == 1), # shots that resulted in a goal.
aes(x = location.x,
y = location.y))
# Save plot
ggsave(
plot = last_plot(),
"all_data_shot_map.png",
dpi = 320,
height = 10,
width = 12
)
#----------------
#----
#-------- Shot map per player
ggplot() +
annotate_pitch(dimensions = pitch_statsbomb) + # draw the pitch
theme_pitch() +
coord_flip(xlim = c(60, 120),
ylim = c(0, 80)) +
geom_point(data = shot_map_plot_data %>% # plot the location of the
filter(is_goal == 0), # shots that did not result in a goal.
aes(x = location.x,
y = location.y)) +
geom_point(data = shot_map_plot_data %>% # plot the location of the
filter(is_goal == 1), # shots that resulted in a goal.
aes(x = location.x,
y = location.y)) +
facet_wrap(~ player.name) # creates a shot map for each player
# Save plot
ggsave(
plot = last_plot(),
"per_player_shot_map.png",
dpi = 320,
height = 10,
width = 12
)
#----------------
#----
#-------- Shot map with different colours for non goal shots and goals
ggplot() +
annotate_pitch(dimensions = pitch_statsbomb) + # draw the pitch
theme_pitch() +
coord_flip(xlim = c(60, 120),
ylim = c(0, 80)) +
geom_point(
data = shot_map_plot_data %>%
filter(is_goal == 0),
aes(x = location.x,
y = location.y),
shape = 21,
fill = "#FC8D62", # Fills non goal data points orange.
stroke = 0.6,
colour = "black" # We will also add a nice outline
) +
geom_point(
data = shot_map_plot_data %>%
filter(is_goal == 1),
aes(x = location.x,
y = location.y),
shape = 21,
fill = "#66C2A5", # Fills goal data points green.
stroke = 0.6,
colour = "black" # We will also add a nice outline
) +
facet_wrap(~ player.name) # creates a shot map for each player
# Save plot
ggsave(
plot = last_plot(),
"per_player_cosmetic_shot_map.png",
dpi = 320,
height = 10,
width = 12
)
#---------------------------
#----
#-------- Shot map with weighted data points
ggplot() +
annotate_pitch(dimensions = pitch_statsbomb) +
theme_pitch() +
coord_flip(xlim = c(60, 120),
ylim = c(0, 80)) +
geom_point(
data = shot_map_plot_data %>%
filter(is_goal == 0),
aes(x = location.x,
y = location.y,
size = shot.statsbomb_xg),
# Here is where we scale the data points
# from the shot.statsbomb_xg column
shape = 21,
fill = "#FC8D62",
# Fills non goal data points orange.
stroke = 0.6,
colour = "black" # We will also add a nice outline
) +
geom_point(
data = shot_map_plot_data %>%
filter(is_goal == 1),
aes(x = location.x,
y = location.y,
size = shot.statsbomb_xg),
# Here is where we scale the data points
# from the shot.statsbomb_xg column
shape = 21,
fill = "#66C2A5",
# Fills goal data points green.
stroke = 0.6,
colour = "black" # We will also add a nice outline
) +
facet_wrap(~ player.name) # creates a shot map for each player
# Save plot
ggsave(
plot = last_plot(),
"weighted_data_points_shot_map.png",
dpi = 320,
height = 10,
width = 12
)
#---------------------
#----
#------- Shot map with labels
ggplot() +
annotate_pitch(dimensions = pitch_statsbomb) +
theme_pitch() +
coord_flip(xlim = c(60, 120),
ylim = c(0, 80)) +
geom_point(
data = shot_map_plot_data %>%
filter(is_goal == 0),
aes(x = location.x,
y = location.y,
size = shot.statsbomb_xg),
shape = 21,
fill = "#FC8D62",
stroke = 0.6,
colour = "black"
) +
geom_point(
data = shot_map_plot_data %>%
filter(is_goal == 1),
aes(x = location.x,
y = location.y,
size = shot.statsbomb_xg),
shape = 21,
fill = "#66C2A5",
stroke = 0.6,
colour = "black"
) +
facet_wrap(~ player.name) +
labs(
title = paste(
"Top",
user_input_top_players,
"players rated on Non Penalty xG P90, which have played",
user_input_nineties,
"nineties."
),
subtitle = "UEFA Women's Euro 2022",
caption = "Data: StatsBomb",
size = "Expected goal value"
)
# Save plot
ggsave(
plot = last_plot(),
"labeled_shot_map.png",
dpi = 320,
height = 10,
width = 12
)
#-------------------
#----
#--------- The final shot map!
ggplot() +
annotate_pitch(dimensions = pitch_statsbomb) +
theme_pitch() +
coord_flip(xlim = c(60, 120),
ylim = c(0, 80)) +
geom_point(
data = shot_map_plot_data %>%
filter(is_goal == 0),
aes(x = location.x,
y = location.y,
size = shot.statsbomb_xg),
shape = 21,
fill = "#FC8D62",
stroke = 0.6,
colour = "black"
) +
geom_point(
data = shot_map_plot_data %>%
filter(is_goal == 1),
aes(x = location.x,
y = location.y,
size = shot.statsbomb_xg),
shape = 21,
fill = "#66C2A5",
stroke = 0.6,
colour = "black"
) +
facet_wrap( ~ player.name) +
labs(
title = paste(
"Top",
user_input_top_players,
"players rated on Non Penalty xG P90, which have played",
user_input_nineties,
"nineties."
),
subtitle = "UEFA Women's Euro 2022",
caption = "Data: StatsBomb",
size = "Expected goal value"
) +
geom_text(data = shot_map_summary,
aes(
x = 80,
y = 15,
label = paste("Shots P90:", round(shots_p90, digits = 2))
)) +
geom_text(data = shot_map_summary,
aes(
x = 74,
y = 15,
label = paste("Goals P90:", round(goals_p90, digits = 2))
)) +
geom_text(data = shot_map_summary,
aes(
x = 68,
y = 15,
label = paste("NPxG P90:", round(npxg_p90, digits = 2))
))
# Save plot
ggsave(
plot = last_plot(),
"a_shot_map_masterpiece.png",
dpi = 320,
height = 10,
width = 12
)
#-----------------
print("taaa daaaaa!")
Further resources:
If you wish to pursue R independently, I recommend the following online resources to aid you on your learning journey.