World Cup 2018 is Finally Here

A Quick Analysis of All Teams

It has been a long, long 1,423 days since the conclusion of the last World Cup. Luckily, the wait is almost over, which means it is time to analyze anything and everything World Cup-related.

In this post, let's start by getting all the squads.

We will gather all the team data from the Wikipedia page listing the 2018 FIFA World Cup squads.

First, load the libraries:

# load libraries  
library(tidyverse) # for data manipulation
library(rvest) # for web scraping
library(lubridate) # to work with the birthdates
library(DT) # to create an interactive table at the end

Next, store the URL of the Wikipedia page:

url <- "https://en.wikipedia.org/wiki/2018_FIFA_World_Cup_squads"

There are 32 teams, and each squad sits in its own table. Because the tables appear in order on the page, each one can be selected by its position in the XPath (`table[1]`, `table[2]`, and so on).

The same is true for the team names, which are in `h3` headers numbered in the same order, so `h3[n]` holds the name of the team whose squad is in `table[n]`.

Knowing this, we can first create a vector of the numbers 1 to 32:

table_numbers <- 1:32
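
To see why one number can pair a header with its table, here is a small offline sketch on a made-up HTML snippet (the team names and players are placeholders, not Wikipedia data). In XPath, `h3[2]` selects the second `h3` child of its parent, so when headers and tables appear in matching order, the same index picks up both.

```r
library(rvest)

# made-up page mimicking the assumed layout: numbered h3 headers and
# numbered tables inside one div (read_html() also accepts literal HTML)
page <- read_html('
  <div>
    <h3>Team A</h3><table><tr><th>Player</th></tr><tr><td>a1</td></tr></table>
    <h3>Team B</h3><table><tr><th>Player</th></tr><tr><td>b1</td></tr></table>
  </div>')

# h3[2] and table[2] are the second h3 and second table children of the div,
# so the same index lines up a team name with its squad table
second_name  <- html_text(html_node(page, xpath = "//div/h3[2]"))
second_table <- html_table(html_node(page, xpath = "//div/table[2]"))

second_name          # "Team B"
second_table$Player  # "b1"
```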

Next, write a function that takes a table number and returns that team's squad as a tibble, with the team name added as a column:

# function to get the table for each team
get_teams <- function(tbl_nums) {
  
  # download and parse the page once, then reuse it for both lookups
  page <- read_html(url)
  
  # get country name from the numbered h3 header, dropping the "[edit]" link text
  team_name <- page %>%
    html_node(xpath = paste0('//*[@id="mw-content-text"]/div/h3[', tbl_nums, ']')) %>%
    html_text() %>%
    str_remove(fixed("[edit]"))
  
  # get squad info from the table with the same number
  page %>%
    html_node(xpath = paste0('//*[@id="mw-content-text"]/div/table[', tbl_nums, ']')) %>%
    html_table(fill = TRUE) %>%
    as_tibble() %>%
    mutate(team_name = team_name) # add country names
}
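
Each call to `get_teams()` still downloads the page, so the full run fetches it 32 times. One possible alternative, sketched below with a hypothetical helper `get_team_from_page()`, is to parse the page once and pass the parsed document in. The demo uses a made-up HTML string and shortened XPaths so it runs offline; for the real page you would pass `read_html(url)` and use the `mw-content-text` paths from the function above.

```r
library(tidyverse)
library(rvest)

# hypothetical variant of get_teams(): takes an already-parsed page,
# so the HTML is downloaded once rather than once per team
get_team_from_page <- function(page, n) {
  team_name <- page %>%
    html_node(xpath = paste0("//div/h3[", n, "]")) %>%
    html_text()

  page %>%
    html_node(xpath = paste0("//div/table[", n, "]")) %>%
    html_table() %>%
    as_tibble() %>%
    mutate(team_name = team_name)
}

# offline demo on a made-up page with two teams
toy_page <- read_html('
  <div>
    <h3>Team A</h3><table><tr><th>Player</th></tr><tr><td>a1</td></tr></table>
    <h3>Team B</h3><table><tr><th>Player</th></tr><tr><td>b1</td></tr><tr><td>b2</td></tr></table>
  </div>')

squads <- map_df(1:2, ~ get_team_from_page(toy_page, .x))
squads
```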

Then, use `map_df()` to apply the function to each number in the vector and row-bind the player data for all teams into one tibble:

teams <- map_df(table_numbers, get_teams)

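
`map_df()` calls the function once per number and stacks the resulting tibbles. The sketch below uses a hypothetical stand-in, `fake_team()`, so nothing is downloaded, and shows that this matches calling `map()` and then `bind_rows()`. In purrr 1.0 and later, `map_df()` is superseded, and `map()` followed by `list_rbind()` is the suggested replacement.

```r
library(tidyverse)

# hypothetical stand-in for get_teams(): one small tibble per number, no download
fake_team <- function(n) tibble(table_no = n, team_name = paste("Team", n))

# map_df() calls fake_team(1), fake_team(2), fake_team(3) and row-binds the results
via_map_df <- map_df(1:3, fake_team)

# the same thing in two explicit steps: a list of tibbles, then stack them
via_map_bind <- bind_rows(map(1:3, fake_team))

via_map_df
```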

Data Clean-Up

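We will stack the 2018 squads on top of the historical squad data for later analysis. For `bind_rows()` to combine them, the scraped data needs the same column names and compatible column types as the historical data, so we convert the shirt number to character, add the ClubCountry and Year columns, rename the remaining columns, and select them in the same order.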

# clean up the data so the columns match the historical squads
teams <- teams %>%
  mutate(`No.` = as.character(`No.`), ClubCountry = NA_character_, Year = 2018L) %>%
  rename(No = `No.`, Pos = `Pos.`, `DOB/Age` = `Date of birth (age)`, Country = team_name) %>%
  select(No, Pos, Player, `DOB/Age`, Caps, Club, Country, ClubCountry, Year)
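
Why the type changes matter: `bind_rows()` refuses to combine a column that is character in one table and numeric in the other. A small sketch with made-up rows:

```r
library(tidyverse)

# made-up rows: No is numeric in one table and character in the other,
# mirroring the scraped 2018 data versus the historical CSV
new_rows <- tibble(No = c(1L, 2L), Player = c("P1", "P2"))
old_rows <- tibble(No = c("1", "2"), Player = c("Q1", "Q2"))

# stacking as-is fails: a column cannot be both integer and character
mismatch <- tryCatch(bind_rows(old_rows, new_rows), error = function(e) e)
inherits(mismatch, "error")

# converting first, as the clean-up step does with as.character(`No.`), works
stacked <- bind_rows(old_rows, mutate(new_rows, No = as.character(No)))
stacked$No
```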

Get Historical Teams

Thanks to this repo by S. Anand, we can get all the historical squads:

## read in historical data
historic_data <- read_csv("https://raw.githubusercontent.com/sanand0/fifadata/master/squads.csv")

## Parsed with column specification:
## cols(
##   No = col_character(),
##   Pos = col_character(),
##   Player = col_character(),
##   `DOB/Age` = col_character(),
##   Caps = col_character(),
##   Club = col_character(),
##   Country = col_character(),
##   ClubCountry = col_character(),
##   Year = col_integer()
## )
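
If you would rather not rely on readr's type guessing (or see the parsing message), the column types can be pinned with `col_types`. The sketch below writes a one-row, made-up CSV with the same header as `squads.csv` to a temporary file so it runs offline; with the real file you would pass the same `squad_spec` to `read_csv()` together with the GitHub URL.

```r
library(readr)

# made-up one-row CSV with the same header as squads.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "No,Pos,Player,DOB/Age,Caps,Club,Country,ClubCountry,Year",
  "1,GK,Player A,1 January 1900,10,Club A,Country A,Country A,1930"
), tmp)

# everything as character except Year, matching the specification printed above
squad_spec <- cols(.default = col_character(), Year = col_integer())
toy_historic <- read_csv(tmp, col_types = squad_spec)

sapply(toy_historic, class)
```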

A Little More Clean-Up

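The historical file stores Caps as character (see the column specification above), while Caps in the scraped 2018 data are numbers, so we convert Caps to integer to match. The warning shows that some Caps values could not be converted and became NA: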

# change data type for Caps to match
historic_data <- historic_data %>%
  mutate(Caps = as.integer(Caps))

## Warning in evalq(as.integer(Caps), <environment>): NAs introduced by
## coercion
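
The coercion warning means some Caps strings were not plain whole numbers. We have not checked which values those are in the real file; the sketch below, on made-up values, shows one way to list the strings that turned into NA (for the real data, run the same steps on `historic_data` before overwriting Caps).

```r
library(tidyverse)

# made-up Caps strings standing in for the character column in squads.csv
caps_check <- tibble(Caps_raw = c("12", "0", "n/a", "7"))

# convert as the clean-up step does, but keep the original text alongside
caps_check <- caps_check %>%
  mutate(Caps = suppressWarnings(as.integer(Caps_raw)))

# strings that were present but could not become integers
failed <- caps_check %>%
  filter(is.na(Caps), !is.na(Caps_raw))

failed
```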

Stack Everything

Finally, we can stack the data, which gives us the 2018 squads and all the historical squads in one table.

# stack 2018 data and historical data
all_squads <- bind_rows(historic_data,teams)
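
Since both tables carry a Year column, the stacked data already tells the tournaments apart. If you also want to record which table each row came from, `bind_rows()` accepts a named list and an `.id` argument. Here is a small sketch on made-up rows, with a row-count check that stacking neither dropped nor duplicated anything:

```r
library(tidyverse)

# made-up miniature versions of historic_data and teams
historic_toy <- tibble(Player = c("H1", "H2"), Year = c(2010L, 2014L))
teams_toy    <- tibble(Player = "N1", Year = 2018L)

# list names become the values of the new "source" column
all_toy <- bind_rows(list(historic = historic_toy, current = teams_toy), .id = "source")

# sanity check: stacking should neither drop nor duplicate rows
stopifnot(nrow(all_toy) == nrow(historic_toy) + nrow(teams_toy))

all_toy
```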

The table below contains all the squad details for the 2018 World Cup:



	Pos	Player	DOB/Age	Caps	Country	Club
1	1GK	Essam El Hadary (captain)	(1973-01-15)15 January 1973 (aged 45)	158	Egypt	Al-Taawoun
2	2DF	Ali Gabr	(1989-01-01)1 January 1989 (aged 29)	21	Egypt	West Bromwich Albion
3	2DF	Ahmed Elmohamady	(1987-09-09)9 September 1987 (aged 30)	78	Egypt	Aston Villa
4	3MF	Omar Gaber	(1992-01-30)30 January 1992 (aged 26)	24	Egypt	Los Angeles FC
5	3MF	Sam Morsy	(1991-09-10)10 September 1991 (aged 26)	5	Egypt	Wigan Athletic
6	2DF	Ahmed Hegazi	(1991-01-25)25 January 1991 (aged 27)	45	Egypt	West Bromwich Albion
7	2DF	Ahmed Fathy	(1984-11-10)10 November 1984 (aged 33)	126	Egypt	Al Ahly
8	3MF	Tarek Hamed	(1988-10-24)24 October 1988 (aged 29)	21	Egypt	Zamalek
9	4FW	Marwan Mohsen	(1989-02-26)26 February 1989 (aged 29)	24	Egypt	Al Ahly
10	4FW	Mohamed Salah	(1992-06-15)15 June 1992 (aged 25)	57	Egypt	Liverpool
11	4FW	Kahraba	(1994-04-13)13 April 1994 (aged 24)	19	Egypt	Al-Ittihad
12	2DF	Ayman Ashraf	(1991-04-09)9 April 1991 (aged 27)	4	Egypt	Al Ahly
13	2DF	Mohamed Abdel Shafy	(1985-07-01)1 July 1985 (aged 32)	51	Egypt	Al-Fateh
14	4FW	Ramadan Sobhi	(1997-01-23)23 January 1997 (aged 21)	23	Egypt	Stoke City
15	2DF	Mahmoud Hamdy	(1995-06-01)1 June 1995 (aged 23)	0	Egypt	Zamalek
16	1GK	Sherif Ekramy	(1983-07-10)10 July 1983 (aged 34)	22	Egypt	Al Ahly
17	3MF	Mohamed Elneny	(1992-07-11)11 July 1992 (aged 25)	61	Egypt	Arsenal
18	4FW	Shikabala	(1986-03-05)5 March 1986 (aged 32)	29	Egypt	Al-Raed
19	3MF	Abdallah El Said	(1985-07-13)13 July 1985 (aged 32)	36	Egypt	KuPS
20	2DF	Saad Samir	(1989-04-01)1 April 1989 (aged 29)	11	Egypt	Al Ahly
21	3MF	Trézéguet	(1994-10-01)1 October 1994 (aged 23)	24	Egypt	Kasımpaşa
22	4FW	Amr Warda	(1993-09-17)17 September 1993 (aged 24)	16	Egypt	Atromitos
23	1GK	Mohamed El Shenawy	(1988-12-18)18 December 1988 (aged 29)	3	Egypt	Al Ahly

Only the first 23 of the table's 736 rows (Egypt's squad) are reproduced here.


In the next post, we will break up the date of birth into more useful pieces and visualize some of this data.
