Section 3 Data Pre-processing
3.1 Read data into R
The National Chargepoint Register (NCR) is a database of publicly available chargepoints for electric vehicles in the UK, established in 2011 (, 2021). You can access the data via this link.
Now, let’s read the original data into R.
UK_NCR = read.csv(here::here("Dataset", "national-charge-point-registry.csv"))
# This may take a while because the file is large (longer if you read it from the GitHub URL instead)
# Get an overview of this dataset
print("The number of rows is:")
nrow(UK_NCR)
print("The number of columns is:")
ncol(UK_NCR)
print("The first 70 variable names are:")
head(names(UK_NCR), n = 70)
Tip: If you cannot read this dataset successfully, replace the file path above with “https://raw.githubusercontent.com/Hereislittlemushroom/CASA0005_Final_Assessment/main/Dataset/national-charge-point-registry.csv”.
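If you want the script to run whether or not the local copy exists, a small helper can pick the source automatically. This is a minimal sketch: the function name pick_ncr_source is hypothetical, and it assumes the local file sits at Dataset/national-charge-point-registry.csv. Base R's read.csv() can read a raw GitHub URL directly, so the fallback needs no extra packages.

```r
# Hypothetical helper: return the local path if the file exists, otherwise the URL
pick_ncr_source <- function(local_path, url) {
  if (file.exists(local_path)) local_path else url
}

ncr_url <- "https://raw.githubusercontent.com/Hereislittlemushroom/CASA0005_Final_Assessment/main/Dataset/national-charge-point-registry.csv"
ncr_local <- file.path("Dataset", "national-charge-point-registry.csv")

# Offline check of the helper using a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines("a,b\n1,2", tmp)
src_found <- pick_ncr_source(tmp, ncr_url)            # returns the temp file path
src_missing <- pick_ncr_source(file.path(tempdir(), "ncr-does-not-exist.csv"),
                               ncr_url)                # returns the URL

# In the real workflow (needs the local file or an internet connection):
# UK_NCR <- read.csv(pick_ncr_source(ncr_local, ncr_url))
```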
3.2 Data Selection
Select the charge points in the London area from this UK-wide CSV file. You can use the filter() function from the dplyr package to keep the charge points located in London boroughs.
library(dplyr)
# County names that belong to London (several spelling variants appear in the data)
london_counties = c("London",
                    "Greater London ",
                    "London Borough of Camden",
                    "London Borough of Ealing",
                    "London Borough of Greenwich",
                    "London Borough of Hackney",
                    "London Borough of Hammersmith and Fulham",
                    "London Borough of Hounslow",
                    "London Borough of Islington",
                    "London Borough of Lambeth",
                    "London Borough of Richmond upon Thames",
                    "London Borough of Southwark",
                    "London Borough Of Southwark",
                    "London Borough of Waltham Forest",
                    "London Borough of Wandsworth")
London_NCR = UK_NCR %>%
  dplyr::filter(!is.na(county), county %in% london_counties)
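The filter above matches county names exactly, so it needs separate entries for spelling variants such as "London Borough of Southwark" and "London Borough Of Southwark", as well as the trailing space in "Greater London ". A more forgiving alternative is to normalise the text (trim whitespace, ignore case) before matching. The sketch below uses base R on a hypothetical toy vector rather than the real county column, and the helper name normalise is illustrative; compare its output with unique(UK_NCR$county) before relying on it.

```r
# Hypothetical county values mimicking the spelling variants handled above
county <- c("Greater London ", "London Borough Of Southwark",
            "london borough of camden", "Kent", NA)

# Normalise: drop surrounding whitespace and ignore letter case
normalise <- function(x) tolower(trimws(x))

# Target names written once, then normalised the same way
london_targets <- normalise(c("London", "Greater London",
                              "London Borough of Southwark",
                              "London Borough of Camden"))

# NA never matches, so missing counties are dropped as well
keep <- !is.na(county) & normalise(county) %in% london_targets
county[keep]
```

With dplyr loaded, the same idea would read filter(UK_NCR, normalise(county) %in% london_targets).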
Check that all values in county belong to London:
isLondon = London_NCR$county %>%
  unique()
isLondon
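Printing unique() is fine for a quick look, but an explicit check stops the script if an unexpected county slips through. A minimal sketch in base R, with hypothetical stand-in values for London_NCR$county and a shortened list of allowed names:

```r
# Hypothetical stand-ins for London_NCR$county and the names used in the filter
county_values <- c("London", "London Borough of Camden", "London",
                   "London Borough of Hackney")
allowed <- c("London", "Greater London ", "London Borough of Camden",
             "London Borough of Hackney")

# Any value not in the allowed list would show up here
unexpected <- setdiff(unique(county_values), allowed)
stopifnot(length(unexpected) == 0)

# Number of charge points per county value
counts <- table(county_values)
counts
```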
Next, select the useful attributes, e.g. latitude and longitude.
# Tip: column indices in R start from 1
# Select the variables by their position
London_NCR = London_NCR %>%
  dplyr::select(1, 4, 5, 13, 14, 15, 32, 35, 36, 38, 54)
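Selecting by position is fragile: if the source CSV gains or loses a column, the same numbers silently point at different variables. One option is to print the names behind the positions once and then select by name. A sketch on a hypothetical toy data frame (the column names here are illustrative, not the NCR's actual headers):

```r
# Hypothetical toy data frame standing in for London_NCR
toy <- data.frame(id = c("a", "b"),
                  name = c("Site 1", "Site 2"),
                  latitude = c(51.50, 51.45),
                  longitude = c(-0.12, -0.05),
                  county = c("London", "London"))

# Which variables do these positions refer to?
idx <- c(1, 3, 4)
chosen <- names(toy)[idx]
chosen

# Selecting by name keeps working even if the column order changes
by_name <- toy[, chosen]
```

With dplyr, the equivalent of the last line is select(toy, all_of(chosen)).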
# Check the variables we have chosen and the number of rows & cols
London_NCR %>%
  names()
London_NCR %>%
  nrow()
London_NCR %>%
  ncol()
3.3 Data Cleaning
Maps and visualisation play an important role in spatial analysis. To make a heat map for further research, you first need to attach geographic information to each row of the charge point dataset.
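To begin with, load “PostcodesioR”. This R package offers functions for looking up UK postcodes through the postcodes.io API, which lets us match each postcode to its geographic codes.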
# install.packages("PostcodesioR")
library(PostcodesioR)
Before using a for loop to fill in GSS_CODE values by looking up each postcode, add a new column called GSS_CODE to the London_NCR dataset.
# Attention: you can skip this step and the for loop below, because the loop takes quite a long time (about 5 min).
# It is not necessary to run it: the processed file can also be downloaded from GitHub (see the end of this section).
London_NCR_GSS_Added = London_NCR %>%
  rowwise() %>%
  mutate(GSS_CODE = postcode) %>%
  # Tip: make sure the new column is stored as character, not numeric
  mutate(GSS_CODE = as.character(GSS_CODE))
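# Pay attention to the loop counter: R indices start from 1
i = 1
for (val in London_NCR_GSS_Added$postcode) {
  # Default to "" so that rows whose look-up fails are removed in the next step
  London_NCR_GSS_Added$GSS_CODE[i] = ""
  try({
    temp1 = PostcodesioR::postcode_lookup(val)
    if (!is.null(temp1)) {
      # admin_district_code holds the GSS code of the local authority (borough)
      temp2 = temp1$admin_district_code[1]
      London_NCR_GSS_Added$GSS_CODE[i] = temp2
    }
  }, silent = TRUE)
  # Increment outside try() so the counter stays aligned even when a look-up fails
  i = i + 1
}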
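The loop sends one request per row, which is why it takes several minutes. Charge points often share a postcode, so a common speed-up is to look up each distinct postcode only once and map the results back with match(). The sketch below replaces PostcodesioR::postcode_lookup() with a hypothetical offline function, fake_lookup(), so that it runs without internet access; the toy postcodes and codes are made up. PostcodesioR also provides bulk_postcode_lookup() for batches of postcodes, which may be worth checking in the package documentation.

```r
# Hypothetical offline stand-in for PostcodesioR::postcode_lookup()
toy_table <- c("AB1 2CD" = "GSS_A", "EF3 4GH" = "GSS_B")
calls <- 0
fake_lookup <- function(pc) {
  calls <<- calls + 1                      # count simulated API requests
  if (!pc %in% names(toy_table)) stop("invalid postcode")
  toy_table[[pc]]
}

postcodes <- c("AB1 2CD", "EF3 4GH", "AB1 2CD", "ZZ9 9ZZ", NA)

# Look up each distinct, non-missing postcode once; failed look-ups become NA
uniq <- unique(postcodes[!is.na(postcodes)])
codes <- vapply(uniq, function(pc) {
  tryCatch(fake_lookup(pc), error = function(e) NA_character_)
}, character(1))

# Map the results back to every row
gss <- unname(codes[match(postcodes, uniq)])
gss    # "GSS_A" "GSS_B" "GSS_A" NA NA
calls  # 3 look-ups instead of 5
```

In the real workflow, fake_lookup() would be replaced by a function that calls PostcodesioR::postcode_lookup() and returns a single admin_district_code string (or NA_character_ when nothing is found).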
# Remove the rows whose `GSS_CODE` is empty
# Limitation: rows without a `GSS_CODE` are excluded from the dataset, which may slightly affect the results
London_NCR_GSS_Added$GSS_CODE[London_NCR_GSS_Added$GSS_CODE == ""] = NA
London_NCR_GSS_Added = London_NCR_GSS_Added %>%
  filter(!is.na(GSS_CODE))
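Because dropping rows without a GSS_CODE is a limitation of the analysis, it helps to report how many rows were lost. A minimal sketch on a hypothetical toy data frame, which also shows that empty strings and NA can be removed in a single step:

```r
# Hypothetical toy data standing in for London_NCR_GSS_Added
toy <- data.frame(postcode = c("P1", "P2", "P3", "P4", "P5"),
                  GSS_CODE = c("GSS_A", "", "GSS_B", NA, "GSS_A"))

n_before <- nrow(toy)
# Keep rows whose code is neither missing nor an empty string
toy_kept <- toy[!is.na(toy$GSS_CODE) & toy$GSS_CODE != "", ]
n_dropped <- n_before - nrow(toy_kept)

msg <- sprintf("Dropped %d of %d rows (%.1f%%)", n_dropped, n_before,
               100 * n_dropped / n_before)
msg  # "Dropped 2 of 5 rows (40.0%)"
```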
Finally, it is important to export the preprocessed data to a CSV file. This creates London_NCR_GSS_Added.csv in the “Dataset” folder.
# Export the London_NCR_GSS_Added data frame in .csv format
library(here)
write.csv(London_NCR_GSS_Added, here::here("Dataset", "London_NCR_GSS_Added.csv"), row.names = FALSE)
# write.csv() always writes the column names (passing col.names only triggers a warning);
# row.names = FALSE stops R from adding an extra index column
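To confirm that the export behaves as intended, you can write a small data frame to a temporary file and read it back. This sketch (toy data, base R only) shows that write.csv() writes the header row by default, and that leaving row.names at its default adds an extra column, which read.csv() names "X":

```r
# Hypothetical toy data standing in for London_NCR_GSS_Added
toy <- data.frame(postcode = c("P1", "P2"), GSS_CODE = c("GSS_A", "GSS_B"))

path_clean <- tempfile(fileext = ".csv")
write.csv(toy, path_clean, row.names = FALSE)
back_clean <- read.csv(path_clean)
names(back_clean)    # "postcode" "GSS_CODE"

path_default <- tempfile(fileext = ".csv")
write.csv(toy, path_default)   # row.names = TRUE by default
back_default <- read.csv(path_default)
names(back_default)  # "X" "postcode" "GSS_CODE"
```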
You can also download this preprocessed dataset from GitHub: https://raw.githubusercontent.com/Hereislittlemushroom/CASA0005_Final_Assessment/main/Dataset/London_NCR_GSS_Added.csv