Section 5 Distribution analysis of samples

You can build a formal table that gives an overview of how the samples are distributed. To compare the samples from 2019 and 2020, first extract the year from “dateCreated”, which is stored as a timestamp. The table can then show one column for each year (2019 and 2020).

#3.1 select the years
# year() is exported by both lubridate and data.table; one of them must be loaded
df$year = year(df$dateCreated)
table(df$year)
## 
## 2017 2018 2019 2020 2021 
##    1    4  696  362    3
#select 2019 & 2020
# index through df$year so the filter works for a data.frame as well as a data.table
df = df[df$year %in% c(2019, 2020),]
table(df$year)
## 
## 2019 2020 
##  696  362
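The year extraction and filtering above can be tried in isolation on a small made-up data frame. This is only a sketch under stated assumptions: the toy data frame and its timestamps are hypothetical, only the column name dateCreated comes from the text, and base R format() with "%Y" is used in place of year() so that no extra package is needed.

```r
# Hypothetical stand-in for df; the timestamps are invented for illustration.
toy <- data.frame(
  dateCreated = as.POSIXct(
    c("2018-11-02 09:15:00", "2019-03-10 14:00:00",
      "2019-07-21 08:30:00", "2020-01-05 17:45:00",
      "2021-02-14 12:00:00"),
    tz = "UTC"
  )
)

# Base R alternative to year(): format the timestamp as a four-digit year.
toy$year <- as.integer(format(toy$dateCreated, "%Y"))

# Keep only the two years being compared.
toy <- toy[toy$year %in% c(2019, 2020), , drop = FALSE]

# Count the remaining samples per year.
table(toy$year)
```

Filtering with %in% rather than two == comparisons joined by | keeps the condition short if more years are added later.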
#3.2 show the table of the distribution of two years samples
table1(~county|factor(year),data=df)
county                                    2019 (N=696)  2020 (N=362)  Overall (N=1058)
London                                    150 (21.6%)   150 (41.4%)   300 (28.4%)
London Borough of Camden                  87 (12.5%)    179 (49.4%)   266 (25.1%)
London Borough of Ealing                  65 (9.3%)     1 (0.3%)      66 (6.2%)
London Borough of Greenwich               2 (0.3%)      0 (0%)        2 (0.2%)
London Borough of Hackney                 62 (8.9%)     0 (0%)        62 (5.9%)
London Borough of Hammersmith and Fulham  2 (0.3%)      3 (0.8%)      5 (0.5%)
London Borough of Hounslow                78 (11.2%)    0 (0%)        78 (7.4%)
London Borough of Islington               3 (0.4%)      15 (4.1%)     18 (1.7%)
London Borough of Lambeth                 118 (17.0%)   2 (0.6%)      120 (11.3%)
London Borough of Richmond upon Thames    8 (1.1%)      9 (2.5%)      17 (1.6%)
London Borough of Southwark               1 (0.1%)      0 (0%)        1 (0.1%)
London Borough Of Southwark               57 (8.2%)     0 (0%)        57 (5.4%)
London Borough of Waltham Forest          60 (8.6%)     2 (0.6%)      62 (5.9%)
London Borough of Wandsworth              3 (0.4%)      1 (0.3%)      4 (0.4%)
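The percentages in the table are column percentages: each count is divided by the total for its year. The sketch below reproduces that calculation with base R table() and prop.table() on made-up data. The toy data frame and the county labels "County A" and "County B" are hypothetical; only the column names county and year come from the section. It is not meant as a replacement for table1(), only as a way to check how the figures are derived.

```r
# Hypothetical data; values are invented for illustration.
toy <- data.frame(
  county = c("County A", "County A", "County B",
             "County A", "County B", "County B", "County B"),
  year   = c(2019, 2019, 2019, 2020, 2020, 2020, 2020)
)

# Cross-tabulate counties (rows) against years (columns).
counts <- table(toy$county, toy$year)

# margin = 2 divides each cell by its column total, giving per-year percentages.
col_pct <- round(100 * prop.table(counts, margin = 2), 1)

counts
col_pct
```

Each column of col_pct sums to 100, which mirrors how the 2019 and 2020 columns of the table each sum to 100% across counties.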
# head(df)