Section 5 Distribution analysis of samples
You can generate a formal table that gives an overview of how the samples are distributed. To compare the samples from 2019 and 2020, extract the year from “dateCreated”, which is stored as a timestamp. You can then build a table with one column per year (2019 and 2020).
#3.1 select the years
# year() is assumed here to come from the lubridate package, loaded earlier
df$year = year(df$dateCreated)
table(df$year)
##
## 2017 2018 2019 2020 2021
##    1    4  696  362    3
#select 2019 & 2020
df = df[df$year==2019|df$year==2020,]
table(df$year)
##
## 2019 2020
##  696  362
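The year extraction and filtering above can be sketched on its own as follows. This is a minimal example under stated assumptions: the data frame `toy` and its timestamps are made up for illustration (they are not the study data), and base R's `format()` stands in for `year()` so that no extra package is needed.

```r
# Toy data: these timestamps are invented for illustration only.
toy <- data.frame(
  dateCreated = as.POSIXct(
    c("2018-06-01 10:00:00", "2019-03-15 09:30:00",
      "2019-11-02 14:45:00", "2020-01-20 08:00:00",
      "2021-02-10 12:00:00"),
    tz = "UTC"
  )
)

# Extract the calendar year from the timestamp (base R alternative to year()).
toy$year <- as.integer(format(toy$dateCreated, "%Y"))

# Keep only 2019 and 2020; %in% also excludes rows whose year is NA.
toy <- toy[toy$year %in% c(2019, 2020), , drop = FALSE]

# Count the remaining samples per year.
year_counts <- table(toy$year)
print(year_counts)
```

Filtering with `%in%` rather than `==` combined with `|` is a design choice: rows with a missing year are dropped instead of coming back as all-`NA` rows.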
#3.2 show the distribution of samples by county for the two years
table1(~county|factor(year),data=df)
| | 2019 (N=696) | 2020 (N=362) | Overall (N=1058) |
|---|---|---|---|
| county | | | |
| London | 150 (21.6%) | 150 (41.4%) | 300 (28.4%) |
| London Borough of Camden | 87 (12.5%) | 179 (49.4%) | 266 (25.1%) |
| London Borough of Ealing | 65 (9.3%) | 1 (0.3%) | 66 (6.2%) |
| London Borough of Greenwich | 2 (0.3%) | 0 (0%) | 2 (0.2%) |
| London Borough of Hackney | 62 (8.9%) | 0 (0%) | 62 (5.9%) |
| London Borough of Hammersmith and Fulham | 2 (0.3%) | 3 (0.8%) | 5 (0.5%) |
| London Borough of Hounslow | 78 (11.2%) | 0 (0%) | 78 (7.4%) |
| London Borough of Islington | 3 (0.4%) | 15 (4.1%) | 18 (1.7%) |
| London Borough of Lambeth | 118 (17.0%) | 2 (0.6%) | 120 (11.3%) |
| London Borough of Richmond upon Thames | 8 (1.1%) | 9 (2.5%) | 17 (1.6%) |
| London Borough of Southwark | 1 (0.1%) | 0 (0%) | 1 (0.1%) |
| London Borough Of Southwark | 57 (8.2%) | 0 (0%) | 57 (5.4%) |
| London Borough of Waltham Forest | 60 (8.6%) | 2 (0.6%) | 62 (5.9%) |
| London Borough of Wandsworth | 3 (0.4%) | 1 (0.3%) | 4 (0.4%) |
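Note that the table lists "London Borough of Southwark" and "London Borough Of Southwark" as separate rows, differing only in the capitalisation of "Of". Assuming both labels refer to the same borough, one possible way to merge them before tabulating is sketched below. The vector `county` is a made-up stand-in for `df$county`, and its counts are illustrative only.

```r
# Toy vector reproducing the two spellings seen in the table
# (the number of entries is illustrative, not the real counts).
county <- c("London Borough of Southwark",
            "London Borough Of Southwark",
            "London Borough Of Southwark",
            "London Borough of Camden")

# Normalise the capitalised " Of " to " of " so both spellings share one label.
county_clean <- gsub(" Of ", " of ", county, fixed = TRUE)

# Tabulate the cleaned labels; the two Southwark spellings are now one category.
county_counts <- table(county_clean)
print(county_counts)
```

In the tutorial's data, the same substitution could be applied to `df$county` before calling `table1()`, so that Southwark appears as a single row.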
# head(df)