Question: the tech industry spans many industries (i.e. many SICs), but an LQ indexes one specific industry in one specific area (1:1). How should we measure a tech cluster's LQ in a given area? Perhaps with a simple average or a weighted method.
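One common option (a sketch, not taken from the source): pool the employment of all tech SICs into a single composite industry and compute one LQ from the pooled counts; this is equivalent to an employment-weighted combination of the per-SIC shares. The function name and every figure below are hypothetical:

```python
# Hypothetical sketch: a single tech-cluster LQ from pooled SIC employment.
# LQ = (area tech share of area employment) /
#      (national tech share of national employment), summed over tech SICs.

def tech_cluster_lq(area_emp, nat_emp, area_total, nat_total, sics):
    area_share = sum(area_emp[s] for s in sics) / area_total
    nat_share = sum(nat_emp[s] for s in sics) / nat_total
    return area_share / nat_share

# Made-up employment counts for two tech SICs in one area vs the nation.
area_emp = {"62011": 300, "62020": 200}
nat_emp = {"62011": 10_000, "62020": 15_000}
lq = tech_cluster_lq(area_emp, nat_emp,
                     area_total=5_000, nat_total=1_000_000,
                     sics=["62011", "62020"])
print(round(lq, 2))  # (500/5000) / (25000/1000000) = 0.1 / 0.025 = 4.0
```

A plain unweighted average of per-SIC LQs would overweight tiny SICs, which is why the pooled (weighted) form is usually preferable.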
The top 10 counts of tech firms per TTWA are given below (figures f1_ttwa, f2_ttwa, f1_laua and f2).
The top four clusters (Camden, Westminster, Hackney and Islington) belong to London; only Brighton and Hove, Birmingham, Manchester and Leeds lie outside London.
Use possible_tech_movers with asset, incorporation_date or other columns to find further characteristics of the movers.
# set the local working path
import sys
PATH = sys.path[0]
# read the tech firm records in fame_OC (15GB)
import pandas as pd
df = pd.read_csv(PATH + "/Dataset/fame_OC_tech_firm.csv",low_memory=True)
# preprocess the data
df = df.drop(columns=["Unnamed: 0", "Unnamed: 0.1", "Unnamed: 0.1.1", "is_tech"])
df.head()
# read the OC1-3
import pandas as pd
df_OC_1_1 = pd.read_csv(PATH + "/Dataset/OC_1_1.csv",low_memory=True)
# df_OC_1_2 = pd.read_csv(PATH + "/Dataset/OC_2_1.csv",low_memory=True)
# df_OC_1_3 = pd.read_csv(PATH + "/Dataset/OC_3_1.csv",low_memory=True)
# preprocess the OC_1_1 data
df_OC_1_1 = df_OC_1_1.drop(columns=["Unnamed: 0"])
df_OC_1_1.head()
# find the tech_firm_fame_OC missing value
total = df.isnull().sum().sort_values(ascending=False)
percent = (df.isnull().sum()/df.isnull().count()).sort_values(ascending=False)
missing_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
missing_data.style
Detect the missing values in OC_1_1 and clean them:
# find the OC1-3 missing value
total = df_OC_1_1.isnull().sum().sort_values(ascending=False)
percent = (df_OC_1_1.isnull().sum()/df_OC_1_1.isnull().count()).sort_values(ascending=False)
missing_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
missing_data
# detect the unique value for country columns
df_OC_1_1.country.unique()
# drop the columns which contain NA values
df_OC_1_1_dropna = df_OC_1_1.copy()
print(df_OC_1_1_dropna.shape)
for i in df_OC_1_1_dropna.columns:
    if missing_data.Percent[i] != 0:
        df_OC_1_1_dropna = df_OC_1_1_dropna.drop(columns=[i])
print(df_OC_1_1_dropna.shape)
# 6 columns remain
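The same column drop can be done in one call; a minimal sketch on a toy DataFrame (not the notebook's data): `dropna(axis=1)` removes every column containing at least one missing value.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [1.0, np.nan, 3.0],  # has a missing value, so it gets dropped
    "c": ["x", "y", "z"],
})
df_clean = df_demo.dropna(axis=1)  # drop columns with any NA
print(df_clean.columns.tolist())   # ['a', 'c']
```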
For tech_firm_fame_OC, the temporal variables are given below:
Columns | Missing Number | Missing Percentage |
---|---|---|
dissolution_date | 512706 | 0.998049 |
incorporation_date | 88803 | 0.172867 |
* 17% of incorporation_date values are NA
* Almost 99% of dissolution_date values are NA (most tech firms survive?)
And the spatial attributes can be found in the following table:
Columns | Description |
---|---|
Postcode | |
Pcd2 | postcode split by space |
Pcds | |
Cty | County |
Laua | Local Authority / Unitary Authority / London Borough |
Ctry | Country |
Rgn | Region |
Ttwa | Travel to Work Area |
Pct | |
* 4 countries in this dataset
* 12 regions in this dataset
TTWA means "Travel to Work Area" (ref: https://www.ncl.ac.uk/media/wwwnclacuk/curds/files/RR2015-05.pdf).
The top 10 TTWAs in the UK are presented below:
top_10_ttwa = df.ttwa.value_counts().head(10)
top_10_ttwa = pd.DataFrame(top_10_ttwa).reset_index()
top_10_ttwa.columns = ["ttwa_code","counts"]
df_ttwa = pd.read_csv(PATH + "/Dataset/ttwa.csv")
df_ttwa = pd.merge(top_10_ttwa, df_ttwa[["code","name"]], left_on="ttwa_code",right_on="code")
df_ttwa_10 = df_ttwa[["ttwa_code", "counts", "name"]]
df_ttwa_10.style
Compare this against the largest TTWAs by workplace population (UK, 2011 TTWAs). However, TTWA boundaries are hard to obtain for visualisation work, so I might consider the Laua attribute instead.
The value of this TTWA EDA:
* The top 10 tech clusters can be identified, e.g. London and Manchester (more details can be found in Travel to Work Areas (Mike Coombes and Office for National Statistics, 2015))
* The workplace job population is available, which helps in computing the LQs
The difficulties:
* The boundary dataset sits in the Open Geography Portal and is distributed as (Geo)JSON, which makes pulling the spatial information into Python harder
* The boundaries and workplace population figures are not up to date as of 2021 (I cannot find open TTWA data for 2020)
import geopandas as gpd
g_ttwa = gpd.read_file(PATH + "/Dataset/Spatial/Travel_to_Work_Areas__December_2011__Generalised_Clipped_Boundaries_in_United_Kingdom.geojson")
g_ttwa.plot()
Use the Laua and Ttwa attributes together with 1. the local authority boundaries and 2. the Travel to Work Area boundaries to visualise the spatial distribution. First drop the rows with NA values in the laua and ttwa columns:
print(df.shape)
df1 = df.copy()
df1 = df1.dropna(subset=["laua","ttwa"])
print(df1.shape)
# import the laua boundaries
df_laua = pd.read_csv(PATH + "/Dataset/Spatial/Local_Authority_2017.csv")
df_laua.head()
# import the ttwa boundaries
import geopandas as gpd
df_ttwa = gpd.read_file(PATH + "/Dataset/Spatial/Travel_to_Work_Areas__December_2011__Generalised_Clipped_Boundaries_in_United_Kingdom.geojson")
df_ttwa.head()
The top 10 Local Authorities can be found in the following list:
df_laua_10 = pd.DataFrame(df1.laua.value_counts()[:12]).reset_index()  # take 12 in case some codes fail to match in the merge below
df_laua_10.columns = ["laua_code","count"]
df_laua_10 = pd.merge(df_laua_10, df_laua[["lad17cd","lad17nm"]], left_on="laua_code",right_on="lad17cd")
df_laua_10 = df_laua_10[["laua_code","count","lad17nm"]]
df_laua_10.columns = ["laua_code","count","name"]
df_laua_10.style
The top four clusters (Camden, Westminster, Hackney and Islington) belong to London; only Brighton and Hove, Birmingham, Manchester and Leeds lie outside London.
The top 10 Travel to Work Areas can be found in the following list:
top_10_ttwa = df.ttwa.value_counts().head(10)
top_10_ttwa = pd.DataFrame(top_10_ttwa).reset_index()
top_10_ttwa.columns = ["ttwa_code","counts"]
df_ttwa = pd.read_csv(PATH + "/Dataset/ttwa.csv")
df_ttwa = pd.merge(top_10_ttwa, df_ttwa[["code","name"]], left_on="ttwa_code",right_on="code")
df_ttwa_10 = df_ttwa[["ttwa_code", "counts", "name"]]
df_ttwa_10.style
This file is Local_Authority_2019_Boundaries/Local_Authority_Districts_(December_2019)_Boundaries_UK_BFC.shp
# read the shape file and plot map
import geopandas as gpd
g_laua = gpd.read_file(PATH + "/Dataset/Spatial/Local_Authority_2019_Boundaries/Local_Authority_Districts_(December_2019)_Boundaries_UK_BFC.shp")
g_ttwa = gpd.read_file(PATH + "/Dataset/Spatial/Travel_to_Work_Areas__December_2011__Generalised_Clipped_Boundaries_in_United_Kingdom.geojson")
# print(g_laua)
g_laua.plot()
g_ttwa.plot()
Get the laua codes with their frequencies:
df_laua_freq = pd.DataFrame(df1.laua.value_counts()).reset_index()
df_laua_freq.columns = ["laua_code","count"]
df_laua_freq.head()
Get the ttwa codes with their frequencies:
df_ttwa_freq = pd.DataFrame(df1.ttwa.value_counts()).reset_index()
df_ttwa_freq.columns = ["ttwa_code","count"]
df_ttwa_freq.head()
Merge df_laua_freq with g_laua, and df_ttwa_freq with g_ttwa:
df_laua_mg = pd.merge(g_laua, df_laua_freq,left_on='lad19cd',right_on='laua_code',how='inner').reset_index()
df_ttwa_mg = pd.merge(g_ttwa, df_ttwa_freq, left_on='TTWA11CD', right_on='ttwa_code',how='inner').reset_index()
df_laua_mg.plot()
df_ttwa_mg.plot()
cmap settings ref: https://matplotlib.org/stable/tutorials/colors/colormaps.html
import pysal as ps
import numpy as np
import pandas as pd
import seaborn as sns
import geopandas as gpd
import matplotlib.pyplot as plt
tfont = {'fontname':'DejaVu Sans', 'horizontalalignment':'left'}
f1_laua = plt.figure()
f1_laua.set_size_inches(20,20)
ax1 = f1_laua.add_subplot()
df_laua_mg.plot(column='count', legend=True, cmap='YlGn', ax=ax1)
# df_laua_mg.plot(column='count', legend=True, cmap='plasma',figsize=(180,60),ax=ax1)
f1_laua.subplots_adjust(top=0.95)
f1_laua.suptitle("Spatial distribution of the number of tech firm clusters in UK Local Authorities", x=0.25, size=24, **tfont)
f1_laua.savefig(PATH + '/Img/Frequency_Distribution_of_tech_firms_cluster_by_LAUA.png', dpi=300)
df_ttwa_mg
# map the ttwa and frequency of tech clusters
tfont = {'fontname':'DejaVu Sans', 'horizontalalignment':'left'}
f1_ttwa = plt.figure()
f1_ttwa.set_size_inches(20,20)
ax1 = f1_ttwa.add_subplot()
df_ttwa_mg.plot(column='count', legend=True, cmap='YlGn', ax=ax1)
# df_laua_mg.plot(column='count', legend=True, cmap='plasma',figsize=(180,60),ax=ax1)
f1_ttwa.subplots_adjust(top=0.95)
f1_ttwa.suptitle("Spatial distribution of the number of tech firm clusters in UK Travel to Work Areas", x=0.25, size=24, **tfont)
f1_ttwa.savefig(PATH+'/Img/Frequency_Distribution_of_tech_firms_cluster_by_TTWA.png', dpi=300)
To create higher contrast, set the count of every TTWA outside the top 10 to a large negative value. We need df_ttwa_10 and df_ttwa_mg to build this better visualisation.
df_ttwa_10.head()
df_ttwa_mg.head()
df_ttwa_mg_for_viz = df_ttwa_mg.copy()
df_ttwa_mg_for_viz.loc[~df_ttwa_mg_for_viz["TTWA11CD"].isin(df_ttwa_10.ttwa_code.to_list()),"count"]=-100000
df_ttwa_mg_for_viz
Note: assignment through .loc modifies the DataFrame in place; you do not need to write df = df.loc[...], just df.loc[...] = ...
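A tiny self-contained check of that behaviour (toy DataFrame, invented values):

```python
import pandas as pd

toy = pd.DataFrame({"code": ["A", "B", "C"], "count": [10, 20, 30]})
keep = ["A", "C"]
# Assignment through .loc mutates `toy` directly; no reassignment is needed.
toy.loc[~toy["code"].isin(keep), "count"] = -100_000
print(toy["count"].tolist())  # [10, -100000, 30]
```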
# map the ttwa and frequency of tech clusters again!!
tfont = {'fontname':'DejaVu Sans', 'horizontalalignment':'left'}
f1_ttwa = plt.figure()
f1_ttwa.set_size_inches(20,20)
ax1 = f1_ttwa.add_subplot()
cmap_set = "tab20b"
df_ttwa_mg_for_viz.plot(column='count', legend=True, cmap=cmap_set, ax=ax1)
# coolwarm
# winter
# tab20b
# tab20c
# df_laua_mg.plot(column='count', legend=True, cmap='plasma',figsize=(180,60),ax=ax1)
f1_ttwa.subplots_adjust(top=0.95)
f1_ttwa.suptitle("Spatial distribution of the number of tech firm clusters in UK Travel to Work Areas", x=0.25, size=24, **tfont)
f1_ttwa.savefig(PATH+'/Img/Frequency_Distribution_of_tech_firms_cluster_by_TTWA' + '_' + cmap_set + '.png', dpi=300)
df_laua_10.style
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="whitegrid")
# Initialize the matplotlib figure
f2_laua, ax2 = plt.subplots(figsize=(15,6))
sns.set_color_codes("pastel")
sns.barplot(x="count", y="name", data=df_laua_10, color="b")
# Add a legend and informative axis label
# ax.legend(ncol=2, loc="lower right", frameon=True)
ax2.set(xlim=(0, 16050), ylabel="", xlabel="the number of technology companies")
ax2.set_title("Top 10 local authorities with the largest number of technology companies in the UK",fontsize = 24)
# sns.despine(left=True, bottom=True)
f2_laua.savefig(PATH + "/Img/The_Top_10_Number_of_Tech_Firms_laua.png")
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="whitegrid")
# Initialize the matplotlib figure
f2_ttwa, ax2 = plt.subplots(figsize=(15,6))
sns.set_color_codes("pastel")
sns.barplot(x="counts", y="name", data=df_ttwa_10, color="b")
# Add a legend and informative axis label
# ax.legend(ncol=2, loc="lower right", frameon=True)
ax2.set(xlim=(0, 150000), ylabel="", xlabel="the number of technology companies")
ax2.set_title("Top 10 Travel to Work Area with the largest number of technology companies in the UK",fontsize = 24)
# sns.despine(left=True, bottom=True)
f2_ttwa.savefig(PATH + "/Img/The_Top_10_Number_of_Tech_Firms_ttwa.png")
For fame_OC, count how often each normalised firm name re-appears:
df2 = df.copy()
firm_name_freq = pd.DataFrame(df2.normalised_name.value_counts()).reset_index()
firm_name_freq.columns = ["firm_name","count"]
# firm's name re-appears
firm_name_re = firm_name_freq[firm_name_freq["count"]>1]
firm_name_re.style
In this case, I can detect 41 companies that might have moved. Then go back to fame_OC and filter these possible movers:
df_fame_movers = df.copy()
df_fame_movers = df_fame_movers[df_fame_movers.normalised_name.isin(firm_name_re.firm_name.to_list())]
# df_fame_movers.to_csv(PATH + "/df_fame_OC_movers.csv")
# detect which one has different postcode in the same normalised name
df_movers = pd.DataFrame(df_fame_movers.groupby("normalised_name")["postcode"].nunique()).reset_index()
# two or more distinct postcodes suggest these tech firms may have moved
df_movers = df_movers[df_movers["postcode"]>=2]
# export a list of tech firms that share the same name but have different postcodes
df_movers_1 = df_fame_movers[df_fame_movers["normalised_name"].isin(df_movers["normalised_name"].to_list())]
possible_tech_movers = df_movers_1[["registered_number","incorporation_date","dissolution_date","normalised_name","postcode"]].sort_values(by="normalised_name")
possible_tech_movers
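A toy check of the mover-detection logic above (firm names and postcodes invented): a name qualifies as a possible mover when it appears with two or more distinct postcodes.

```python
import pandas as pd

records = pd.DataFrame({
    "normalised_name": ["acme ltd", "acme ltd", "beta ltd", "beta ltd"],
    "postcode": ["N1 9GU", "EC1V 2NX", "SW1A 1AA", "SW1A 1AA"],
})
# Count distinct postcodes per name; >= 2 suggests a relocation.
n_postcodes = records.groupby("normalised_name")["postcode"].nunique()
movers = n_postcodes[n_postcodes >= 2].index.tolist()
print(movers)  # ['acme ltd']  (beta ltd kept one postcode)
```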