Dynamics Research for Tech Cluster in UK

Question

  1. For entry pattern, how to pick incorperation year? (Like 1950-2018)

    Looking up for literature to find supporting evidence? (Like Apple were found in 1970s)?

  2. How to do map making for entry/exit/churn pattern?

    churn pattern may be similar to the entry pattern because dissolution date is a small number in dataset(fame_OC)

  3. How to define the number of all companies in entry rate measurement?

    Does it represent the number of tech companies in a specific cluster from 19xx to 2018?

  4. Which regression method I should pick up (multiple regression?)

    D/ Regression analysis - I would suggest trying two things to start. Again, for cluster i, year t:

    $Dynamics_variable_it = F(#firms_it-n, industry mix_it-n, X_it-n, location_i, year_t)$ (1)

    $Firm_performance_it = F(#firms_it-n, dynamics_it-n industry mix_it-n, X_it-n, location_i, year_t)$ (2)

    Attributes Question:

    • firms_it-n : the number of tech firms in year $t$ and cluster $i$
    • industry mix_it-n : ?
    • X_it-n: ?
    • location_i: tech cluster $i$
    • year_t: tech firm incorperation year $t$

1 Key Results

1.1 Tech Cluster Identifying

According to the previous EDA: https://zeqiang.fun/CASA0012-Dissertation/jupyterlab/EDA.html

The top 10 tech cluster can be divided by TTWA (Travel to Work Area). You can see the bar chart below

The key finding of bar chart:

  1. The top 1 tech cluster has the most number of tech firms, which is as multiple times as that of other clusters.
  2. The statistics of Manchester and Slough&Heathrow are greated than 20,000

The_Top_10_Number_of_Tech_Firms_ttwa.png

The map of top 10 tech clusters based on ttwa boundaries can be found below. The blue area are non-top-10-tech-clusters.

The key finding of map:

  1. Most tech clusters are located in the south of UK (England & Wales)
  2. The number of tech firms sorrounding London tech cluster is the most (the pink and dark brown area)

Frequency_Distribution_of_tech_firms_cluster_by_TTWA_tab20b.png

1.2 Entry Pattern

The dataset of entry rate for 10 tech clusters from 1900 to 2018 can be found below

In [66]:
entry_rate.style
Out[66]:
tech_cluster all_companies 1900 1901 1902 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 ttwa_code name
0 E30000169 9592 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000104 0.000104 0.000209 0.000104 0.000104 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000104 0.000000 0.000000 0.000104 0.000000 0.000000 0.000104 0.000000 0.000209 0.000000 0.000104 0.000000 0.000000 0.000000 0.000209 0.000000 0.000000 0.000209 0.000209 0.000000 0.000104 0.000000 0.000000 0.000104 0.000000 0.000000 0.000000 0.000104 0.000313 0.000000 0.000104 0.000104 0.000104 0.000000 0.000209 0.000104 0.000209 0.000209 0.000000 0.000313 0.000104 0.000104 0.000521 0.000209 0.000209 0.000313 0.000417 0.000209 0.000521 0.000521 0.000000 0.000104 0.000521 0.000209 0.000417 0.000834 0.000521 0.000938 0.001251 0.000626 0.000834 0.000834 0.001460 0.000834 0.001251 0.002919 0.002815 0.002606 0.002085 0.001668 0.002606 0.002085 0.002711 0.003440 0.004691 0.004483 0.007089 0.005942 0.008132 0.008549 0.006985 0.013032 0.015638 0.015117 0.015742 0.020746 0.028148 0.027314 0.027314 0.039304 0.040450 0.057339 0.074750 0.090179 0.105088 0.094245 0.098415 0.150751 E30000169 Birmingham
1 E30000175 7546 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000133 0.000000 0.000000 0.000133 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000133 0.000133 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000133 0.000000 0.000000 0.000000 0.000000 0.000000 0.000133 0.000000 0.000000 0.000000 0.000000 0.000000 0.000133 0.000000 0.000000 0.000000 0.000265 0.000000 0.000133 0.000133 0.000000 0.000265 0.000000 0.000000 0.000398 0.000265 0.000398 0.000000 0.000133 0.000000 0.000133 0.000133 0.000265 0.000265 0.000398 0.000265 0.000265 0.000530 0.000530 0.000795 0.000795 0.001060 0.000928 0.001723 0.000530 0.001723 0.000928 0.002120 0.001458 0.002385 0.003446 0.003048 0.002915 0.004506 0.003843 0.005168 0.006361 0.005168 0.006626 0.007156 0.022131 0.010999 0.017493 0.032202 0.041479 0.049960 0.049033 0.076067 0.122184 0.137291 0.161675 0.211105 E30000175 Bournemouth
2 E30000179 7482 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000134 0.000000 0.000000 0.000000 0.000134 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000267 0.000000 0.000000 0.000000 0.000134 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000134 0.000000 0.000000 0.000000 0.000134 0.000000 0.000000 0.000134 0.000000 0.000000 0.000134 0.000000 0.000000 0.000000 0.000000 0.000134 0.000000 0.000000 0.000134 0.000267 0.000267 0.000267 0.000000 0.000134 0.000802 0.000802 0.000401 0.000668 0.000134 0.001203 0.000401 0.000936 0.001604 0.001069 0.002138 0.001604 0.003074 0.003475 0.005213 0.004945 0.006148 0.004812 0.009489 0.012430 0.008688 0.011895 0.016306 0.020315 0.024058 0.022053 0.041834 0.056001 0.063085 0.093692 0.107725 0.130179 0.120556 0.102646 0.117215 E30000179 Brighton
3 E30000180 6176 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000162 0.000000 0.000000 0.000000 0.000162 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000162 0.000000 0.000000 0.000000 0.000000 0.000000 0.000162 0.000162 0.000000 0.000000 0.000162 0.000486 0.000000 0.000162 0.000000 0.000486 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000162 0.000162 0.000000 0.000324 0.000324 0.000162 0.000162 0.000000 0.000162 0.000000 0.000324 0.000324 0.000162 0.000162 0.000486 0.000648 0.000648 0.000162 0.001133 0.000648 0.000162 0.000648 0.001133 0.001133 0.001133 0.001457 0.001781 0.002915 0.002915 0.002267 0.003238 0.002105 0.002753 0.003724 0.005829 0.006639 0.009877 0.010687 0.010363 0.011334 0.010525 0.015220 0.019268 0.016839 0.017487 0.022992 0.039184 0.028983 0.027850 0.038860 0.051975 0.065576 0.069462 0.088407 0.093588 0.092455 0.089216 0.121762 E30000180 Bristol
4 E30000186 6251 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000160 0.000000 0.000160 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000160 0.000000 0.000000 0.000000 0.000160 0.000000 0.000160 0.000000 0.000000 0.000160 0.000160 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000160 0.000000 0.000000 0.000000 0.000160 0.000320 0.000160 0.000160 0.000000 0.000000 0.000160 0.000000 0.000000 0.000160 0.000160 0.000000 0.000160 0.000320 0.000320 0.000320 0.000000 0.000320 0.000160 0.000320 0.000480 0.000160 0.000320 0.000480 0.000640 0.000960 0.000480 0.000800 0.000640 0.000640 0.000800 0.000800 0.001120 0.000640 0.000640 0.001440 0.001280 0.001760 0.001920 0.002720 0.002560 0.003519 0.002240 0.004159 0.003679 0.003199 0.004159 0.002400 0.004959 0.005919 0.008479 0.007839 0.011198 0.010078 0.014078 0.011358 0.020317 0.023356 0.018397 0.019037 0.020957 0.029755 0.025596 0.026716 0.033275 0.041433 0.049912 0.059510 0.065270 0.080947 0.087186 0.099824 0.175492 E30000186 Cambridge
5 E30000234 118998 0.000017 0.000008 0.000000 0.000008 0.000008 0.000008 0.000034 0.000000 0.000017 0.000025 0.000000 0.000000 0.000017 0.000025 0.000025 0.000025 0.000025 0.000008 0.000008 0.000017 0.000017 0.000025 0.000017 0.000042 0.000025 0.000008 0.000008 0.000059 0.000034 0.000076 0.000034 0.000008 0.000034 0.000025 0.000042 0.000059 0.000017 0.000034 0.000017 0.000017 0.000008 0.000017 0.000017 0.000042 0.000025 0.000109 0.000101 0.000050 0.000050 0.000059 0.000025 0.000034 0.000084 0.000092 0.000050 0.000059 0.000059 0.000092 0.000134 0.000168 0.000118 0.000109 0.000143 0.000134 0.000143 0.000193 0.000235 0.000151 0.000235 0.000269 0.000336 0.000336 0.000353 0.000345 0.000345 0.000420 0.000563 0.000563 0.000714 0.000647 0.000798 0.000740 0.001084 0.001185 0.001571 0.001269 0.001714 0.001824 0.001597 0.001765 0.001866 0.002462 0.002580 0.003000 0.004143 0.005051 0.005790 0.006479 0.007437 0.008899 0.007866 0.010185 0.013689 0.012311 0.013799 0.017261 0.022311 0.021614 0.025017 0.035404 0.044169 0.055631 0.067833 0.087287 0.098346 0.100153 0.109523 0.187843 E30000234 London
6 E30000237 8869 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000113 0.000000 0.000113 0.000000 0.000113 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000226 0.000000 0.000000 0.000113 0.000000 0.000000 0.000000 0.000000 0.000113 0.000000 0.000000 0.000000 0.000113 0.000113 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000113 0.000000 0.000113 0.000113 0.000000 0.000113 0.000000 0.000000 0.000000 0.000113 0.000000 0.000113 0.000000 0.000000 0.000338 0.000226 0.000113 0.000113 0.000000 0.000113 0.000451 0.000113 0.000226 0.000451 0.000226 0.000677 0.000226 0.000564 0.000789 0.000564 0.000789 0.000789 0.000451 0.000451 0.001015 0.000789 0.000789 0.000902 0.001466 0.001466 0.001240 0.002142 0.001579 0.001917 0.002819 0.002593 0.002706 0.003946 0.004510 0.004510 0.008344 0.008118 0.008569 0.005976 0.006314 0.011050 0.012516 0.012967 0.013643 0.017138 0.029203 0.027061 0.026159 0.040929 0.052092 0.058293 0.067651 0.087383 0.099899 0.087834 0.112189 0.162927 E30000237 Luton
7 E30000239 18270 0.000000 0.000000 0.000055 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000109 0.000109 0.000000 0.000000 0.000000 0.000000 0.000000 0.000055 0.000000 0.000000 0.000055 0.000000 0.000000 0.000000 0.000000 0.000055 0.000000 0.000109 0.000055 0.000164 0.000055 0.000000 0.000055 0.000000 0.000109 0.000000 0.000055 0.000109 0.000055 0.000164 0.000000 0.000109 0.000109 0.000000 0.000000 0.000000 0.000219 0.000055 0.000109 0.000109 0.000109 0.000055 0.000164 0.000109 0.000109 0.000109 0.000109 0.000055 0.000109 0.000328 0.000219 0.000328 0.000274 0.000109 0.000219 0.000109 0.000055 0.000164 0.000164 0.000219 0.000219 0.000328 0.000547 0.000657 0.000328 0.000383 0.000274 0.000602 0.000547 0.001040 0.000493 0.000930 0.001095 0.001204 0.001040 0.001752 0.001970 0.001916 0.002354 0.002901 0.002463 0.003010 0.002737 0.002573 0.004324 0.004926 0.006568 0.007608 0.009086 0.007499 0.008320 0.007663 0.013519 0.017515 0.014559 0.014614 0.020361 0.029338 0.021456 0.029338 0.038259 0.044663 0.055993 0.065189 0.086864 0.099453 0.097209 0.100219 0.158894 E30000239 Manchester
8 E30000266 17364 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000058 0.000058 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000058 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000058 0.000058 0.000058 0.000058 0.000000 0.000058 0.000000 0.000000 0.000000 0.000115 0.000058 0.000115 0.000000 0.000000 0.000000 0.000000 0.000058 0.000058 0.000000 0.000058 0.000058 0.000058 0.000115 0.000230 0.000115 0.000115 0.000058 0.000058 0.000173 0.000058 0.000058 0.000000 0.000173 0.000115 0.000058 0.000173 0.000173 0.000288 0.000230 0.000173 0.000000 0.000230 0.000173 0.000115 0.000346 0.000058 0.000230 0.000115 0.000173 0.000173 0.000173 0.000346 0.000403 0.000461 0.000576 0.000633 0.000864 0.000633 0.000633 0.000979 0.001094 0.001555 0.001325 0.001843 0.002419 0.003225 0.003110 0.002419 0.002764 0.002073 0.003686 0.003801 0.004031 0.006047 0.007141 0.007947 0.007487 0.010539 0.007199 0.012900 0.015665 0.013188 0.018717 0.019638 0.027586 0.025225 0.030811 0.039795 0.049988 0.062946 0.071124 0.093296 0.101014 0.093642 0.095255 0.138793 E30000266 Slough and Heathrow
9 E30000284 4950 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000202 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000202 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000404 0.000000 0.000202 0.000404 0.000000 0.000000 0.000000 0.000000 0.000202 0.000202 0.000202 0.000000 0.000000 0.000606 0.000404 0.000000 0.000404 0.000606 0.000000 0.000000 0.000202 0.000606 0.001212 0.000808 0.000404 0.000808 0.002424 0.000404 0.001010 0.000606 0.001616 0.003030 0.001212 0.001818 0.003232 0.002020 0.002222 0.000808 0.002424 0.002828 0.002424 0.004242 0.007273 0.006869 0.006667 0.007879 0.006061 0.011111 0.015960 0.010101 0.009091 0.015960 0.031919 0.015758 0.019798 0.027071 0.032929 0.035556 0.046061 0.053333 0.064646 0.099798 0.137980 0.297778 E30000284 Warrington and Wigan

The simple linear regression for London tech cluster is operated

In [69]:
regressor_OLS.summary()
Out[69]:
OLS Regression Results
Dep. Variable: y R-squared: 0.253
Model: OLS Adj. R-squared: 0.246
Method: Least Squares F-statistic: 39.27
Date: Tue, 13 Jul 2021 Prob (F-statistic): 6.49e-09
Time: 21:04:24 Log-Likelihood: 280.32
No. Observations: 118 AIC: -556.6
Df Residuals: 116 BIC: -551.1
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const -0.7435 0.120 -6.195 0.000 -0.981 -0.506
x1 0.0004 6.12e-05 6.267 0.000 0.000 0.001
Omnibus: 129.234 Durbin-Watson: 0.120
Prob(Omnibus): 0.000 Jarque-Bera (JB): 2345.954
Skew: 3.843 Prob(JB): 0.00
Kurtosis: 23.447 Cond. No. 1.13e+05


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.13e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

As shown above, the regression performance is not good. The exit rate and churn rate will be considered as variables in next multiple regression.

The research plan in the next week:

  1. Multiple regression for entry/exit/churn pattern
  2. EDA for LQs
  3. Lit Review

2 Reproduciable Analysis

2.1 Read Data & Preprocessing

In [9]:
import pandas as pd
import sys
PATH = sys.path[0] # set the local file path

df = pd.read_csv(PATH + "/Dataset/fame_OC_tech_firm.csv")
# preprocess the data
df = df.drop({"Unnamed: 0","Unnamed: 0.1","Unnamed: 0.1.1","is_tech"},axis = 1)
df.head()
/Users/fangzeqiang/Library/Python/3.7/lib/python/site-packages/IPython/core/interactiveshell.py:3058: DtypeWarning: Columns (3,28,41,66) have mixed types.Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
Out[9]:
registered_number incorporation_date dissolution_date latest_accounts_cash latest_accounts_assets latest_accounts_liabilities year_obs sic_year sic4 change_sic ... pcd2 pcds cty laua ctry rgn ttwa pct lep1 lep2
0 06199872 2007-04-02 NaN NaN 15785.0 NaN 2017 2007.0 8622.0 0.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 11896710 NaN NaN NaN NaN NaN 2018 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 SC041753 1965-02-19 NaN NaN NaN NaN 2018 2007.0 2013.0 0.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 12125173 NaN NaN NaN NaN NaN 2018 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 11968595 NaN NaN NaN NaN NaN 2018 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 64 columns

According to previous EDA research, there are 88803 missing values in birth_year and incorperation_date columns, accounting for about 17% in the dataset.

In [3]:
# drop the null value record in birth year and ttwa(travel to work area) columns
df_1 = df.dropna(subset = {"birth_year","ttwa"})
In [7]:
print("There are "+ str(df_1.shape[0]/df.shape[0] * 100) + "% value remain \n")
There are 82.46591448838639% value remain 

2.2 EDA for Birth Year

In [10]:
# count the value
df_1.birth_year.value_counts()
Out[10]:
2018.0    66154
2017.0    42583
2015.0    40174
2016.0    39836
2014.0    35894
          ...  
1879.0        1
1881.0        1
1885.0        1
1887.0        1
1866.0        1
Name: birth_year, Length: 140, dtype: int64
In [78]:
# Tranform data type
df_birth_year = pd.DataFrame(df_1.birth_year.value_counts()).reset_index()
df_birth_year.rename({"index":"birth_year","birth_year":"count"},axis=1,inplace=True)
df_birth_year[df_birth_year["count"]>10]
Out[78]:
birth_year count
0 2018.0 66154
1 2017.0 42583
2 2015.0 40174
3 2016.0 39836
4 2014.0 35894
... ... ...
89 1925.0 13
90 1942.0 13
91 1924.0 13
92 1923.0 12
93 1909.0 11

94 rows × 2 columns

2.3 Data Preparation: 10 Tech Clusters Identifying

According to the previous EDA: https://zeqiang.fun/CASA0012-Dissertation/jupyterlab/EDA.html

The top 10 tech cluster can be divided by TTWA (Travel to Work Area). You can see the bar chart below

The key finding of bar chart:

  1. The top 1 tech cluster has the most number of tech firms, which is as multiple times as that of other clusters.
  2. The statistics of Manchester and Slough&Heathrow are greated than 20,000

The_Top_10_Number_of_Tech_Firms_ttwa.png

The map of top 10 tech clusters based on ttwa boundaries can be found below. The blue area are non-top-10-tech-clusters.

The key finding of map:

  1. Most tech clusters are located in the south of UK (England & Wales)
  2. The number of tech firms sorrounding London tech cluster is the most (the pink and dark brown area)

Frequency_Distribution_of_tech_firms_cluster_by_TTWA_tab20b.png

2.4 Methodology Design: Measuring Dynamics

  • Entry rate for tech cluster i, year t = firms incorporated_t / all firms_it

    $Entry\ Rate_{i,t} = \frac{incorporating\ firms}{all\ companies}$

    Where $i$ means the tech cluster, $t$ means the tech incorporate year 2009, 2014,..., 2020

  • Exit rate = firms dissolved_t / all firms_it

    $Exit\ Rate_{i,t} = \frac{disolution\ firms}{all\ companies}$

  • Churn rate = (firms incorporated - firms dissolved)_t / all firms_it

    $Churn\ Rate_{i,t} = \frac{incorporation\ firms\ -\ disolution\ firms}{all\ companies}$

    *this index may be quite similar to the first one because there is a few number

You could also try fitting these in first differences, e.g. change in entry / exit / churn between t and t-1.

2.5 Entry Rate Research

The entry rate can reflect the level of tech cluster attraction, the measurement method is given below:

$Entry\ Rate_{i,t} = \frac{incorporating\ firms}{all\ companies}$

Where $i$ means the tech cluster, $t$ means the tech incorporate year 2009, 2014,..., 2020

Note: only a few companies(1%, according to previous EDA research) have dissolution dates, the number of all companies includes dissolution ones.

  1. The 10 tech clusters will be picked to research the entry rate.
  2. If the results are not ideal, more clusters will be added to observe.

2.5.1 Labeling Tech Clusters

Refer to previous EDA:https://zeqiang.fun/CASA0012-Dissertation/jupyterlab/EDA.html

The top 10 cluster based on Travel to Work Area(TTWA) will be concerned as shown below

The_Top_10_Number_of_Tech_Firms_ttwa.png

FAQ:

1. Why I choose TTWA?

    This geographical division can better reflect the relationship between population, company and work.

2. What is TTWA?

    The 228 areas forming the 2011 TTWAs, covering the whole of the UK, were defined in 2015 using 2011 Census commuting flow data, indicating home and workplace address. The TTWAs are based on aggregations of Lower Layer Super Output Areas (LSOA) in England and Wales, Data Zones (DZ) in Scotland, and Super Output Areas (SOA) in Northern Ireland and in some cases span country borders. There are six cross-border TTWAs, 149 in England, 18 in Wales, 45 in Scotland and 10 in Northern Ireland.

If ttwa is attributed to 10_tech_clusters, then new column cluster_label will be set to the corresponding one.

In [16]:
# generate the top 10 tech clusters table based on the ttwa
top_10_ttwa = df.ttwa.value_counts().head(10)
top_10_ttwa = pd.DataFrame(top_10_ttwa).reset_index()
top_10_ttwa.columns = ["ttwa_code","counts"]

df_ttwa = pd.read_csv(PATH + "/Dataset/ttwa.csv")
df_ttwa = pd.merge(top_10_ttwa, df_ttwa[["code","name"]], left_on="ttwa_code",right_on="code")
df_ttwa_10 = df_ttwa[["ttwa_code", "name","counts"]]
df_ttwa_10.style
Out[16]:
ttwa_code name counts
0 E30000234 London 146959
1 E30000239 Manchester 21683
2 E30000266 Slough and Heathrow 20224
3 E30000169 Birmingham 11262
4 E30000237 Luton 10408
5 E30000175 Bournemouth 9232
6 E30000179 Brighton 8368
7 E30000186 Cambridge 7296
8 E30000180 Bristol 7090
9 E30000284 Warrington and Wigan 6968

2.5.2 Data Filting

  1. Data Filting: remain 10 tech clusters data

    Then the ttwa can be considered as the cluster label

In [17]:
# df_10_tc means df_10_tech_clusters
df_10_tc = df.copy()
print(df_10_tc.shape)
df_10_tc = df_10_tc[df_10_tc.ttwa.isin(df_ttwa_10.ttwa_code.to_list())]
print(df_10_tc.shape)
(513708, 64)
(249490, 64)
In [18]:
df_10_tc.birth_year.value_counts()
Out[18]:
2018.0    36350
2017.0    22303
2016.0    20551
2015.0    20468
2014.0    17832
          ...  
1905.0        1
1906.0        1
1918.0        1
1908.0        1
1877.0        1
Name: birth_year, Length: 129, dtype: int64
  1. Data Filting: pick birth year from 2013 to 2018

    refer to tech cluster

London, UK, from 2009 to 2018. The London technology cluster has occupied the world's leading level in the world venture capital market and start-up unicorn companies. The technology cluster effect has played an important role in this period of time (Kerr & et al., 2020).

Kerr, William R., and Frederic Robert-Nicoud. 2020. "Tech Clusters." Journal of Economic Perspectives, 34 (3): 50-76. https://www.aeaweb.org/articles?id=10.1257/jep.34.3.50

In [82]:
print(df_10_tc.shape)
# df_10_tc = df_10_tc[df_10_tc.birth_year>2008]
print(df_10_tc.shape)
(249490, 64)
(164986, 64)

2.5.3 Pivot Operation (Excel)

Export csv to excel to do pivot operation:

  • rows: ttwa (ttwa there is considered as the tech cluster label)
  • columns: birth_year
  • value: count of register_number

Then, calculate the entry number in excel

In [87]:
df_10_tc_clipped = df_10_tc[["registered_number","ttwa","birth_year"]]
df_10_tc_clipped.to_csv(PATH + "/fame_OC_10_tc_clipped.csv")

Import the processed dataset from .xlsx file

In [11]:
# read the entry rate dataset from excel file 
sheetName = "entry_rate"
df_entry_excel = pd.read_excel(PATH + "/Excel/fame_OC_10_tc_clipped.xlsx",sheet_name = sheetName)
df_entry_excel.head()
Out[11]:
ttwa Grand Total 1900 1901 1902 1904 1905 1906 1907 1908 ... 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
0 E30000169 9592 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.027314 0.039304 0.040450 0.057339 0.074750 0.090179 0.105088 0.094245 0.098415 0.150751
1 E30000175 7546 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.017493 0.032202 0.041479 0.049960 0.049033 0.076067 0.122184 0.137291 0.161675 0.211105
2 E30000179 7482 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.022053 0.041834 0.056001 0.063085 0.093692 0.107725 0.130179 0.120556 0.102646 0.117215
3 E30000180 6176 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.027850 0.038860 0.051975 0.065576 0.069462 0.088407 0.093588 0.092455 0.089216 0.121762
4 E30000186 6251 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000160 0.000000 ... 0.026716 0.033275 0.041433 0.049912 0.059510 0.065270 0.080947 0.087186 0.099824 0.175492
5 E30000234 118998 0.000017 0.000008 0.000000 0.000008 0.000008 0.000008 0.000034 0.000000 ... 0.025017 0.035404 0.044169 0.055631 0.067833 0.087287 0.098346 0.100153 0.109523 0.187843
6 E30000237 8869 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.026159 0.040929 0.052092 0.058293 0.067651 0.087383 0.099899 0.087834 0.112189 0.162927
7 E30000239 18270 0.000000 0.000000 0.000055 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.029338 0.038259 0.044663 0.055993 0.065189 0.086864 0.099453 0.097209 0.100219 0.158894
8 E30000266 17364 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000058 ... 0.030811 0.039795 0.049988 0.062946 0.071124 0.093296 0.101014 0.093642 0.095255 0.138793
9 E30000284 4950 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.019798 0.027071 0.032929 0.035556 0.046061 0.053333 0.064646 0.099798 0.137980 0.297778
10 Grand Total 205498 0.000010 0.000005 0.000005 0.000005 0.000005 0.000005 0.000024 0.000005 ... 0.025674 0.036404 0.045086 0.056181 0.067714 0.086775 0.099602 0.100006 0.108531 0.176887

11 rows × 120 columns

In [12]:
# Preprocess the entry rate dataset
df_entry_rate = df_entry_excel.rename({"ttwa":"tech_cluster","Grand Total":"all_companies"},axis=1)
df_entry_rate.head()
Out[12]:
tech_cluster all_companies 1900 1901 1902 1904 1905 1906 1907 1908 ... 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
0 E30000169 9592 0.0 0.0 0.0 0.0 0.0 0.0 0.00000 0.0 ... 0.027314 0.039304 0.040450 0.057339 0.074750 0.090179 0.105088 0.094245 0.098415 0.150751
1 E30000175 7546 0.0 0.0 0.0 0.0 0.0 0.0 0.00000 0.0 ... 0.017493 0.032202 0.041479 0.049960 0.049033 0.076067 0.122184 0.137291 0.161675 0.211105
2 E30000179 7482 0.0 0.0 0.0 0.0 0.0 0.0 0.00000 0.0 ... 0.022053 0.041834 0.056001 0.063085 0.093692 0.107725 0.130179 0.120556 0.102646 0.117215
3 E30000180 6176 0.0 0.0 0.0 0.0 0.0 0.0 0.00000 0.0 ... 0.027850 0.038860 0.051975 0.065576 0.069462 0.088407 0.093588 0.092455 0.089216 0.121762
4 E30000186 6251 0.0 0.0 0.0 0.0 0.0 0.0 0.00016 0.0 ... 0.026716 0.033275 0.041433 0.049912 0.059510 0.065270 0.080947 0.087186 0.099824 0.175492

5 rows × 120 columns

2.5.4 Regression

add ttwa name to the entry rate dataset

In [53]:
entry_rate = pd.merge(df_entry_rate, df_ttwa_10,left_on = "tech_cluster", right_on = "ttwa_code")
entry_rate.drop({"counts"},axis=1,inplace = True)
entry_rate
Out[53]:
tech_cluster all_companies 1900 1901 1902 1904 1905 1906 1907 1908 ... 2011 2012 2013 2014 2015 2016 2017 2018 ttwa_code name
0 E30000169 9592 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 ... 0.0405 0.0573 0.0747 0.0902 0.1051 0.0942 0.0984 0.1508 E30000169 Birmingham
1 E30000175 7546 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 ... 0.0415 0.0500 0.0490 0.0761 0.1222 0.1373 0.1617 0.2111 E30000175 Bournemouth
2 E30000179 7482 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 ... 0.0560 0.0631 0.0937 0.1077 0.1302 0.1206 0.1026 0.1172 E30000179 Brighton
3 E30000180 6176 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 ... 0.0520 0.0656 0.0695 0.0884 0.0936 0.0925 0.0892 0.1218 E30000180 Bristol
4 E30000186 6251 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0002 0.0000 ... 0.0414 0.0499 0.0595 0.0653 0.0809 0.0872 0.0998 0.1755 E30000186 Cambridge
5 E30000234 118998 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 ... 0.0442 0.0556 0.0678 0.0873 0.0983 0.1002 0.1095 0.1878 E30000234 London
6 E30000237 8869 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 ... 0.0521 0.0583 0.0677 0.0874 0.0999 0.0878 0.1122 0.1629 E30000237 Luton
7 E30000239 18270 0.0000 0.0000 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000 ... 0.0447 0.0560 0.0652 0.0869 0.0995 0.0972 0.1002 0.1589 E30000239 Manchester
8 E30000266 17364 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 ... 0.0500 0.0629 0.0711 0.0933 0.1010 0.0936 0.0953 0.1388 E30000266 Slough and Heathrow
9 E30000284 4950 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 ... 0.0329 0.0356 0.0461 0.0533 0.0646 0.0998 0.1380 0.2978 E30000284 Warrington and Wigan

10 rows × 122 columns

take London as the first example

In [61]:
# calculate the entry rate of London
London_ER = entry_rate[entry_rate["tech_cluster"]=="E30000234"].reset_index()
London_ER.drop({"index","all_companies","tech_cluster","ttwa_code","name"},axis=1,inplace = True)

X_t = London_ER.columns.to_list() # x axis is t(year) for London
Y_entry_rate = London_ER.values.tolist()[0] # y axis is the entry rate  for London
In [68]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm

pd.set_option('display.max_rows', 300) # specifies number of rows to show
pd.options.display.float_format = '{:40,.4f}'.format # specifies default number format to 4 decimal places
plt.style.use('ggplot') # specifies that graphs should use ggplot styling
%matplotlib inline

# fit a lineare regression model using statsmodel
regressor_OLS = sm.OLS(endog=Y_entry_rate, exog=sm.add_constant(X_t)).fit()
regressor_OLS.summary()
Out[68]:
OLS Regression Results
Dep. Variable: y R-squared: 0.253
Model: OLS Adj. R-squared: 0.246
Method: Least Squares F-statistic: 39.27
Date: Tue, 13 Jul 2021 Prob (F-statistic): 6.49e-09
Time: 21:02:37 Log-Likelihood: 280.32
No. Observations: 118 AIC: -556.6
Df Residuals: 116 BIC: -551.1
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const -0.7435 0.120 -6.195 0.000 -0.981 -0.506
x1 0.0004 6.12e-05 6.267 0.000 0.000 0.001
Omnibus: 129.234 Durbin-Watson: 0.120
Prob(Omnibus): 0.000 Jarque-Bera (JB): 2345.954
Skew: 3.843 Prob(JB): 0.00
Kurtosis: 23.447 Cond. No. 1.13e+05


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.13e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

2.6 Exit Rate

to do

2.7 Churn Rate

to do

2.8 LQ research

to do

In [ ]: