Regression Research for Dynamics Variables (Top 10 Tech Clusters)¶

Current Question¶

I used multiple linear regression for the this research. Did the following OLS estimation you wrote in the last communication use the time series forecasting model such as ARIMA OLS estimation? Do I need to use time series forecasting to estimate?

Dynamics_variable_it = a1 + b2#firms_it-n + b3industry mix_it-n + Xb_it-n + location_i + year_t + e_it (1)

Firm_performance_it = c1 + c2#firms_it-n + c3dynamics_it-n + c4industry mix_it-n + Xc_it-n + location_i + year_t + u_it (2)

P.S. I found that Firm_performance_it is also only available for Average_of_latest_accounts_assets in the dataset. OR are there other indicators that can be used to measure Firm_performance_it
HIH index might not be available because a company’s market share is difficult to measure for the existing dataset

1 Key Results¶

My Research Question:

How does tech clusters' dynamics pattern change in UK from 1998 to 2018? / What factors can affect tech clusters' dynamics pattern change in UK?
To what extent will dynamic changes affect tech clusters’ performance？

1.2 Data¶

Below is my dataset for regression research, you can access/download it in this link

df3.head()

1.3 Hypothesis¶

With reference to the OLS estimation that you said last time, I made two assumptions in section 2.3 Hypothesis. I selected the second assumption as following key results:

$\ log_e{\ Dynamics\ Variable_{i,t}}\ =\ a_1 + b_1 \times firms_{i,t} + b_2 \times location_i + b_3 \times year_t$ (1)

$\ log_e\ Firm\ Performance_{i,t}\ =\ c_1 + d_1 \times firms_{i,t} + d_2 \times dynamics_{i,t} + d_3 \times location_i + d_4 \times year_t$ (2)

$Dynamics\ Variable_{i,t}$ in (1) means Entry_Rate in $t$ and $i$
$dynamics_{i,t}$ in (2) means Entry_Rate in a specific $t$ and $i$
$firms_{i,t}$ means Count_of_tech_firms in a specific $t$ and $i$
$t$ means year -- birth_year
$i$ means location -- ttwa (travel to work area)

1.4 Regression¶

For the multiple linear regression of the above assumptions, the final results are given below; the goodness of model fit is not bad ($R^2$ is high)

$\ log_e{\ Dynamics\ Variable_{i,t}} = 2.8e^{-5} \times firms_{i,t} + b_2 \times location_i + 0.15 \times year_t - 310$（1）

$\ log_e\ Firm\ Performance_{i,t}\ = 0.0001 \times firms_{i,t} + {-8.9585} \times dynamics_{i,t} + d_3 \times location_i + {-0.0675} \times year_t + 147.5$ (2)

NOTE: $b_2$ and $d_3$ should be seen in the following regression summary because $location_i$ is a dummy independent variable.

The regression report details are given below (Report No.1 & No.2), and further residual analysis can be viewed in section 2.4.2

【Report No.1】

This is the Dynamics Variable (1) regression summary:

NOTE: for dummy variable $location_i$, ttwa_E30000169（Birmingham） is dropped as a reference objectivity in the regression

# for formula (1)
hypo2_1_regressor_OLS.summary()

【Report No.2】

This is the Firms' Performance (2) regression summary:

# for formula (2)

hypo2_2_regressor_OLS.summary()

For better see the ttwa cluster name:

df_ttwa_10.style

1.5 Insights¶

$\ log_e{\ Dynamics\ Variable_{i,t}} = 2.8e^{-5} \times firms_{i,t} + b_2 \times location_i + 0.15 \times year_t - 310$（1）

$\ log_e\ Firm\ Performance_{i,t}\ = 0.0001 \times firms_{i,t} + {-8.9585} \times dynamics_{i,t} + d_3 \times location_i + {-0.0675} \times year_t + 147.5$ (2)

The more number of firms in the cluster increase and year grow, the more firms will entry this cluster
E30000212 (Guildford and Aldershot) & E30000186(Cambridge) area 相比 E30000169(Birmingham) 更能吸引公司进入其对应的科技集群
The second result is not so ideal becasue firm performance is based on Average_of_latest_accounts_assets variable which has 41% missing value in dataset

2 Reproducible Analysis¶

2.1 Read Data¶

import pandas as pd
import sys
PATH = sys.path[0]

df = pd.read_csv(PATH + "/Dataset/fame_OC_tech_firm.csv",low_memory=False)
df = df.drop({"Unnamed: 0","Unnamed: 0.1","Unnamed: 0.1.1","is_tech"},axis = 1)

# preprocess the data
# drop the null value record in birth year and ttwa(travel to work area) columns
df = df.dropna(subset = {"birth_year","ttwa"})

2.2 Data Preparation¶

Get the top 10 ttwas which have the most number of firms

# Generate the top 10 tech clusters table based on the ttwa
top_10_ttwa = df.ttwa.value_counts().head(10)
top_10_ttwa = pd.DataFrame(top_10_ttwa).reset_index()
top_10_ttwa.columns = ["ttwa_code","counts"]

df_ttwa = pd.read_csv(PATH + "/Dataset/ttwa.csv")
df_ttwa = pd.merge(top_10_ttwa, df_ttwa[["code","name"]], left_on="ttwa_code",right_on="code")
df_ttwa_10 = df_ttwa[["ttwa_code", "name","counts"]]
df_ttwa_10.style

Filter data by ttwa

# df_10_tc means "top 10 tech ttwa clusters"
df_10_tc = df.copy()
print(df_10_tc.shape)

# filter data to top 10 ttwa data raws
df_10_tc = df_10_tc[df_10_tc.ttwa.isin(df_ttwa_10.ttwa_code.to_list())]
print(df_10_tc.shape)

(423634, 64)
(206557, 64)

Filter data by birth_year

print(df_10_tc.shape)
# df_10_tc_20 means "top 10 tech ttwa clusters in 20 years dataframe"
df_10_tc_20 = df_10_tc[df_10_tc["birth_year"]>=1998]
# data beautify
df_10_tc_20 = df_10_tc_20.reset_index().drop("index",axis=1)
print(df_10_tc_20.shape)
df_10_tc_20.head()

(206557, 64)
(196071, 64)

Select the necessary columns:

location: ttwa
year: birth_year & diss_year
firms' performance : latest_accounts_assets

df2 = df_10_tc_20.copy()

# select data in below columns
df2 = df2[["registered_number",\
        "ttwa",\
        "birth_year",\
        "diss_year",\
        "latest_accounts_assets"]]

df2.head()

Export above df2 into Excel to conduct a pivot operation

# export to excel to process
df2.to_excel(PATH + "/Top_10_Tech_TTWA_Cluster_Reg_Prepare.xlsx", index = False)

After doing pivot in Excel, read this .xlsx file to get the processed dataset df3

df3 = pd.read_excel(PATH + "/Excel/Top_10_Tech_TTWA_Cluster_Reg_Prepare.xlsx",sheet_name = "dynamics")
df3.head()

Distribution analysis for the dependent variable Entry_Rate

import seaborn as sns
import numpy as np

df_dist = df3.copy()
df_dist.Entry_Rate = np.log(df_dist["Entry_Rate"])
sns.displot(df_dist, x="Entry_Rate",bins=80)

<seaborn.axisgrid.FacetGrid at 0x7ffac9c90fd0>

df_dist = df3.copy()
df_dist.Average_of_latest_accounts_assets = np.log1p(df_dist["Average_of_latest_accounts_assets"])
sns.displot(df_dist, x="Average_of_latest_accounts_assets",bins=80)

<seaborn.axisgrid.FacetGrid at 0x7ffb01926898>

2.3 Hypothesis¶

Hypothesis 1:

$\ Dynamics\ Variable_{i,t}\ =\ a_1 + b_1 \times firms_{i,t} + b_2 \times location_i + b_3 \times year_t$ (1)

$\ Firm\ Performance_{i,t}\ =\ c_1 + d_1 \times firms_{i,t} + d_2 \times dynamics_{i,t} + d_3 \times location_i + d_4 \times year_t$ (2)

Hypothesis 2:

$\ log_e{\ Dynamics\ Variable_{i,t}}\ =\ a_1 + b_1 \times firms_{i,t} + b_2 \times location_i + b_3 \times year_t$ (1)

$\ log_e\ Firm\ Performance_{i,t}\ =\ c_1 + d_1 \times firms_{i,t} + d_2 \times dynamics_{i,t} + d_3 \times location_i + d_4 \times year_t$ (2)

$Dynamics\ Variable_{i,t}$ in (1) means Entry_Rate in $t$ and $i$
$dynamics_{i,t}$ in (2) means Entry_Rate in a specific $t$ and $i$
$firms_{i,t}$ means Count_of_tech_firms in a specific $t$ and $i$
$t$ means year -- birth_year
$i$ means location -- ttwa (travel to work area)

2.4 Regression¶

# environment preparation
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.max_rows', 300) # specifies number of rows to show
pd.options.display.float_format = '{:40,.4f}'.format # specifies default number format to 4 decimal places
plt.style.use('ggplot') # specifies that graphs should use ggplot styling
%matplotlib inline

2.4.1 Hypothesis 1¶

df3_2 = df3.copy()

# intro dummy variables
df3_2.birth_year = df3_2.birth_year.astype("int")
df3_2_numeric = pd.get_dummies(df3_2)

# add dummies operation requirement 
# Set the reference object is ttwa_E30000169（Birmingham）
# drop Birmingham for dummies variables comparasion
df3_2_numeric = df3_2_numeric.drop(['ttwa_E30000169'], axis=1)

# preprocess drop n/a value
df3_2_final = df3_2_numeric.dropna(subset = ["Average_of_latest_accounts_assets"])

# correlation analysis
df = df3_2_final
plt.rcParams["axes.grid"] = False
f = plt.figure(figsize=(19, 15))
plt.matshow(df.corr(), fignum=f.number)
plt.xticks(range(df.shape[1]), df.columns, fontsize=14, rotation=45)
plt.yticks(range(df.shape[1]), df.columns, fontsize=14)
cb = plt.colorbar()
cb.ax.tick_params(labelsize=14)
plt.title('Correlation Matrix', fontsize=16)

Text(0.5, 1.0, 'Correlation Matrix')

High correlation variables groups:

Total_tech_firms & birth_year
Count_of_tech_firms & birth_year
Count_of_tech_firms & E30000234 (London)

Then I introduce VIF method to help remove columns which might cause strong multicollinearity problem

# VIF method
from statsmodels.stats.outliers_influence import variance_inflation_factor 
from statsmodels.tools.tools import add_constant

def drop_column_using_vif_(df, list_var_not_to_remove, thresh=5):
    '''
    Calculates VIF each feature in a pandas dataframe, and repeatedly drop the columns with the highest VIF
    A constant must be added to variance_inflation_factor or the results will be incorrect

    :param df: the pandas dataframe containing only the predictor features, not the response variable
    :param list_var_not_to_remove: the list of variables that should not be removed even though it has a high VIF. For example, dummy (or indicator) variables represent a categorical variable with three or more categories.
    :param thresh: the max VIF value before the feature is removed from the dataframe
    :return: dataframe with multicollinear features removed
    '''
    while True:
        # adding a constatnt item to the data
        df_with_const = add_constant(df)

        vif_df = pd.Series([variance_inflation_factor(df_with_const.values, i) 
               for i in range(df_with_const.shape[1])], name= "VIF",
              index=df_with_const.columns).to_frame()

        # drop the const
        vif_df = vif_df.drop('const').drop(list_var_not_to_remove)
        
        print('Max VIF:', vif_df.VIF.max())
        
        # if the largest VIF is above the thresh, remove a variable with the largest VIF
        if vif_df.VIF.max() > thresh:
            # If there are multiple variables with the maximum VIF, choose the first one
            index_to_drop = vif_df.index[vif_df.VIF == vif_df.VIF.max()].tolist()[0]
            print('Dropping: {}'.format(index_to_drop))
            df = df.drop(columns = index_to_drop)
        else:
            # No VIF is above threshold. Exit the loop
            break

    return df

predictors_df3 = df3_2_final.drop('Entry_Rate', axis=1)
response_df3 = df3_2_final['Entry_Rate']

# this is a list of dummy variables that represent a categorical variable with three or more categories. They should not be removed even if it has a high VIF. 
# list_var_not_to_remove = ['ttwa_E30000234', 'Count_of_tech_firms', 'Total_tech_firms']
list_var_not_to_remove = ["birth_year","ttwa_E30000234"]
df_predictors_select_VIF = drop_column_using_vif_(predictors_df3, list_var_not_to_remove, thresh=5)
print("The columns remaining after VIF selection are:")
print(df_predictors_select_VIF.columns)

hypo1_1_regressor_OLS = sm.OLS(endog=response_df3, exog=sm.add_constant(df_predictors_select_VIF)).fit()
hypo1_1_regressor_OLS.summary()

Max VIF: inf
Dropping: Total_tech_firms
Max VIF: 2.5658319318283667
The columns remaining after VIF selection are:
Index(['birth_year', 'Count_of_tech_firms',
       'Average_of_latest_accounts_assets', 'ttwa_E30000175', 'ttwa_E30000179',
       'ttwa_E30000180', 'ttwa_E30000186', 'ttwa_E30000212', 'ttwa_E30000234',
       'ttwa_E30000237', 'ttwa_E30000239', 'ttwa_E30000266'],
      dtype='object')

Then, the residual analysis is conducted to test the regression performance

plt.scatter(hypo1_1_regressor_OLS.fittedvalues, hypo1_1_regressor_OLS.resid)
# adding title and labels
plt.xlabel('Fitted cases rate 1')
plt.ylabel('Residual')
plt.title('Figure')
plt.ylim(-0.5,0.5)
plt.show()

As given above, the performance for regression is quite good

predictors_df3 = df3_2_final.drop('Average_of_latest_accounts_assets', axis=1)
response_df3 = df3_2_final['Average_of_latest_accounts_assets']

# this is a list of dummy variables that represent a categorical variable with three or more categories. They should not be removed even if it has a high VIF. 
# list_var_not_to_remove = ['ttwa_E30000234', 'Count_of_tech_firms', 'Total_tech_firms']
list_var_not_to_remove = ["birth_year","ttwa_E30000234","Entry_Rate"]
df_predictors_select_VIF = drop_column_using_vif_(predictors_df3, list_var_not_to_remove, thresh=5)
print("The columns remaining after VIF selection are:")
print(df_predictors_select_VIF.columns)

hypo1_2_regressor_OLS = sm.OLS(endog=response_df3, exog=sm.add_constant(df_predictors_select_VIF)).fit()
hypo1_2_regressor_OLS.summary()

Max VIF: inf
Dropping: Total_tech_firms
Max VIF: 2.607020276192744
The columns remaining after VIF selection are:
Index(['birth_year', 'Count_of_tech_firms', 'Entry_Rate', 'ttwa_E30000175',
       'ttwa_E30000179', 'ttwa_E30000180', 'ttwa_E30000186', 'ttwa_E30000212',
       'ttwa_E30000234', 'ttwa_E30000237', 'ttwa_E30000239', 'ttwa_E30000266'],
      dtype='object')

plt.scatter(hypo1_2_regressor_OLS.fittedvalues, hypo1_2_regressor_OLS.resid)
# adding title and labels
plt.xlabel('Fitted cases rate 1')
plt.ylabel('Residual')
plt.title('Figure')
# plt.ylim(-0.5,0.5)
plt.show()

2.4.2 Hypothesis 2¶

Refer to: Dynamics_variable_it = a1 + b2#firms_it-n + b3industry mix_it-n + Xb_it-n + location_i + year_t + e_it (1)

$\ log_e{\ Dynamics\ Variable_{i,t}}\ =\ a_1 + b_1 \times firms_{i,t} + b_2 \times location_i + b_3 \times year_t$ (1)

df3_3_final = df3_2_final.copy()

predictors_df3 = df3_3_final.drop({'Entry_Rate',"Average_of_latest_accounts_assets"}, axis=1)
response_df3 = np.log(df3_3_final['Entry_Rate'])

# this is a list of dummy variables that represent a categorical variable with three or more categories. They should not be removed even if it has a high VIF. 
# list_var_not_to_remove = ['ttwa_E30000234', 'Count_of_tech_firms', 'Total_tech_firms']
list_var_not_to_remove = ["ttwa_E30000234","birth_year"]
df_predictors_select_VIF = drop_column_using_vif_(predictors_df3, list_var_not_to_remove, thresh=2.5)
print("The columns remaining after VIF selection are:")
print(df_predictors_select_VIF.columns)

hypo2_1_regressor_OLS = sm.OLS(endog=response_df3, exog=sm.add_constant(df_predictors_select_VIF)).fit()
hypo2_1_regressor_OLS.summary()

Max VIF: inf
Dropping: Total_tech_firms
Max VIF: 2.13351053449709
The columns remaining after VIF selection are:
Index(['birth_year', 'Count_of_tech_firms', 'ttwa_E30000175', 'ttwa_E30000179',
       'ttwa_E30000180', 'ttwa_E30000186', 'ttwa_E30000212', 'ttwa_E30000234',
       'ttwa_E30000237', 'ttwa_E30000239', 'ttwa_E30000266'],
      dtype='object')

The result can be writen in the below formula:

$ log_e{\ Dynamics\ Variable_{i,t}} = -310 + 2.8e^{-5} \times firms_{i,t} + b_2 \times location_i + 0.15 \times year_t$

plt.scatter(hypo2_1_regressor_OLS.fittedvalues, hypo2_1_regressor_OLS.resid)
# adding title and labels
plt.xlabel('Fitted Dynamics Variables after Logarithmic Transformation')
plt.ylabel('Residual')
plt.title('Residual vs. Fitted Plot of Dynamics Variables')
plt.ylim(-2,2)
plt.show()

fig1 = plt.figure(figsize=(12,8))
fig1 = sm.graphics.plot_regress_exog(hypo2_1_regressor_OLS, 'birth_year', fig=fig1)

The second formula:

Refer to: Firm_performance_it = c1 + c2#firms_it-n + c3dynamics_it-n + c4industry mix_it-n + Xc_it-n + location_i + year_t + u_it (2)

$\ log_e\ Firm\ Performance_{i,t}\ =\ c_1 + d_1 \times firms_{i,t} + d_2 \times dynamics_{i,t} + d_3 \times location_i + d_4 \times year_t$ (2)

df3_3_final = df3_2_final.copy()

predictors_df3 = df3_3_final.drop({"Average_of_latest_accounts_assets"}, axis=1)
response_df3 = np.log(df3_3_final['Average_of_latest_accounts_assets'])

# this is a list of dummy variables that represent a categorical variable with three or more categories. They should not be removed even if it has a high VIF. 
# list_var_not_to_remove = ['ttwa_E30000234', 'Count_of_tech_firms', 'Total_tech_firms']
list_var_not_to_remove = ["ttwa_E30000234","birth_year","Entry_Rate","Count_of_tech_firms"]
df_predictors_select_VIF = drop_column_using_vif_(predictors_df3, list_var_not_to_remove, thresh=2.5)
print("The columns remaining after VIF selection are:")
print(df_predictors_select_VIF.columns)

hypo2_2_regressor_OLS = sm.OLS(endog=response_df3, exog=sm.add_constant(df_predictors_select_VIF)).fit()
hypo2_2_regressor_OLS.summary()

Max VIF: inf
Dropping: Total_tech_firms
Max VIF: 1.8437386819351917
The columns remaining after VIF selection are:
Index(['birth_year', 'Count_of_tech_firms', 'Entry_Rate', 'ttwa_E30000175',
       'ttwa_E30000179', 'ttwa_E30000180', 'ttwa_E30000186', 'ttwa_E30000212',
       'ttwa_E30000234', 'ttwa_E30000237', 'ttwa_E30000239', 'ttwa_E30000266'],
      dtype='object')

$\ Firm\ Performance_{i,t}\ =\ 147.5 + 0.0001 \times firms_{i,t} + {-8.9585} \times dynamics_{i,t} + d_3 \times location_i + {-0.0675} \times year_t$ (2)

plt.scatter(hypo2_2_regressor_OLS.fittedvalues, hypo2_2_regressor_OLS.resid)
# adding title and labels
plt.xlabel('Fitted Dynamics Variables after Logarithmic Transformation')
plt.ylabel('Residual')
plt.title('Residual vs. Fitted Plot of Dynamics Variables')
plt.ylim(-3,3)
plt.show()

fig2 = plt.figure(figsize=(12,8))
fig2 = sm.graphics.plot_regress_exog(hypo2_2_regressor_OLS, 'Entry_Rate', fig=fig2)

	ttwa	birth_year	Count_of_tech_firms	Average_of_latest_accounts_assets	Total_tech_firms	Entry_Rate
0	E30000169	1998	57	154,110.7632	9047	0.0063
1	E30000169	1999	78	392,610.4737	9047	0.0086
2	E30000169	2000	82	234,658.3898	9047	0.0091
3	E30000169	2001	67	198,533.5192	9047	0.0074
4	E30000169	2002	125	169,835.2778	9047	0.0138

Dep. Variable:	Entry_Rate	R-squared:	0.932
Model:	OLS	Adj. R-squared:	0.928
Method:	Least Squares	F-statistic:	238.5
Date:	Tue, 27 Jul 2021	Prob (F-statistic):	3.43e-105
Time:	00:49:45	Log-Likelihood:	-9.0648
No. Observations:	203	AIC:	42.13
Df Residuals:	191	BIC:	81.89
Df Model:	11
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[0.025	0.975]
const	-310.4470	6.992	-44.399	0.000	-324.239	-296.655
birth_year	0.1529	0.003	43.878	0.000	0.146	0.160
Count_of_tech_firms	2.804e-05	1.13e-05	2.489	0.014	5.82e-06	5.03e-05
ttwa_E30000175	-0.4938	0.082	-5.986	0.000	-0.657	-0.331
ttwa_E30000179	-0.1491	0.082	-1.829	0.069	-0.310	0.012
ttwa_E30000180	0.1377	0.082	1.669	0.097	-0.025	0.300
ttwa_E30000186	0.1193	0.083	1.446	0.150	-0.043	0.282
ttwa_E30000212	0.2416	0.083	2.928	0.004	0.079	0.404
ttwa_E30000234	-0.2003	0.099	-2.028	0.044	-0.395	-0.005
ttwa_E30000237	-0.0440	0.082	-0.533	0.595	-0.207	0.119
ttwa_E30000239	0.0074	0.082	0.091	0.928	-0.154	0.168
ttwa_E30000266	0.0248	0.083	0.300	0.765	-0.138	0.188

Omnibus:	6.590	Durbin-Watson:	1.008
Prob(Omnibus):	0.037	Jarque-Bera (JB):	6.880
Skew:	0.316	Prob(JB):	0.0321
Kurtosis:	3.643	Cond. No.	1.04e+06

Dep. Variable:	Average_of_latest_accounts_assets	R-squared:	0.745
Model:	OLS	Adj. R-squared:	0.729
Method:	Least Squares	F-statistic:	46.37
Date:	Tue, 27 Jul 2021	Prob (F-statistic):	6.23e-50
Time:	00:50:00	Log-Likelihood:	-99.816
No. Observations:	203	AIC:	225.6
Df Residuals:	190	BIC:	268.7
Df Model:	12
Covariance Type:	nonrobust

Omnibus:	103.810	Durbin-Watson:	1.682
Prob(Omnibus):	0.000	Jarque-Bera (JB):	1355.690
Skew:	-1.582	Prob(JB):	4.13e-295
Kurtosis:	15.259	Cond. No.	2.11e+06

	ttwa_code	name	counts
0	E30000234	London	119008
1	E30000239	Manchester	18273
2	E30000266	Slough and Heathrow	17367
3	E30000169	Birmingham	9593
4	E30000237	Luton	8873
5	E30000175	Bournemouth	7546
6	E30000179	Brighton	7483
7	E30000186	Cambridge	6252
8	E30000180	Bristol	6176
9	E30000212	Guildford and Aldershot	5986

	registered_number	incorporation_date	dissolution_date	latest_accounts_cash	latest_accounts_assets	latest_accounts_liabilities	year_obs	sic_year	sic4	...	pcd2	pcds	cty	laua	ctry	rgn	ttwa	pct	lep1	lep2
0	07341142	2010-08-10	NaN	NaN	56331.0	NaN	2017	2007.0	6202.0	...	AL1 1AJ	AL1 1AJ	E10000015	E07000240	E92000001	E12000006	E30000237	E16000150	E37000017	NaN
1	10490667	2016-11-22	NaN	NaN	11254.0	NaN	2018	2007.0	7211.0	...	AL1 1AU	AL1 1AU	E10000015	E07000240	E92000001	E12000006	E30000237	E16000150	E37000017	NaN
2	11283731	2018-03-29	NaN	NaN	NaN	NaN	2018	2007.0	6201.0	...	AL1 1DG	AL1 1DG	E10000015	E07000240	E92000001	E12000006	E30000237	E16000150	E37000017	NaN
3	09852054	2015-11-02	NaN	NaN	20002.0	NaN	2017	2007.0	6201.0	...	AL1 1DG	AL1 1DG	E10000015	E07000240	E92000001	E12000006	E30000237	E16000150	E37000017	NaN
4	08577485	2013-06-20	NaN	NaN	8390.0	11888.0	2017	2007.0	8621.0	...	AL1 1DS	AL1 1DS	E10000015	E07000240	E92000001	E12000006	E30000237	E16000150	E37000017	NaN

	registered_number	ttwa	birth_year	diss_year	latest_accounts_assets
0	07341142	E30000237	2,010.0000	nan	56,331.0000
1	10490667	E30000237	2,016.0000	nan	11,254.0000
2	11283731	E30000237	2,018.0000	nan	nan
3	09852054	E30000237	2,015.0000	nan	20,002.0000
4	08577485	E30000237	2,013.0000	nan	8,390.0000

	ttwa	birth_year	Count_of_tech_firms	Average_of_latest_accounts_assets	Total_tech_firms	Entry_Rate
0	E30000169	1998	57	154110.763158	9047	0.006300
1	E30000169	1999	78	392610.473684	9047	0.008622
2	E30000169	2000	82	234658.389831	9047	0.009064
3	E30000169	2001	67	198533.519231	9047	0.007406
4	E30000169	2002	125	169835.277778	9047	0.013817

Omnibus:	44.698	Durbin-Watson:	0.499
Prob(Omnibus):	0.000	Jarque-Bera (JB):	86.296
Skew:	1.074	Prob(JB):	1.82e-19
Kurtosis:	5.364	Cond. No.	1.04e+08

Omnibus:	55.982	Durbin-Watson:	1.846
Prob(Omnibus):	0.000	Jarque-Bera (JB):	175.434
Skew:	1.114	Prob(JB):	8.03e-39
Kurtosis:	6.972	Cond. No.	2.11e+06

	coef	std err	t	P>\|t\|	[0.025	0.975]
const	147.5047	22.122	6.668	0.000	103.868	191.142
birth_year	-0.0675	0.011	-6.111	0.000	-0.089	-0.046
Count_of_tech_firms	0.0001	1.95e-05	5.773	0.000	7.42e-05	0.000
Entry_Rate	-8.9585	1.814	-4.940	0.000	-12.536	-5.381
ttwa_E30000175	-0.2359	0.129	-1.823	0.070	-0.491	0.019
ttwa_E30000179	-0.2070	0.128	-1.618	0.107	-0.459	0.045
ttwa_E30000180	0.0600	0.129	0.464	0.643	-0.195	0.315
ttwa_E30000186	0.1323	0.129	1.023	0.308	-0.123	0.387
ttwa_E30000212	0.2389	0.129	1.845	0.067	-0.016	0.494
ttwa_E30000234	0.0065	0.159	0.041	0.967	-0.307	0.320
ttwa_E30000237	0.0750	0.129	0.580	0.563	-0.180	0.330
ttwa_E30000239	-0.0467	0.128	-0.365	0.716	-0.299	0.206
ttwa_E30000266	0.0912	0.129	0.704	0.482	-0.164	0.347

	coef	std err	t	P>\|t\|	[0.025	0.975]
const	-11.5268	0.668	-17.259	0.000	-12.844	-10.209
birth_year	0.0058	0.000	17.358	0.000	0.005	0.006
Count_of_tech_firms	4.046e-06	7.6e-07	5.323	0.000	2.55e-06	5.55e-06
Average_of_latest_accounts_assets	3.257e-08	1.77e-08	1.838	0.068	-2.39e-09	6.75e-08
ttwa_E30000175	-0.0012	0.005	-0.240	0.811	-0.011	0.009
ttwa_E30000179	0.0041	0.005	0.807	0.421	-0.006	0.014
ttwa_E30000180	0.0022	0.005	0.422	0.673	-0.008	0.012
ttwa_E30000186	-0.0014	0.005	-0.277	0.782	-0.012	0.009
ttwa_E30000212	0.0022	0.005	0.428	0.669	-0.008	0.012
ttwa_E30000234	-0.0199	0.006	-3.234	0.001	-0.032	-0.008
ttwa_E30000237	-0.0008	0.005	-0.156	0.877	-0.011	0.009
ttwa_E30000239	0.0007	0.005	0.142	0.887	-0.009	0.011
ttwa_E30000266	-0.0008	0.005	-0.148	0.883	-0.011	0.009

	coef	std err	t	P>\|t\|	[0.025	0.975]
const	3.429e+07	3.56e+06	9.635	0.000	2.73e+07	4.13e+07
birth_year	-1.702e+04	1777.930	-9.575	0.000	-2.05e+04	-1.35e+04
Count_of_tech_firms	14.1961	3.141	4.519	0.000	8.000	20.392
Entry_Rate	5.362e+05	2.92e+05	1.838	0.068	-3.93e+04	1.11e+06
ttwa_E30000175	-3.437e+04	2.08e+04	-1.651	0.100	-7.54e+04	6690.439
ttwa_E30000179	-3.258e+04	2.06e+04	-1.583	0.115	-7.32e+04	8018.824
ttwa_E30000180	-2816.1829	2.08e+04	-0.135	0.893	-4.39e+04	3.82e+04
ttwa_E30000186	1.849e+04	2.08e+04	0.888	0.375	-2.26e+04	5.95e+04
ttwa_E30000212	2.169e+04	2.08e+04	1.041	0.299	-1.94e+04	6.28e+04
ttwa_E30000234	4219.1182	2.56e+04	0.165	0.869	-4.63e+04	5.47e+04
ttwa_E30000237	1.075e+04	2.08e+04	0.517	0.606	-3.03e+04	5.18e+04
ttwa_E30000239	6566.5115	2.06e+04	0.319	0.750	-3.4e+04	4.72e+04
ttwa_E30000266	-1716.2042	2.08e+04	-0.082	0.934	-4.28e+04	3.94e+04

	coef	std err	t	P>\|t\|	[0.025	0.975]
const	147.5047	22.122	6.668	0.000	103.868	191.142
birth_year	-0.0675	0.011	-6.111	0.000	-0.089	-0.046
Count_of_tech_firms	0.0001	1.95e-05	5.773	0.000	7.42e-05	0.000
Entry_Rate	-8.9585	1.814	-4.940	0.000	-12.536	-5.381
ttwa_E30000175	-0.2359	0.129	-1.823	0.070	-0.491	0.019
ttwa_E30000179	-0.2070	0.128	-1.618	0.107	-0.459	0.045
ttwa_E30000180	0.0600	0.129	0.464	0.643	-0.195	0.315
ttwa_E30000186	0.1323	0.129	1.023	0.308	-0.123	0.387
ttwa_E30000212	0.2389	0.129	1.845	0.067	-0.016	0.494
ttwa_E30000234	0.0065	0.159	0.041	0.967	-0.307	0.320
ttwa_E30000237	0.0750	0.129	0.580	0.563	-0.180	0.330
ttwa_E30000239	-0.0467	0.128	-0.365	0.716	-0.299	0.206
ttwa_E30000266	0.0912	0.129	0.704	0.482	-0.164	0.347