Panel Data Analysis – A sample project with R
Panel data analysis is a statistical method widely used in Social Sciences, Finance, and Economics. Panel data, also known as longitudinal data, are data where multiple cases (people, firms, countries, etc.) were observed at two or more time periods.
Before you proceed with the analysis, you should understand the nature of your data and specify your research question(s) or the hypotheses you wish to test.
Here are general steps for conducting a panel data analysis:
- Data Preparation and Preprocessing: This step includes importing the data into your statistical software (like R, Python, Stata), checking for missing values, coding or transforming variables if necessary, and exploring basic summary statistics of your data.
- Visualizing the Data: Plotting data over time for different firms can provide initial insights about the data and might suggest what kind of panel data model (fixed effects, random effects, etc.) could be appropriate.
- Exploratory Data Analysis (EDA): Explore the data by checking the summary statistics of all variables, relationships between pairs of variables (scatterplots, correlations), distributions of variables (histograms, density plots), etc.
- Statistical Testing for Panel Data Specifics: Depending on the nature of your data, you may need to conduct specific statistical tests before choosing the right panel data model. These tests might include:
  - Hausman test: Used to decide between a fixed effects model and a random effects model.
  - Breusch-Pagan LM test: Used to determine whether a random effects model or a simple pooled OLS regression is more appropriate.
- Model Building: Depending on the previous steps, there are multiple ways to approach panel data, including:
  - Pooled OLS regression: If you assume that there are no individual (firm-specific) or time effects.
  - Fixed effects models: If you believe there are firm-specific effects that are constant over time.
  - Random effects models: If you believe the firm-specific effects are random and not correlated with the predictors.
  - Dynamic panel data models: If you believe past values of your variables affect the current values.
- Model Evaluation and Assumptions Checking: Check the assumptions of your model. This might include checking for autocorrelation, heteroscedasticity, multicollinearity, etc. You might also want to compare the performance of different models using criteria such as AIC, BIC, or cross-validation.
- Interpretation of Results: Interpret your findings in the context of your research question(s). Remember that even statistically significant results need to be considered in terms of practical significance and in the context of existing research and theory.
- Reporting the Results: Clearly communicate your methods, findings, and interpretations. This often involves a clear visualization of the results, accompanied by a write-up of the analysis.
Remember that these are general steps, and depending on your specific data and questions, some steps may be unnecessary or need to be modified. Each panel data set is unique and thus requires its own unique analysis approach. It is always recommended to understand your data and model before conducting any analysis.
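The testing and model-building steps listed above can be sketched in a few lines with the plm package. This is only an illustration under stated assumptions: it uses the Grunfeld investment panel that ships with plm as a stand-in for the firm data in this project, and the formula `inv ~ value + capital` belongs to that example dataset, not to this analysis.

```r
library(plm)

# Grunfeld (10 firms, 20 years) ships with plm; it is a stand-in dataset here.
data("Grunfeld", package = "plm")

form <- inv ~ value + capital
idx  <- c("firm", "year")

# Three candidate specifications from the list of model types
pooled <- plm(form, data = Grunfeld, index = idx, model = "pooling")
fixed  <- plm(form, data = Grunfeld, index = idx, model = "within")
random <- plm(form, data = Grunfeld, index = idx, model = "random")

# Breusch-Pagan LM test: pooled OLS vs random effects
bp_test <- plmtest(pooled, type = "bp")

# Hausman test: fixed effects vs random effects
hausman <- phtest(fixed, random)

print(bp_test)
print(hausman)
```

A small p-value in the Breusch-Pagan LM test points away from pooled OLS, and a small p-value in the Hausman test points towards the fixed effects model. Both tests return standard `htest` objects, so the statistic and p-value can be extracted with `$statistic` and `$p.value`.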
Dataset Description
The dataset contains financial information about firms observed over multiple years (2018 to 2022). The first rows are construction firms, although other industries (for example, Plantation) also appear in the data. The fields in the dataset appear to represent the following:
id: This is likely a unique identifier for each firm. Each firm appears to have an associated unique ID number.
firm_name: This field represents the name of the construction firm.
industry: This field indicates the industry that the firm belongs to. The first rows shown below are all from the construction industry, but other industries (for example, Plantation) appear later in the data.
year: This field represents the year for which the particular row's data is relevant.
roa (Return on Assets): This is a profitability ratio that indicates the net income produced by total assets during a period by comparing net income to the average total assets.
roe (Return on Equity): This is the amount of net income returned as a percentage of shareholders' equity. It measures a corporation's profitability by revealing how much profit a company generates with the money shareholders have invested.
fw, mw, in: These are firm-specific ownership variables. Judging by the axis labels used in the plots later in this document, they stand for family ownership (fw), managerial ownership (mw), and institutional ownership (in). Note that R reads the column `in` as `in.`.
firm_size: This could represent the size of the firm. The specific measurement isn't clear - it could be based on number of employees, revenue, or some other factor.
gdp: This likely represents the gross domestic product for the country where the firm operates for the specific year.
interest_rate: This likely represents the prevailing interest rate in the economy where the firm operates.
Some rows have ‘N/A’ values indicating missing data.
This dataset could be used to study the performance of firms across different years, and understand how their performance relates to various internal factors (like firm size) and external factors (like GDP and interest rates).
Source: https://www.bursamalaysia.com/market_information/equities_prices?per_page=50&page=1
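The two profitability ratios described above can be made concrete with a small worked example. All figures below are hypothetical and are not taken from the dataset; they only illustrate the arithmetic behind `roa` and `roe`.

```r
# Hypothetical figures for one firm-year (illustration only, not from the data)
net_income          <- 5
total_assets_start  <- 90
total_assets_end    <- 110
shareholders_equity <- 40

# ROA compares net income to the average total assets over the period
avg_total_assets <- (total_assets_start + total_assets_end) / 2
roa <- net_income / avg_total_assets

# ROE compares net income to shareholders' equity
roe <- net_income / shareholders_equity

print(c(roa = roa, roe = roe))
```

The values in the dataset (for example, 0.0258 for ADVCON in 2018) appear to be stored as fractions in the same way, rather than as percentages.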
Import necessary libraries
library(dplyr)
library(tidyr)
library(plm) #library for pdata.frame function
library(ggplot2)
library(GGally)
library(lattice)
setwd("…/Panel_data_analysis")
df <- read.csv("Sample_modified.csv") # Load the data
head(df, n = 20) # View the first few rows of the data
## id firm_name industry year roa roe fw mw in.
## 1 1 ADVCON Construction 2018 0.02581262 0.05834585 31.21 25.15 9.16
## 2 1 ADVCON Construction 2019 0.02640413 0.05640305 31.28 23.59 10.77
## 3 1 ADVCON Construction 2020 0.00515132 0.01120195 31.04 19.69 6.80
## 4 1 ADVCON Construction 2021 0.00463757 0.00950923 26.10 16.54 13.45
## 5 1 ADVCON Construction 2022 NA NA NA NA NA
## 6 2 AGES Construction 2018 0.01574843 0.02792144 0.00 10.61 31.94
## 7 2 AGES Construction 2019 0.00950369 0.02736744 0.00 28.51 10.85
## 8 2 AGES Construction 2020 0.11872347 0.16986420 0.00 17.93 12.77
## 9 2 AGES Construction 2021 0.07822875 0.10173822 0.00 12.70 22.65
## 10 2 AGES Construction 2022 NA NA NA NA NA
## 11 3 AME Construction 2018 NA NA NA NA NA
## 12 3 AME Construction 2019 NA NA NA NA NA
## 13 3 AME Construction 2020 0.05949064 0.10281506 49.38 28.28 18.03
## 14 3 AME Construction 2021 0.04860212 0.08132782 45.93 19.77 21.34
## 15 3 AME Construction 2022 0.03472091 0.07033392 41.25 17.60 29.42
## 16 4 AZRB Construction 2018 0.00193393 0.01815637 58.28 1.00 11.12
## 17 4 AZRB Construction 2019 NA NA NA NA NA
## 18 4 AZRB Construction 2020 -0.02468318 -0.30850733 58.53 1.28 2.64
## 19 4 AZRB Construction 2021 -0.01720440 -0.25329091 58.54 0.45 0.92
## 20 4 AZRB Construction 2022 -0.01462379 -0.29832540 58.54 0.45 0.92
## firm_size gdp interest_rate
## 1 19.83497 4.8 3.25
## 2 19.83162 4.4 3.25
## 3 19.82645 -5.5 1.75
## 4 19.90689 3.1 1.75
## 5 NA NA NA
## 6 19.66730 4.8 3.25
## 7 19.68413 4.4 3.25
## 8 19.60096 -5.5 1.75
## 9 19.81164 3.1 1.75
## 10 NA NA NA
## 11 NA NA NA
## 12 NA NA NA
## 13 20.86638 -5.5 1.75
## 14 20.89876 3.1 1.75
## 15 21.13706 8.7 2.75
## 16 22.21408 4.8 3.25
## 17 NA NA NA
## 18 22.22873 -5.5 1.75
## 19 22.19641 3.1 1.75
## 20 22.23113 8.7 2.75
str(df) # Explore the structure of the data
## 'data.frame': 1660 obs. of 12 variables:
## $ id : int 1 1 1 1 1 2 2 2 2 2 ...
## $ firm_name : chr "ADVCON" "ADVCON" "ADVCON" "ADVCON" ...
## $ industry : chr "Construction" "Construction" "Construction" "Construction" ...
## $ year : int 2018 2019 2020 2021 2022 2018 2019 2020 2021 2022 ...
## $ roa : num 0.02581 0.0264 0.00515 0.00464 NA ...
## $ roe : num 0.05835 0.0564 0.0112 0.00951 NA ...
## $ fw : num 31.2 31.3 31 26.1 NA ...
## $ mw : num 25.1 23.6 19.7 16.5 NA ...
## $ in. : num 9.16 10.77 6.8 13.45 NA ...
## $ firm_size : num 19.8 19.8 19.8 19.9 NA ...
## $ gdp : num 4.8 4.4 -5.5 3.1 NA 4.8 4.4 -5.5 3.1 NA ...
## $ interest_rate: num 3.25 3.25 1.75 1.75 NA 3.25 3.25 1.75 1.75 NA ...
Handling of missing values
sum(is.na(df))
## [1] 1761
# If there are missing values, decide how to handle them.
# Here, we will remove rows with any missing values.
df <- df[complete.cases(df), ]
str(df)
## 'data.frame': 1439 obs. of 12 variables:
## $ id : int 1 1 1 1 2 2 2 2 3 3 ...
## $ firm_name : chr "ADVCON" "ADVCON" "ADVCON" "ADVCON" ...
## $ industry : chr "Construction" "Construction" "Construction" "Construction" ...
## $ year : int 2018 2019 2020 2021 2018 2019 2020 2021 2020 2021 ...
## $ roa : num 0.02581 0.0264 0.00515 0.00464 0.01575 ...
## $ roe : num 0.05835 0.0564 0.0112 0.00951 0.02792 ...
## $ fw : num 31.2 31.3 31 26.1 0 ...
## $ mw : num 25.1 23.6 19.7 16.5 10.6 ...
## $ in. : num 9.16 10.77 6.8 13.45 31.94 ...
## $ firm_size : num 19.8 19.8 19.8 19.9 19.7 ...
## $ gdp : num 4.8 4.4 -5.5 3.1 4.8 4.4 -5.5 3.1 -5.5 3.1 ...
## $ interest_rate: num 3.25 3.25 1.75 1.75 3.25 3.25 1.75 1.75 1.75 1.75 ...
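Note that `sum(is.na(df))` counts missing cells, not rows: the 1761 missing cells above correspond to 1660 - 1439 = 221 dropped rows. Before dropping rows it can help to see where the gaps are and how many observations each firm keeps. The following is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical), using only base R.

```r
# Hypothetical toy panel: 3 firms x 3 years with some missing firm-years
toy <- data.frame(
  firm_name = rep(c("A", "B", "C"), each = 3),
  year      = rep(2018:2020, times = 3),
  roa       = c(0.02, NA, 0.01, 0.05, 0.04, 0.03, NA, NA, 0.02),
  fw        = c(30, NA, 31, 0, 0, 0, NA, NA, 45)
)

# Missing cells per column
na_per_column <- colSums(is.na(toy))

# Rows with at least one missing value (these are what complete.cases() drops)
incomplete_rows <- sum(!complete.cases(toy))

# Listwise deletion, then count the observations left for each firm
toy_complete <- toy[complete.cases(toy), ]
obs_per_firm <- table(toy_complete$firm_name)

print(na_per_column)
print(incomplete_rows)
print(obs_per_firm)
```

After listwise deletion the firms no longer have the same number of years, so the panel is unbalanced. This is worth keeping in mind when choosing estimators and tests later on.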
Converting to panel data
# firm_name is the individual (panel) index and year is the time index
pdata <- pdata.frame(df, index = c("firm_name", "year"))
head(pdata)
## id firm_name industry year roa roe fw mw
## AASIA-2018 150 AASIA Plantation 2018 -0.01763308 -0.02375980 0.00 27.14
## AASIA-2019 150 AASIA Plantation 2019 -0.03534037 -0.04892673 0.00 27.14
## AASIA-2020 150 AASIA Plantation 2020 -0.00661030 -0.00911783 27.14 0.00
## AASIA-2021 150 AASIA Plantation 2021 -0.02091779 -0.02851444 27.14 0.00
## ADVCON-2018 1 ADVCON Construction 2018 0.02581262 0.05834585 31.21 25.15
## ADVCON-2019 1 ADVCON Construction 2019 0.02640413 0.05640305 31.28 23.59
## in. firm_size gdp interest_rate
## AASIA-2018 45.89 19.77255 4.8 3.25
## AASIA-2019 42.28 19.74912 4.4 3.25
## AASIA-2020 42.31 19.73250 -5.5 1.75
## AASIA-2021 43.52 19.68868 3.1 1.75
## ADVCON-2018 9.16 19.83497 4.8 3.25
## ADVCON-2019 10.77 19.83162 4.4 3.25
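After the conversion it is worth checking that every firm-year pair is unique and whether the panel is balanced. The sketch below does this on a small made-up data frame (`toy` is hypothetical); the same calls should apply to `pdata`, but that is an assumption since the real file is not reproduced here.

```r
library(plm)

# Hypothetical toy panel: firm B has no observation for 2019
toy <- data.frame(
  firm_name = c("A", "A", "A", "B", "B", "C", "C", "C"),
  year      = c(2018, 2019, 2020, 2018, 2020, 2018, 2019, 2020),
  roa       = c(0.02, 0.03, 0.01, 0.05, 0.04, -0.01, 0.00, 0.02)
)

# Each (firm, year) combination should appear at most once
has_duplicates <- any(duplicated(toy[, c("firm_name", "year")]))

# Convert to a panel data frame and inspect its dimensions
ptoy <- pdata.frame(toy, index = c("firm_name", "year"))
dims <- pdim(ptoy)
print(dims)

# dims$nT holds the number of individuals (n), periods (T) and observations (N)
n_firms     <- dims$nT$n
n_obs       <- dims$nT$N
is_balanced <- dims$balanced
```

`pdim()` reports whether the panel is balanced together with n, T, and N, which is a quick sanity check that the index columns were specified correctly.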
Visualizing the data
# Plot log ROA over time for each firm.
# log() returns NaN for negative values, so firm-years with negative roa
# are not shown in these plots.
ggplot(pdata, aes(x=year, y=log(roa), group=firm_name, color=firm_name)) +
  geom_line() +
  theme(legend.position="none") +
  labs(x="Year", y="Log of Return on Assets (ROA)",
       title="Log ROA over time for each firm")
# Plot log ROE over time for each firm
ggplot(pdata, aes(x=year, y=log(roe), group=firm_name, color=firm_name)) +
  geom_line() +
  theme(legend.position="none") +
  labs(x="Year", y="Log of Return on Equity (ROE)",
       title="Log ROE over time for each firm")
# Scatterplots of log ROA vs IN and MW
ggplot(pdata, aes(x=in., y=log(roa))) +
  geom_point() +
  labs(x="Institutional Ownership (IN)", y="Log of Return on Assets (ROA)",
       title="Scatterplot of log ROA vs IN")
ggplot(pdata, aes(x=mw, y=log(roa))) +
  geom_point() +
  labs(x="Managerial Ownership (MW)", y="Log of Return on Assets (ROA)",
       title="Scatterplot of log ROA vs MW")
# Scatterplots of log ROE vs FW, IN and MW
ggplot(pdata, aes(x=fw, y=log(roe))) +
  geom_point() +
  labs(x="Family Ownership (FW)", y="Log of Return on Equity (ROE)",
       title="Scatterplot of log ROE vs FW")
ggplot(pdata, aes(x=in., y=log(roe))) +
  geom_point() +
  labs(x="Institutional Ownership (IN)", y="Log of Return on Equity (ROE)",
       title="Scatterplot of log ROE vs IN")
ggplot(pdata, aes(x=mw, y=log(roe))) +
  geom_point() +
  labs(x="Managerial Ownership (MW)", y="Log of Return on Equity (ROE)",
       title="Scatterplot of log ROE vs MW")
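Because the plots above use `log(roa)` and `log(roe)`, it is useful to know how many observations the log transform cannot represent. The sketch below counts them on a short hypothetical vector (the values are made up). It also shows the inverse hyperbolic sine, `asinh()`, as one possible alternative that is defined for zero and negative values; this alternative is not part of the original analysis.

```r
# Hypothetical ROA values, including a loss-making year and a zero
roa <- c(0.026, 0.005, -0.025, 0, 0.119)

# log() only gives finite values for strictly positive inputs
usable    <- roa > 0
n_dropped <- sum(!usable)
log_roa   <- log(roa[usable])

# Alternative (an assumption, not the author's method): asinh() is defined
# for all real numbers and behaves like log(2 * x) for large positive x
asinh_roa <- asinh(roa)

print(n_dropped)
print(log_roa)
print(asinh_roa)
```

If a large share of firm-years has non-positive returns, plots on the log scale describe only the profitable part of the sample, which should be stated when interpreting them.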
Exploratory Data Analysis
# Get summary statistics of all variables
summary(pdata)
## id firm_name industry year
## Min. : 1.0 AEON : 5 Length:1439 2018:313
## 1st Qu.: 85.5 AHEALTH: 5 Class :character 2019:317
## Median :169.0 AJI : 5 Mode :character 2020:318
## Mean :167.8 AMTEL : 5 2021:324
## 3rd Qu.:249.0 AMWAY : 5 2022:167
## Max. :332.0 ANCOMNY: 5
## (Other):1409
## roa roe fw mw
## Min. :-19.714370 Min. :-16.091659 Min. : 0.00 Min. : 0.0000
## 1st Qu.: -0.000777 1st Qu.: 0.002388 1st Qu.: 0.00 1st Qu.: 0.0095
## Median : 0.026450 Median : 0.051010 Median : 0.00 Median : 0.5400
## Mean : 0.002515 Mean : 0.069756 Mean :19.93 Mean :12.1513
## 3rd Qu.: 0.070152 3rd Qu.: 0.119607 3rd Qu.:43.45 3rd Qu.:19.9000
## Max. : 2.705501 Max. : 17.080187 Max. :82.53 Max. :76.0000
##
## in. firm_size gdp interest_rate
## Min. : 0.00 Min. :11.88 Min. :-5.500 Min. :1.750
## 1st Qu.:10.80 1st Qu.:20.09 1st Qu.: 3.100 1st Qu.:1.750
## Median :23.75 Median :20.89 Median : 4.400 Median :2.750
## Mean :34.85 Mean :21.05 Mean : 2.506 Mean :2.523
## 3rd Qu.:59.62 3rd Qu.:21.93 3rd Qu.: 4.800 3rd Qu.:3.250
## Max. :99.35 Max. :25.35 Max. : 8.700 Max. :3.250
##
# Create a correlation matrix
cor(pdata[, c("roa", "roe", "fw", "in.", "mw")])
## roa roe fw in. mw
## roa 1.000000000 -0.301728825 0.038764004 0.07046826 0.002695281
## roe -0.301728825 1.000000000 -0.005253933 0.02654570 -0.021620679
## fw 0.038764004 -0.005253933 1.000000000 -0.48476231 -0.381820924
## in. 0.070468256 0.026545699 -0.484762310 1.00000000 -0.358544544
## mw 0.002695281 -0.021620679 -0.381820924 -0.35854454 1.000000000
# Visualize correlations
corr <- cor(pdata[, c("roa", "roe", "fw", "in.", "mw")])
# import library for corrplot function
library(corrplot)
## corrplot 0.92 loaded
corrplot(corr, method="circle")
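# Plot the distributions of log(roa) and log(roe)
# (log() is undefined for non-positive values)
# histogram of log(roa); type = "count" makes the y-axis a frequency
histogram(log(pdata$roa), type = "count", xlab = "log(roa)", ylab = "Frequency", main = "Histogram of log(roa)")
# histogram of log(roe)
histogram(log(pdata$roe), type = "count", xlab = "log(roe)", ylab = "Frequency", main = "Histogram of log(roe)")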
# Pairwise scatterplots of ROA, ROE, FW, IN, and MW
# pairs(pdata[, c("roa", "roe", "fw", "in.", "mw")])
# Or use GGally package for a better visualization
ggpairs(pdata[, c("roa", "roe", "fw", "in.", "mw")])
Let's now start analysing the data. Reference: https://www.youtube.com/watch?v=T7Dmp8a7IU0&list=PL6Y8SvWdPo08HEFH0aysLYoYkcWxptKXs
# Install the plm package if it is not already installed (run once)
# install.packages("plm")
# Import the libraries
library(plm)
library(stargazer) # for making the model output tables look good
1) pooled OLS model ref: https://www.youtube.com/watch?v=e5RSQ1nkGq8
pooled_ols <- lm(roa ~ fw + in. + mw, data = pdata)
summary(pooled_ols)
##
## Call:
## lm(formula = roa ~ fw + in. + mw, data = pdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.3645 -0.0601 0.0118 0.0946 2.8113
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.3733900 0.0640876 -5.826 6.99e-09 ***
## fw 0.0057648 0.0010607 5.435 6.43e-08 ***
## in. 0.0053939 0.0009061 5.953 3.31e-09 ***
## mw 0.0060114 0.0013238 4.541 6.07e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6534 on 1435 degrees of freedom
## Multiple R-squared: 0.02591, Adjusted R-squared: 0.02388
## F-statistic: 12.73 on 3 and 1435 DF, p-value: 3.286e-08
#stargazer(pooled_ols,type = "text")
The p-value of the F-statistic is 3.286e-08, which is well below 0.05. This means that the model as a whole is statistically significant. The coefficients of all three independent variables (fw, in., and mw) are individually statistically significant, so each of them helps explain the variation in ROA. All three coefficients are positive, which means that higher family, institutional, and managerial ownership are each associated with a higher ROA.
The adjusted R-squared is 0.024 (multiple R-squared 0.026), which is very low. This means that the model explains only about 2.4% of the variation in ROA. So we need to consider either a fixed effects model or a random effects model.
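The pooled model above is fitted with `lm()`. The same model can also be fitted with `plm(..., model = "pooling")`, which gives identical coefficients and returns a plm object that panel-specific tests such as `plmtest()` accept. The sketch below checks the equivalence on the Grunfeld data shipped with plm; the dataset and the formula are stand-ins, not the variables of this project.

```r
library(plm)

# Stand-in panel shipped with plm (not the firm data used in this project)
data("Grunfeld", package = "plm")

# Pooled OLS fitted two ways
ols_lm  <- lm(inv ~ value + capital, data = Grunfeld)
ols_plm <- plm(inv ~ value + capital, data = Grunfeld,
               index = c("firm", "year"), model = "pooling")

# Largest absolute difference between the two coefficient vectors
coef_gap <- max(abs(unname(coef(ols_lm)) - unname(coef(ols_plm))))
print(coef_gap)
```

Under these assumptions the difference is zero up to floating-point error, so switching to the plm version does not change the estimates.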
2) Fixed Effect Models
2.1) LSDV ref: https://www.youtube.com/watch?v=2-r1lXztxRg
The Least Squares Dummy Variable (LSDV) model is one way of estimating the Fixed Effects Model (FEM), a regression model that allows us to control for unobserved heterogeneity. ref: https://www.youtube.com/watch?v=Z3z3lXZ3X1E In other words, LSDV is a fixed effects model that uses firm_name (the panel variable) as a dummy variable.
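The LSDV approach and the "within" estimator of plm give the same slope coefficients. The sketch below illustrates this on the Grunfeld data shipped with plm, which is used here as a stand-in; the formula is that dataset's, not this project's.

```r
library(plm)

# Stand-in panel shipped with plm (10 firms, 20 years)
data("Grunfeld", package = "plm")

# LSDV: ordinary least squares with one dummy per firm
lsdv <- lm(inv ~ value + capital + factor(firm), data = Grunfeld)

# Within estimator: firm effects are removed by demeaning, not estimated
within_fit <- plm(inv ~ value + capital, data = Grunfeld,
                  index = c("firm", "year"), model = "within")

# Compare the slopes of the two substantive regressors
slopes_lsdv   <- coef(lsdv)[c("value", "capital")]
slopes_within <- coef(within_fit)[c("value", "capital")]
slope_gap     <- max(abs(slopes_lsdv - slopes_within))

print(rbind(lsdv = slopes_lsdv, within = slopes_within))
```

With several hundred firms, the LSDV summary lists one dummy coefficient per firm, while the within estimator reports only the slopes of interest, which is usually easier to read.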
LSDVmodel <- lm(roa ~ fw + in. + mw + factor(firm_name), data = pdata)
summary(LSDVmodel)
##
## Call:
## lm(formula = roa ~ fw + in. + mw + factor(firm_name), data = pdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.0445 -0.0310 -0.0007 0.0300 10.0243
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.5556891 0.2725587 -2.039 0.041709 *
## fw 0.0096962 0.0025058 3.870 0.000115 ***
## in. 0.0077288 0.0024437 3.163 0.001605 **
## mw 0.0049952 0.0026382 1.893 0.058568 .
## factor(firm_name)ADVCON 0.0974543 0.3390194 0.287 0.773814
## factor(firm_name)ADVENTA 0.2096924 0.3418447 0.613 0.539729
## factor(firm_name)AEON -0.0650270 0.3263372 -0.199 0.842094
## factor(firm_name)AGES 0.3730198 0.3392749 1.099 0.271806
## factor(firm_name)AHEALTH 0.2034289 0.3256458 0.625 0.532301
## factor(firm_name)AIRPORT -0.0101867 0.3382454 -0.030 0.975980
## factor(firm_name)AJI 0.0721118 0.3219860 0.224 0.822830
## factor(firm_name)ALAM 0.0339671 0.3777887 0.090 0.928375
## factor(firm_name)AME -0.1246111 0.3663739 -0.340 0.733831
## factor(firm_name)AMEDIA -9.1337953 0.3495129 -26.133 < 2e-16 ***
## factor(firm_name)AMTEL 0.2368116 0.3196280 0.741 0.458913
## factor(firm_name)AMWAY -0.0220939 0.3303951 -0.067 0.946696
## factor(firm_name)ANCOMNY 0.3138578 0.3353189 0.936 0.349479
## factor(firm_name)ANNJOO -0.0546437 0.3473154 -0.157 0.875012
## factor(firm_name)ARMADA 0.0120843 0.3367789 0.036 0.971383
## factor(firm_name)ASTRO -0.0290686 0.3293420 -0.088 0.929684
## factor(firm_name)ATECH 0.1497352 0.5436588 0.275 0.783044
## factor(firm_name)ATLAN 0.2522221 0.3361025 0.750 0.453154
## factor(firm_name)AXIATA -0.1015940 0.3435797 -0.296 0.767520
## factor(firm_name)AYER 0.2884272 0.3520605 0.819 0.412818
## factor(firm_name)AZRB -0.0590317 0.3463012 -0.170 0.864677
## factor(firm_name)BARAKAH 0.1873164 0.3415248 0.548 0.583480
## factor(firm_name)BAT 0.1800594 0.3273055 0.550 0.582344
## factor(firm_name)BAUTO 0.2266108 0.3152032 0.719 0.472333
## factor(firm_name)BENALEC 0.1313494 0.3394310 0.387 0.698853
## factor(firm_name)BHIC -0.1938437 0.3413974 -0.568 0.570290
## factor(firm_name)BIPORT -0.1433298 0.3365914 -0.426 0.670317
## factor(firm_name)BJASSET 0.2125847 0.3196135 0.665 0.506106
## factor(firm_name)BJCORP -0.1978097 0.3260543 -0.607 0.544190
## factor(firm_name)BJFOOD 0.0194091 0.3219012 0.060 0.951931
## factor(firm_name)BJLAND 0.0412748 0.3197218 0.129 0.897305
## factor(firm_name)BKAWAN -0.1159175 0.3251299 -0.357 0.721514
## factor(firm_name)BPLANT -0.0248818 0.3395116 -0.073 0.941591
## factor(firm_name)BPURI 0.3938276 0.3488101 1.129 0.259118
## factor(firm_name)BSTEAD -0.0620392 0.3398748 -0.183 0.855196
## factor(firm_name)CANONE 0.0872433 0.3349027 0.261 0.794524
## factor(firm_name)CAPITALA 0.1114268 0.3368328 0.331 0.740853
## factor(firm_name)CAREPLS 0.3916445 0.3444617 1.137 0.255795
## factor(firm_name)CARIMIN 0.3585589 0.3316310 1.081 0.279845
## factor(firm_name)CARLSBG 0.3665303 0.3199265 1.146 0.252180
## factor(firm_name)CBIP 0.1358828 0.3409490 0.399 0.690307
## factor(firm_name)CDB 0.0680388 0.3448878 0.197 0.843646
## factor(firm_name)CEB -0.1546464 0.5289706 -0.292 0.770071
## factor(firm_name)CEPAT 0.1505038 0.3232163 0.466 0.641562
## factor(firm_name)CHGP 0.0487695 0.3432054 0.142 0.887027
## factor(firm_name)CHINHIN -0.0890327 0.3470896 -0.257 0.797603
## factor(firm_name)CHINTEK -0.0128238 0.3336370 -0.038 0.969347
## factor(firm_name)CITAGLB 0.1346080 0.3401952 0.396 0.692419
## factor(firm_name)CJCEN 0.4536883 0.3322266 1.366 0.172343
## factor(firm_name)CMSB -0.0156377 0.3394716 -0.046 0.963267
## factor(firm_name)COASTAL -0.1614263 0.3255548 -0.496 0.620099
## factor(firm_name)CRESBLD 0.0565987 0.3230534 0.175 0.860955
## factor(firm_name)CRESNDO -0.2065482 0.3384169 -0.610 0.541765
## factor(firm_name)CYPARK 0.1992193 0.3329599 0.598 0.549744
## factor(firm_name)DAYANG -0.0349462 0.3214563 -0.109 0.913451
## factor(firm_name)DBHD 0.0067921 0.3625490 0.019 0.985057
## factor(firm_name)DELEUM 0.2248649 0.3500904 0.642 0.520808
## factor(firm_name)DIALOG 0.2338109 0.3155760 0.741 0.458910
## factor(firm_name)DKLS -0.0315797 0.3442385 -0.092 0.926923
## factor(firm_name)DKSH -0.0904310 0.3448630 -0.262 0.793198
## factor(firm_name)DLADY 0.1969672 0.3417065 0.576 0.564447
## factor(firm_name)DPHARMA 0.0564334 0.3381241 0.167 0.867478
## factor(firm_name)DRBHCOM -0.1108574 0.3452322 -0.321 0.748188
## factor(firm_name)DUFU 0.4668180 0.3221949 1.449 0.147658
## factor(firm_name)DUTALND -0.0521299 0.3271865 -0.159 0.873440
## factor(firm_name)E&O -0.0580509 0.3182975 -0.182 0.855318
## factor(firm_name)EATECH -0.0590785 0.3320719 -0.178 0.858827
## factor(firm_name)ECOFIRS 0.2533255 0.3204063 0.791 0.429325
## factor(firm_name)ECOHLDS 0.3745834 0.3270670 1.145 0.252341
## factor(firm_name)ECONBHD 0.0627787 0.3179269 0.197 0.843502
## factor(firm_name)ECOWLD 0.1317373 0.3405393 0.387 0.698943
## factor(firm_name)EDGENTA -0.1112443 0.3475260 -0.320 0.748950
## factor(firm_name)EG 0.3901679 0.3277700 1.190 0.234157
## factor(firm_name)EKOVEST 0.1562373 0.3190902 0.490 0.624490
## factor(firm_name)EWINT -0.0370924 0.3177558 -0.117 0.907093
## factor(firm_name)F&N -0.0131600 0.3289833 -0.040 0.968099
## factor(firm_name)FAJAR 0.2782276 0.3203588 0.868 0.385316
## factor(firm_name)FAREAST 0.1013356 0.3413322 0.297 0.766612
## factor(firm_name)FFB -0.0619727 0.5261343 -0.118 0.906256
## factor(firm_name)FGV -0.1087410 0.3443729 -0.316 0.752240
## factor(firm_name)FM 0.0415222 0.3284861 0.126 0.899435
## factor(firm_name)FPI 0.3006060 0.3342673 0.899 0.368690
## factor(firm_name)GADANG 0.0889260 0.3213027 0.277 0.782011
## factor(firm_name)GAMUDA 0.0936187 0.3161893 0.296 0.767221
## factor(firm_name)GBGAQRS 0.1967219 0.3347421 0.588 0.556866
## factor(firm_name)GCAP 0.4029053 0.3465074 1.163 0.245178
## factor(firm_name)GCB 0.0344770 0.3316788 0.104 0.917230
## factor(firm_name)GDB -0.0236232 0.3448004 -0.069 0.945390
## factor(firm_name)GDEX 0.0418194 0.3363945 0.124 0.901087
## factor(firm_name)GENM -0.0502282 0.3240529 -0.155 0.876850
## factor(firm_name)GENP -0.1983785 0.3238733 -0.613 0.540321
## factor(firm_name)GENTING -0.0271018 0.3205934 -0.085 0.932645
## factor(firm_name)GKENT 0.0802634 0.3214871 0.250 0.802895
## factor(firm_name)GLBHD -0.0735880 0.3225704 -0.228 0.819587
## factor(firm_name)GLOMAC -0.0549544 0.3206424 -0.171 0.863950
## factor(firm_name)GOPENG 0.1302003 0.3350318 0.389 0.697632
## factor(firm_name)GPACKET 0.0581385 0.3379393 0.172 0.863439
## factor(firm_name)GUOCO -0.0320959 0.3242686 -0.099 0.921173
## factor(firm_name)HANDAL 0.1730918 0.3742478 0.463 0.643810
## factor(firm_name)HAPSENG -0.0857762 0.3478481 -0.247 0.805271
## factor(firm_name)HARBOUR 0.2318007 0.3491930 0.664 0.506945
## factor(firm_name)HARISON -0.0043000 0.3400597 -0.013 0.989913
## factor(firm_name)HARNLEN -0.1730737 0.3489305 -0.496 0.619985
## factor(firm_name)HARTA 0.2199680 0.3223716 0.682 0.495165
## factor(firm_name)HCK -0.2399806 0.3439919 -0.698 0.485553
## factor(firm_name)HEIM 0.2555043 0.3214149 0.795 0.426822
## factor(firm_name)HENGYUAN 0.1090137 0.3354349 0.325 0.745249
## factor(firm_name)HEXCARE 0.3859068 0.3442636 1.121 0.262547
## factor(firm_name)HEXTAR -0.1153859 0.3313230 -0.348 0.727713
## factor(firm_name)HEXTECH 0.2638125 0.3227560 0.817 0.413891
## factor(firm_name)HIAPTEK -0.0029397 0.3155248 -0.009 0.992568
## factor(firm_name)HIBISCS 0.3124999 0.3355815 0.931 0.351944
## factor(firm_name)HLIND 0.2383868 0.3530883 0.675 0.499723
## factor(firm_name)HOHUP 0.0144195 0.3360305 0.043 0.965780
## factor(firm_name)HSPLANT -0.0812070 0.3456141 -0.235 0.814280
## factor(firm_name)HUBLINE 0.0310335 0.3180941 0.098 0.922299
## factor(firm_name)HUMEIND -0.1151018 0.3264983 -0.353 0.724505
## factor(firm_name)IBHD 0.2354254 0.3662147 0.643 0.520447
## factor(firm_name)IBRACO 0.0869361 0.3630886 0.239 0.810813
## factor(firm_name)ICON 0.0818924 0.3533257 0.232 0.816755
## factor(firm_name)IDEAL -0.0775358 0.3476820 -0.223 0.823571
## factor(firm_name)IGBB -0.1573168 0.3358592 -0.468 0.639590
## factor(firm_name)IHB 0.2567727 0.3484711 0.737 0.461367
## factor(firm_name)IHH -0.1515971 0.3455416 -0.439 0.660948
## factor(firm_name)IJM 0.0567819 0.3199838 0.177 0.859186
## factor(firm_name)ILB 0.1816469 0.3179277 0.571 0.567881
## factor(firm_name)INCKEN 0.1637067 0.3350311 0.489 0.625200
## factor(firm_name)INNO -0.0098224 0.3430533 -0.029 0.977163
## factor(firm_name)INTA 0.2416363 0.3696742 0.654 0.513476
## factor(firm_name)IOICORP -0.1058284 0.3219263 -0.329 0.742418
## factor(firm_name)IOIPG -0.2299686 0.3305063 -0.696 0.486696
## factor(firm_name)IREKA -0.1932967 0.3162792 -0.611 0.541220
## factor(firm_name)IWCITY 0.1293754 0.3394472 0.381 0.703176
## factor(firm_name)JAKS 0.3931377 0.3465251 1.135 0.256825
## factor(firm_name)JKGLAND 0.2431388 0.3284468 0.740 0.459295
## factor(firm_name)JTIASA 0.2712633 0.3211391 0.845 0.398466
## factor(firm_name)KAB 0.3300869 0.3733901 0.884 0.376874
## factor(firm_name)KAREX -0.0925578 0.3190260 -0.290 0.771774
## factor(firm_name)KAWAN -0.0001585 0.3355017 0.000 0.999623
## factor(firm_name)KERJAYA -0.1755599 0.3508978 -0.500 0.616952
## factor(firm_name)KFIMA -0.1794054 0.3283435 -0.546 0.584905
## factor(firm_name)KGB 0.2018071 0.3326635 0.607 0.544214
## factor(firm_name)KIMLUN -0.0197676 0.3431659 -0.058 0.954075
## factor(firm_name)KLK -0.1234680 0.3207509 -0.385 0.700360
## factor(firm_name)KLUANG 0.1476532 0.3259623 0.453 0.650655
## factor(firm_name)KMLOONG -0.0639642 0.3321210 -0.193 0.847313
## factor(firm_name)KNM 0.2704448 0.3445541 0.785 0.432673
## factor(firm_name)KOBAY 0.0230523 0.3162765 0.073 0.941910
## factor(firm_name)KOSSAN 0.1550211 0.3381449 0.458 0.646722
## factor(firm_name)KOTRA -0.0030793 0.3334060 -0.009 0.992633
## factor(firm_name)KPJ -0.0515614 0.3428553 -0.150 0.880486
## factor(firm_name)KPPROP 0.0043881 0.3261401 0.013 0.989268
## factor(firm_name)KRETAM -0.1750397 0.3479836 -0.503 0.615057
## factor(firm_name)KSENG -0.2598599 0.3409491 -0.762 0.446123
## factor(firm_name)KSL -0.0868783 0.3450652 -0.252 0.801263
## factor(firm_name)L&G 0.3591196 0.3315530 1.083 0.278981
## factor(firm_name)LAGENDA 0.0046134 0.3447252 0.013 0.989325
## factor(firm_name)LBS -0.0498340 0.3355218 -0.149 0.881954
## factor(firm_name)LCTITAN -0.0879945 0.3295731 -0.267 0.789522
## factor(firm_name)LEBTECH -0.0566852 0.3426256 -0.165 0.868625
## factor(firm_name)LHI -0.1691722 0.3697700 -0.458 0.647397
## factor(firm_name)LITRAK -0.0065080 0.3284936 -0.020 0.984197
## factor(firm_name)LUXCHEM 0.1674196 0.3270735 0.512 0.608843
## factor(firm_name)M&G 0.2690328 0.3488490 0.771 0.440753
## factor(firm_name)MAGNI 0.0948669 0.3210977 0.295 0.767709
## factor(firm_name)MAGNUM -0.0110401 0.3339356 -0.033 0.973632
## factor(firm_name)MAHSING -0.0687604 0.3336125 -0.206 0.836744
## factor(firm_name)MALPAC 0.1653923 0.3402329 0.486 0.626982
## factor(firm_name)MATRIX 0.2359685 0.3218170 0.733 0.463569
## factor(firm_name)MAXIM 0.2472272 0.3253230 0.760 0.447450
## factor(firm_name)MAXIS -0.1385781 0.3371959 -0.411 0.681173
## factor(firm_name)MAYBULK 0.0635018 0.3360333 0.189 0.850147
## factor(firm_name)MBMR 0.0350320 0.3407254 0.103 0.918128
## factor(firm_name)MCEMENT -0.1821856 0.3479224 -0.524 0.600635
## factor(firm_name)MCT 0.0304953 0.3384778 0.090 0.928228
## factor(firm_name)MEDIA -0.0466894 0.3395361 -0.138 0.890653
## factor(firm_name)MEDIAC -0.3082837 0.3387951 -0.910 0.363052
## factor(firm_name)MELATI -0.0982738 0.3274624 -0.300 0.764152
## factor(firm_name)MENANG 0.3564189 0.3256473 1.094 0.273977
## factor(firm_name)MERCURY 0.2297607 0.3426761 0.670 0.502686
## factor(firm_name)MFLOUR 0.2657324 0.3195370 0.832 0.405805
## factor(firm_name)MGB -0.2048981 0.3457371 -0.593 0.553543
## factor(firm_name)MHB -0.1630702 0.3289029 -0.496 0.620134
## factor(firm_name)MHC 0.0719758 0.3407515 0.211 0.832750
## factor(firm_name)MIECO 0.2406350 0.3578561 0.672 0.501447
## factor(firm_name)MISC -0.0858690 0.3292622 -0.261 0.794301
## factor(firm_name)MITRA 0.2527925 0.3478883 0.727 0.467595
## factor(firm_name)MKH 0.0113790 0.3219959 0.035 0.971816
## factor(firm_name)MPHBCAP -0.0303755 0.3169744 -0.096 0.923673
## factor(firm_name)MRCB -0.0099834 0.3367052 -0.030 0.976351
## factor(firm_name)MRDIY 0.1148175 0.4080639 0.281 0.778478
## factor(firm_name)MSC 0.1454994 0.3353471 0.434 0.664463
## factor(firm_name)MUDAJYA -0.0867985 0.3598241 -0.241 0.809426
## factor(firm_name)MUHIBAH 0.1310335 0.3328217 0.394 0.693875
## factor(firm_name)MULPHA 0.1173143 0.3429513 0.342 0.732361
## factor(firm_name)NAIM 0.2146947 0.3602563 0.596 0.551331
## factor(firm_name)NCT 0.0027347 0.3385339 0.008 0.993556
## factor(firm_name)NGGB 0.2011731 0.3370677 0.597 0.550741
## factor(firm_name)NOVA -0.0346349 0.3387698 -0.102 0.918587
## factor(firm_name)NPC -0.1432067 0.3437257 -0.417 0.677029
## factor(firm_name)NSOP -0.1201050 0.3416693 -0.352 0.725262
## factor(firm_name)OCK 0.1528398 0.3364334 0.454 0.649706
## factor(firm_name)OCR 0.1668415 0.3346947 0.498 0.618239
## factor(firm_name)OIB -0.0974236 0.3178027 -0.307 0.759241
## factor(firm_name)OMH 0.6667623 0.4140147 1.610 0.107579
## factor(firm_name)OPTIMAX 0.0840247 0.4094874 0.205 0.837458
## factor(firm_name)ORIENT -0.1687461 0.3430145 -0.492 0.622852
## factor(firm_name)OSK -0.0331451 0.3288673 -0.101 0.919739
## factor(firm_name)PADINI -0.0208166 0.3201965 -0.065 0.948176
## factor(firm_name)PANAMY 0.0759172 0.3230879 0.235 0.814273
## factor(firm_name)PANTECH 0.3083164 0.3266760 0.944 0.345479
## factor(firm_name)PARAMON 0.2333654 0.3330740 0.701 0.483674
## factor(firm_name)PCHEM -0.0991671 0.3379113 -0.293 0.769217
## factor(firm_name)PDZ 0.2278055 0.3439047 0.662 0.507848
## factor(firm_name)PEB 0.7179884 0.3425023 2.096 0.036283 *
## factor(firm_name)PECCA -0.0046026 0.3271042 -0.014 0.988776
## factor(firm_name)PENERGY 0.1860694 0.3374057 0.551 0.581423
## factor(firm_name)PERDANA -0.0847546 0.3218198 -0.263 0.792322
## factor(firm_name)PERSTIM 0.1555901 0.3173152 0.490 0.623996
## factor(firm_name)PERTAMA 0.0020901 0.3525106 0.006 0.995270
## factor(firm_name)PESONA -0.0850687 0.3438697 -0.247 0.804655
## factor(firm_name)PETDAG -0.1097822 0.3511086 -0.313 0.754588
## factor(firm_name)PETRONM 0.0014947 0.3403317 0.004 0.996497
## factor(firm_name)PGB 0.8588050 0.3462274 2.480 0.013269 *
## factor(firm_name)PHARMA -0.0091402 0.3388151 -0.027 0.978483
## factor(firm_name)PIE 0.1212183 0.3196180 0.379 0.704568
## factor(firm_name)PINEPAC 0.2538959 0.3267765 0.777 0.437342
## factor(firm_name)PLENITU -0.0740445 0.3268009 -0.227 0.820797
## factor(firm_name)PLS -0.1411139 0.3328537 -0.424 0.671684
## factor(firm_name)PMBTECH 0.0502761 0.3471432 0.145 0.884873
## factor(firm_name)PMETAL -0.0009062 0.3461034 -0.003 0.997911
## factor(firm_name)POS -0.0620579 0.3380893 -0.184 0.854396
## factor(firm_name)PPB -0.0147357 0.3256269 -0.045 0.963914
## factor(firm_name)PRKCORP -0.0789638 0.3363004 -0.235 0.814406
## factor(firm_name)PRTASCO 0.3443953 0.3471398 0.992 0.321369
## factor(firm_name)PTARAS -0.0786420 0.3264400 -0.241 0.809671
## factor(firm_name)PTRANS 0.1406397 0.3363024 0.418 0.675886
## factor(firm_name)PUNCAK 0.1052913 0.3414885 0.308 0.757889
## factor(firm_name)PWROOT 0.1775053 0.3384932 0.524 0.600107
## factor(firm_name)QL -0.0723238 0.3249346 -0.223 0.823904
## factor(firm_name)RAPID 0.1233792 0.3355398 0.368 0.713165
## factor(firm_name)REACH 0.2364759 0.3398224 0.696 0.486650
## factor(firm_name)RL 0.3777041 0.4249458 0.889 0.374289
## factor(firm_name)RSAWIT -0.0932133 0.3331059 -0.280 0.779660
## factor(firm_name)RVIEW -0.0265187 0.3471467 -0.076 0.939122
## factor(firm_name)SAB 0.1758880 0.3283143 0.536 0.592253
## factor(firm_name)SAM -0.0231697 0.3272163 -0.071 0.943563
## factor(firm_name)SAPNRG 0.0417466 0.3317238 0.126 0.899876
## factor(firm_name)SASBADI 0.1692483 0.3250532 0.521 0.602695
## factor(firm_name)SBAGAN 0.2221462 0.3393477 0.655 0.512844
## factor(firm_name)SCIENTX 0.0279029 0.3275810 0.085 0.932135
## factor(firm_name)SCIPACK -0.0678264 0.3474608 -0.195 0.845268
## factor(firm_name)SCOMIES -0.1450265 0.3179556 -0.456 0.648392
## factor(firm_name)SEALINK -0.7909476 0.3448704 -2.293 0.022009 *
## factor(firm_name)SEEHUP -0.0573989 0.3194707 -0.180 0.857446
## factor(firm_name)SEG 0.2075645 0.3599177 0.577 0.564260
## factor(firm_name)SEM 0.0424803 0.3323790 0.128 0.898325
## factor(firm_name)SENDAI -0.1909573 0.3767477 -0.507 0.612356
## factor(firm_name)SENHENG -0.1188191 0.5376512 -0.221 0.825136
## factor(firm_name)SERBADK 0.1833858 0.3407287 0.538 0.590536
## factor(firm_name)SHANG -0.1651008 0.3478465 -0.475 0.635140
## factor(firm_name)SHCHAN 0.0608035 0.3307218 0.184 0.854164
## factor(firm_name)SHL -0.0077823 0.3188956 -0.024 0.980535
## factor(firm_name)SIGN 0.1804571 0.3452426 0.523 0.601290
## factor(firm_name)SIME -0.1549952 0.3366667 -0.460 0.645333
## factor(firm_name)SIMEPLT -0.1936685 0.3528342 -0.549 0.583189
## factor(firm_name)SIMEPROP -0.1875365 0.3357237 -0.559 0.576546
## factor(firm_name)SJC 0.0279105 0.3384896 0.082 0.934299
## factor(firm_name)SKPRES 0.0555577 0.3181523 0.175 0.861405
## factor(firm_name)SLVEST 0.2260707 0.3762444 0.601 0.548056
## factor(firm_name)SOP -0.2286363 0.3399577 -0.673 0.501379
## factor(firm_name)SPSETIA -0.1008547 0.3285731 -0.307 0.758941
## factor(firm_name)SPTOTO 0.2344344 0.3183068 0.737 0.461580
## factor(firm_name)STAR 0.0248109 0.3358334 0.074 0.941120
## factor(firm_name)STELLA 0.1659410 0.3295691 0.504 0.614707
## factor(firm_name)SUNSURIA 0.2345314 0.3430922 0.684 0.494383
## factor(firm_name)SUNWAY -0.1699647 0.3456384 -0.492 0.623000
## factor(firm_name)SUPERMX 0.4792886 0.3219165 1.489 0.136810
## factor(firm_name)SURIA 0.0919282 0.3360488 0.274 0.784477
## factor(firm_name)SWIFT 0.1019743 0.5257949 0.194 0.846256
## factor(firm_name)SWKPLNT 0.0905694 0.3165914 0.286 0.774873
## factor(firm_name)SYCAL 0.3851349 0.3436469 1.121 0.262647
## factor(firm_name)SYMLIFE 0.2363062 0.3194264 0.740 0.459589
## factor(firm_name)SYSCORP 0.1431749 0.3188234 0.449 0.653467
## factor(firm_name)T7GLOBAL 0.1511909 0.3331028 0.454 0.650000
## factor(firm_name)TAANN 0.2796451 0.3433016 0.815 0.415491
## factor(firm_name)TAMBUN 0.0955435 0.3413352 0.280 0.779598
## factor(firm_name)TANCO -0.0100290 0.3295208 -0.030 0.975725
## factor(firm_name)TAS 0.0220619 0.3297694 0.067 0.946673
## factor(firm_name)TASCO 0.0014119 0.3186936 0.004 0.996466
## factor(firm_name)TCHONG -0.0889512 0.3383712 -0.263 0.792692
## factor(firm_name)TDM 0.0055248 0.3362322 0.016 0.986893
## factor(firm_name)TECHNAX -0.0269684 0.3426715 -0.079 0.937285
## factor(firm_name)TGUAN 0.1720703 0.3338955 0.515 0.606417
## factor(firm_name)THPLANT -0.1389383 0.3417949 -0.406 0.684457
## factor(firm_name)TIMECOM 0.1440188 0.3328695 0.433 0.665347
## factor(firm_name)TITIJYA -0.1646808 0.3270712 -0.504 0.614712
## factor(firm_name)TJSETIA 0.0345012 0.5320126 0.065 0.948305
## factor(firm_name)TM -0.1726447 0.3519950 -0.490 0.623895
## factor(firm_name)TMCLIFE 0.4844114 0.3331093 1.454 0.146172
## factor(firm_name)TNLOGIS -0.2653452 0.3249785 -0.817 0.414390
## factor(firm_name)TOCEAN 0.1599717 0.3531054 0.453 0.650607
## factor(firm_name)TONGHER -0.1134523 0.3385692 -0.335 0.737618
## factor(firm_name)TOPBLDS -0.3700394 0.3749039 -0.987 0.323847
## factor(firm_name)TOPGLOV 0.2242428 0.3172340 0.707 0.479797
## factor(firm_name)TRC 0.2861956 0.3457090 0.828 0.407934
## factor(firm_name)TROP -0.1269268 0.3479890 -0.365 0.715373
## factor(firm_name)TSH 0.0781168 0.3158243 0.247 0.804689
## factor(firm_name)TSRCAP -0.0071103 0.3164992 -0.022 0.982081
## factor(firm_name)UCHITEC 0.5062668 0.3155580 1.604 0.108922
## factor(firm_name)UEMS -0.1170045 0.3703060 -0.316 0.752087
## factor(firm_name)UMCCA 0.0411193 0.3151279 0.130 0.896207
## factor(firm_name)UMW -0.1517472 0.3364156 -0.451 0.652027
## factor(firm_name)UOADEV -0.2444829 0.3344984 -0.731 0.464999
## factor(firm_name)UTDPLT -0.0837716 0.3385996 -0.247 0.804640
## factor(firm_name)UZMA 0.2408154 0.3237849 0.744 0.457185
## factor(firm_name)VELESTO -0.0224028 0.3375316 -0.066 0.947093
## factor(firm_name)VIZIONE 0.2990642 0.3375842 0.886 0.375867
## factor(firm_name)VS 0.1314993 0.3165343 0.415 0.677905
## factor(firm_name)WASEONG -0.1371760 0.3357169 -0.409 0.682908
## factor(firm_name)WCEHB 0.2675849 0.3208150 0.834 0.404417
## factor(firm_name)WCT 0.2150264 0.3350037 0.642 0.521095
## factor(firm_name)WELLCAL 0.3753200 0.3150602 1.191 0.233806
## factor(firm_name)WPRTS -0.1438211 0.3235139 -0.445 0.656726
## factor(firm_name)XINHWA -0.1285354 0.3313201 -0.388 0.698128
## factor(firm_name)YINSON -0.2242994 0.3231460 -0.694 0.487758
## factor(firm_name)YNHPROP -0.2234782 0.3443674 -0.649 0.516504
## factor(firm_name)YSPSAH 0.0533922 0.3432001 0.156 0.876399
## factor(firm_name)ZECON 0.0298014 0.3407513 0.087 0.930323
## factor(firm_name)ZELAN 0.2353086 0.3360853 0.700 0.483984
## factor(firm_name)ZHULIAN 0.4266850 0.3239843 1.317 0.188114
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4672 on 1104 degrees of freedom
## Multiple R-squared: 0.6168, Adjusted R-squared: 0.5009
## F-statistic: 5.32 on 334 and 1104 DF, p-value: < 2.2e-16
#stargazer(LSDVmodel,type = "text")
The p-value of the F-statistic is below 2.2e-16, which means that the regression as a whole is statistically significant. The multiple R-squared is 0.6168 and the adjusted R-squared is 0.5009, which is quite healthy: after adjusting for the 334 regressors, the model explains about 50% of the variation in ROA. Only a few of the dummy variable coefficients are individually significant, so whether the firm dummies matter as a group is better judged with a joint test than by reading the individual t-tests.
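As a quick arithmetic check, the adjusted R-squared can be recomputed from the multiple R-squared and the degrees of freedom printed in the output. This is only a sketch that reuses the printed numbers (N = 1439 is taken from the plm outputs in this document); the variable names are illustrative.

```r
# Numbers copied from the LSDV summary output
r2       <- 0.6168   # Multiple R-squared
n_obs    <- 1439     # total number of observations (N in the plm outputs)
df_resid <- 1104     # residual degrees of freedom

# Adjusted R-squared penalises R-squared for the number of estimated parameters
adj_r2 <- 1 - (1 - r2) * (n_obs - 1) / df_resid
round(adj_r2, 4)
```

The result agrees with the 0.5009 reported by `summary()`, which shows how much of the raw R-squared is absorbed by the large number of firm dummies.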
NOW, which one is better? Pooled OLS or LSDV?
- restricted F test for fixed effects or partial F test between pooled OLS and fixed effects model # https://www.youtube.com/watch?v=VcxE-FlKQ3s
anova(pooled_ols, LSDVmodel, test = "F")
## Analysis of Variance Table
##
## Model 1: roa ~ fw + in. + mw
## Model 2: roa ~ fw + in. + mw + factor(firm_name)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1435 612.56
## 2 1104 240.98 331 371.58 5.1431 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value of the F test is below 2.2e-16, so we reject the null hypothesis that all firm dummy coefficients are jointly zero. This means that the LSDV fixed effects model is better than the pooled OLS model.
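The F statistic in the anova table can be reproduced by hand from the two residual sums of squares. The sketch below uses only the numbers printed above; the variable names are illustrative.

```r
# Values copied from the anova() table
rss_pooled <- 612.56   # RSS of the restricted model (pooled OLS)
rss_lsdv   <- 240.98   # RSS of the unrestricted model (LSDV)
q          <- 331      # number of restrictions (firm dummies dropped)
df_lsdv    <- 1104     # residual degrees of freedom of the LSDV model

# Partial F statistic: drop in RSS per restriction, relative to the
# residual variance of the unrestricted model
f_stat  <- ((rss_pooled - rss_lsdv) / q) / (rss_lsdv / df_lsdv)
p_value <- pf(f_stat, q, df_lsdv, lower.tail = FALSE)
round(f_stat, 3)
```

Up to rounding of the printed sums of squares, this matches the F value of 5.1431 in the table.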
- Wald test for fixed effects or partial F test between pooled OLS and fixed effects model https://www.youtube.com/watch?v=VcxE-FlKQ3s
# import library for waldtest
library(lmtest)
waldtest(pooled_ols, LSDVmodel)
## Wald test
##
## Model 1: roa ~ fw + in. + mw
## Model 2: roa ~ fw + in. + mw + factor(firm_name)
## Res.Df Df F Pr(>F)
## 1 1435
## 2 1104 331 5.1431 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The Wald test reports the same F statistic (5.1431) and p-value as the partial F test above, so the conclusion is unchanged: the firm dummies are jointly significant, and the LSDV fixed effects model is better than the pooled OLS model.
2.2) Within Group Fixed Effects Model ref: https://www.youtube.com/watch?v=eyBV33ll92A&list=PL6Y8SvWdPo08HEFH0aysLYoYkcWxptKXs&index=6
within_group_fe <- plm(roa ~ fw + in. + mw, data = pdata, model = "within")
summary(within_group_fe)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = roa ~ fw + in. + mw, data = pdata, model = "within")
##
## Unbalanced Panel: n = 332, T = 1-5, N = 1439
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.0044e+01 -3.0985e-02 -6.6246e-04 2.9988e-02 1.0024e+01
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## fw 0.0096962 0.0025058 3.8696 0.0001154 ***
## in. 0.0077288 0.0024437 3.1628 0.0016055 **
## mw 0.0049952 0.0026382 1.8934 0.0585684 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 244.77
## Residual Sum of Squares: 240.98
## R-Squared: 0.015498
## Adj. R-Squared: -0.28235
## F-statistic: 5.79306 on 3 and 1104 DF, p-value: 0.00062733
#stargazer(within_group_fe,type = "text")
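The p-value of the F-statistic is 0.0006, which means that the regression as a whole is statistically significant. The residual sum of squares (240.98) and the residual degrees of freedom (1104) are the same as in the LSDV model, because the within transformation and the firm dummies are two ways of fitting the same fixed effects model. The R-squared is low (0.0155) because here it measures only how much of the within-firm variation in ROA is explained by fw, in. and mw.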
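The equivalence between the LSDV and within estimators can be illustrated with base R on simulated data. This is a minimal sketch, not the analysis above: the data, the true slope of 2, and all variable names are made up for illustration.

```r
set.seed(1)
n_firms <- 20
n_years <- 5
firm  <- rep(seq_len(n_firms), each = n_years)
alpha <- rnorm(n_firms)[firm]                    # unobserved firm effect
x     <- rnorm(n_firms * n_years) + 0.5 * alpha  # regressor correlated with it
y     <- 1 + 2 * x + alpha + rnorm(n_firms * n_years, sd = 0.3)

# LSDV: one dummy per firm
lsdv <- lm(y ~ x + factor(firm))

# Within: subtract each firm's own mean, then regress without an intercept
y_dm   <- y - ave(y, firm)
x_dm   <- x - ave(x, firm)
within <- lm(y_dm ~ x_dm - 1)

c(lsdv = unname(coef(lsdv)["x"]), within = unname(coef(within)["x_dm"]))
```

Both slopes coincide up to floating-point error. Only the reported R-squared differs, since the LSDV version also counts the variation explained by the dummies.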
2.3) first difference model
# first difference model
first_dif <- plm(roa ~ fw + in. + mw, data = pdata, model = "fd")
summary(first_dif)
## Oneway (individual) effect First-Difference Model
##
## Call:
## plm(formula = roa ~ fw + in. + mw, data = pdata, model = "fd")
##
## Unbalanced Panel: n = 332, T = 1-5, N = 1439
## Observations used in estimation: 1107
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -4.858507 -0.044631 -0.020302 0.011278 12.593283
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 0.02027311 0.01776377 1.1413 0.2540
## fw -0.00167071 0.00297411 -0.5617 0.5744
## in. 0.00198926 0.00300025 0.6630 0.5074
## mw -0.00029842 0.00307485 -0.0971 0.9227
##
## Total Sum of Squares: 374.37
## Residual Sum of Squares: 373.88
## R-Squared: 0.0012896
## Adj. R-Squared: -0.0014267
## F-statistic: 0.474757 on 3 and 1103 DF, p-value: 0.69992
#stargazer(first_dif,type = "text")
The p-value of the F-statistic is 0.70, which means that the regression as a whole is not statistically significant. R-squared is 0.0013, which is very low: the model explains only about 0.1% of the variation in the year-to-year change in ROA.
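To see what a first-difference estimator does mechanically, the differencing can be done by hand in base R. The sketch below uses simulated data with a made-up true slope of 2; `first_diff` is a hypothetical helper defined here, not a plm function.

```r
set.seed(123)
n_firms <- 50
n_years <- 5
firm        <- rep(seq_len(n_firms), each = n_years)
firm_effect <- rnorm(n_firms)[firm]
x <- rnorm(n_firms * n_years) + firm_effect
y <- 2 * x + firm_effect + rnorm(n_firms * n_years, sd = 0.5)

# Difference within each firm; the first year of a firm has no predecessor
first_diff <- function(v, g) ave(v, g, FUN = function(z) c(NA, diff(z)))
dy <- first_diff(y, firm)
dx <- first_diff(x, firm)

fd_fit <- lm(dy ~ dx)   # rows with NA are dropped automatically
nobs(fd_fit)            # one observation lost per firm
coef(fd_fit)["dx"]
```

Each firm loses its first observation, which is why the plm output above reports 1107 observations used in estimation: 1439 observations minus 332 firms.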
2.4) Between model (however, we are not using this in this analysis)
betweenModel <- plm(roa ~ fw + in. + mw, data = pdata, model = "between")
summary(betweenModel)
## Oneway (individual) effect Between Model
##
## Call:
## plm(formula = roa ~ fw + in. + mw, data = pdata, model = "between")
##
## Unbalanced Panel: n = 332, T = 1-5, N = 1439
## Observations used in estimation: 332
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -9.249834 -0.056378 0.014939 0.100713 0.727809
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -0.4068493 0.1159199 -3.5097 0.0005112 ***
## fw 0.0060503 0.0018965 3.1902 0.0015589 **
## in. 0.0058468 0.0016280 3.5914 0.0003793 ***
## mw 0.0066495 0.0023906 2.7815 0.0057234 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 96.25
## Residual Sum of Squares: 92.433
## R-Squared: 0.039662
## Adj. R-Squared: 0.030878
## F-statistic: 4.51542 on 3 and 328 DF, p-value: 0.0040418
#stargazer(betweenModel,type = "text")
The p-value of the F-statistic is 0.004, which means that the regression as a whole is statistically significant. R-squared is 0.0397, which is low: the model explains only about 4% of the variation in firm-average ROA.
3) Fit the random effects model
random_effects <- plm(roa ~ fw + in. + mw, data = pdata, model = "random")
summary(random_effects)
## Oneway (individual) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = roa ~ fw + in. + mw, data = pdata, model = "random")
##
## Unbalanced Panel: n = 332, T = 1-5, N = 1439
##
## Effects:
## var std.dev share
## idiosyncratic 0.2183 0.4672 0.509
## individual 0.2104 0.4587 0.491
## theta:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2864 0.5462 0.5855 0.5634 0.5855 0.5855
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -14.2678 -0.0397 0.0059 0.0007 0.0568 5.8540
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) -0.4243450 0.0865688 -4.9018 9.495e-07 ***
## fw 0.0068812 0.0014546 4.7306 2.239e-06 ***
## in. 0.0060793 0.0012516 4.8570 1.192e-06 ***
## mw 0.0061453 0.0017233 3.5661 0.0003624 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 323.96
## Residual Sum of Squares: 317.83
## R-Squared: 0.018925
## Adj. R-Squared: 0.016874
## Chisq: 27.6777 on 3 DF, p-value: 4.2442e-06
#stargazer(random_effects,type = "text")
The overall test reported here is a chi-square statistic rather than an F-statistic; its p-value is 4.2e-06, which means that the regression as a whole is statistically significant. R-squared is 0.0189 (adjusted 0.0169), which is very low: the model explains only about 1.9% of the variation in ROA.
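The theta values and variance shares in the output can be recomputed from the two variance components. This is a sketch that assumes the standard random effects weight for a firm observed over T_i years and uses only the printed variances; the variable names are illustrative.

```r
# Variance components copied from the "Effects" block of the output
sigma2_idios <- 0.2183   # idiosyncratic variance
sigma2_indiv <- 0.2104   # individual (firm) variance

# Share of the total error variance due to firm effects
share_indiv <- sigma2_indiv / (sigma2_indiv + sigma2_idios)

# Quasi-demeaning weight for firms observed over 1 to 5 years
t_i   <- 1:5
theta <- 1 - sqrt(sigma2_idios / (t_i * sigma2_indiv + sigma2_idios))
round(setNames(theta, paste0("T=", t_i)), 4)
```

The values for T = 1 and T = 5 correspond to the minimum (0.2864) and maximum (0.5855) of theta in the output; theta varies across firms only because the panel is unbalanced.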
NOW, let's do the Hausman test to decide between fixed effects and random effects
# Perform the Hausman test
hausman_test <- phtest(random_effects, within_group_fe)
print(hausman_test)
##
## Hausman Test
##
## data: roa ~ fw + in. + mw
## chisq = 3.3054, df = 3, p-value = 0.3469
## alternative hypothesis: one model is inconsistent
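The p-value of the test (0.3469) is not significant, which means that we cannot reject the null hypothesis that the random effects estimates are consistent, that is, that they do not differ systematically from the fixed effects estimates. So, we can use the random effects model, which is the more efficient of the two when the null holds.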
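The p-value follows directly from the chi-square statistic and its degrees of freedom, which can be checked in base R. The sketch uses only the two numbers printed by `phtest()`; the variable names are illustrative.

```r
# Values copied from the Hausman test output
hausman_chisq <- 3.3054
hausman_df    <- 3   # number of coefficients compared (fw, in., mw)

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(hausman_chisq, df = hausman_df, lower.tail = FALSE)
round(p_value, 4)
```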
NOW, which one is better: pooled OLS or the random effects model? Let's do the Breusch-Pagan test for comparison between pooled OLS and the random effects model. ref: https://www.youtube.com/watch?v=VcxE-FlKQ3s
# Perform the Breusch-Pagan test
bp_test <- plmtest(random_effects, type=c("bp"))
# Print the result
print(bp_test)
##
## Lagrange Multiplier Test - (Breusch-Pagan)
##
## data: roa ~ fw + in. + mw
## chisq = 426.48, df = 1, p-value < 2.2e-16
## alternative hypothesis: significant effects
The p-value of the test is significant, which means that we reject the null hypothesis that the variance of the firm-specific effects is zero (the case in which pooled OLS would be adequate). So, we can use the random effects model.
However, let's consider the lagged dependent variable model (another fixed effects model) as well.
2.5) Fixed Effect Model with lagged dependent variable
fixed_effects_lag <- plm(roa ~ fw + in. + mw + lag(roa), data = pdata, model = "within")
summary(fixed_effects_lag)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = roa ~ fw + in. + mw + lag(roa), data = pdata, model = "within")
##
## Unbalanced Panel: n = 326, T = 1-4, N = 1096
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.27284288 -0.02573549 -0.00020489 0.02472433 1.22766132
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## fw 0.0027842 0.0011914 2.3368 0.0197043 *
## in. 0.0035291 0.0011495 3.0701 0.0022155 **
## mw 0.0043953 0.0011652 3.7723 0.0001743 ***
## lag(roa) 0.7769626 0.0253907 30.6002 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 44.418
## Residual Sum of Squares: 19.235
## R-Squared: 0.56695
## Adj. R-Squared: 0.38095
## F-statistic: 250.712 on 4 and 766 DF, p-value: < 2.22e-16
#stargazer(fixed_effects_lag,type = "text")
The p-value of the F-statistic is below 2.2e-16, which means that the regression as a whole is statistically significant. R-squared is 0.567 (adjusted 0.381), which is quite healthy: the model explains about 57% of the within-firm variation in ROA. The coefficient of the lagged dependent variable (0.777) is highly significant, which means that last year's ROA is very important in explaining this year's ROA. Note that adding the lag reduces the sample from 1439 to 1096 observations.
NOW, is this better than Random effect model?
hausman_test <- phtest(random_effects, fixed_effects_lag)
print(hausman_test)
##
## Hausman Test
##
## data: roa ~ fw + in. + mw
## chisq = 41.536, df = 3, p-value = 5.032e-09
## alternative hypothesis: one model is inconsistent
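The p-value of the test is significant, which means that we reject the null hypothesis that the random effects estimates are consistent. So, we cannot use the random effects model. That's INTERESTING!! So let's use the fixed effects model with lagged dependent variable. One caution: the two models compared here do not share the same regressors or the same sample (the fixed effects model adds lag(roa) and uses 1096 rather than 1439 observations), so this is not a like-for-like Hausman comparison and the result should be read as indicative only.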
Some relevant Questions and answers:
Question: What is the purpose of Hausman and Breusch-Pagan LM test in panel data analysis and how to interpret the result?
Answer The Hausman test and the Breusch-Pagan LM test are essential for panel data analysis as they help you decide which model is the most appropriate for your data.
- Hausman Test: This test is used to decide between a Fixed Effects Model and a Random Effects Model. The null hypothesis is that the entity-specific effects are uncorrelated with the regressors, in which case the Random Effects model is preferred; the alternative favors Fixed Effects. If the p-value is less than the significance level (e.g., 0.05), you reject the null hypothesis, which means the Fixed Effects Model is more appropriate. If the p-value is larger than the significance level, you fail to reject the null hypothesis, which implies the Random Effects Model is more suitable.
- Breusch-Pagan LM test: This test is used to determine whether a random effects model or a simple OLS regression would be more appropriate. The null hypothesis is that the variance of the entity-specific effects is zero, that is, there are no systematic differences across entities, so a simple pooled OLS regression would be appropriate. If the p-value is less than the significance level (e.g., 0.05), you reject the null hypothesis, meaning there are significant entity-specific effects, and the random effects model would be more suitable. If the p-value is larger, you fail to reject the null hypothesis, and pooled OLS regression would be more appropriate.
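The two decision rules above can be written down as a small helper. This is only a sketch of the textbook logic, not a substitute for judgment: `choose_panel_model` is a hypothetical function defined here, and the 0.05 threshold is an assumption.

```r
# Hypothetical helper: combine the Hausman and Breusch-Pagan p-values
choose_panel_model <- function(hausman_p, bp_p, alpha = 0.05) {
  # Significant Hausman test: effects are correlated with the regressors
  if (hausman_p < alpha) return("fixed effects")
  # Otherwise, a significant BP test indicates firm-level random effects
  if (bp_p < alpha) "random effects" else "pooled OLS"
}

# p-values from the first Hausman test and the Breusch-Pagan test above
static_choice <- choose_panel_model(hausman_p = 0.3469, bp_p = 2.2e-16)
static_choice
```

With the p-values from the static models this returns "random effects"; with the second Hausman p-value (5.032e-09) it would return "fixed effects".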
Question: What is the difference between the Wald test and the partial F test between Pooled OLS and LSDV?
Answer The Wald Test and the Partial F Test are statistical tests used in econometrics to test restrictions on the parameters of a model. They are frequently used in the context of panel data analysis, especially when comparing different types of models such as the Pooled Ordinary Least Squares (Pooled OLS) and Least Squares Dummy Variable (LSDV, also known as Fixed Effects) models.
- Wald Test: The Wald test is used to test the significance of individual coefficients in the regression model or to test sub-models. In the context of panel data, the Wald test is often used to test whether the coefficients of time or entity-specific variables in the LSDV model are all zero (which would suggest that a Pooled OLS model might be appropriate). If the null hypothesis is rejected, this would indicate that using the Pooled OLS model would be inappropriate, and a more complex model (like the LSDV or Fixed Effects model) is needed.
- Partial F Test: The partial F test is often used to compare nested models: a “full” model with all predictors and a “reduced” model where some of the predictors of the full model have been removed. In the context of panel data, you could use the F test to test whether all firm-specific dummy variables in the LSDV model are jointly zero. If the null hypothesis is rejected, this would suggest that the Fixed Effects model is more appropriate than the Pooled OLS model.
Both of these tests provide a means to test whether a more complex model (like LSDV/Fixed Effects) is justified given the data, or whether a simpler model (like Pooled OLS) would suffice.
In R, these tests can be conducted as shown earlier: waldtest() from the lmtest package for the Wald test, and anova() with test = "F" for the partial F test. (lrtest(), also from lmtest, performs a likelihood ratio test instead, and the plm package offers pFtest() for testing a fixed effects model against pooled OLS.) Keep in mind that these tests rely on certain assumptions (like homoscedasticity and no autocorrelation), so it's important to conduct diagnostic checks as well.
Finally, as always, it’s essential to interpret these tests in the context of your data and your research question. While they provide valuable guidance, the choice of model should not rely solely on these tests, but also on your understanding of the underlying processes that generated your data.
Question: Is there any difference between Pooled OLS and plm(roa ~ fw + in. + mw, data = pdata, model = "random")?
Answer Yes, there is a difference between the Pooled Ordinary Least Squares (OLS) and the Random Effects model implemented with the plm function.
- Pooled OLS: This is the basic least squares regression model which ignores the panel structure of the data. It assumes that each observation is independent of the others, and it does not account for the correlation between observations of the same entity (a firm in your case) across different time periods. In other words, it treats the data as a simple cross-section.
- Random Effects Model (plm with model = "random"): This model assumes that the individual entity (firm) effects are random and uncorrelated with the explanatory variables. In this model, the variation across entities is assumed to be random and is therefore modeled as a random effect that is unobserved. This allows for entity-specific variations in the intercept term.
The choice between Pooled OLS and a Random Effects model should be based on the nature of your data and your research question. If you have reason to believe that there are unobserved firm-specific effects and that they are uncorrelated with your explanatory variables, a Random Effects model may be more appropriate. If, on the other hand, you believe that there are no such firm-specific effects, a Pooled OLS model may suffice.
A statistical way to decide between these models would be to perform a Breusch-Pagan Lagrange Multiplier test. If the test is significant, this would suggest the presence of significant random effects, favoring the use of a Random Effects model over a Pooled OLS model. If the test is not significant, then a Pooled OLS model would be appropriate. You’ve mentioned earlier that your Breusch-Pagan LM test result was significant, which suggests a Random Effects model might be more suitable for your data. However, the insignificant Hausman test result suggested no systematic difference between Fixed Effects and Random Effects model. Therefore, the choice of the model should also consider the context of your study and theory behind your variables.
Question: What are the differences among Pooled Ordinary Least Squares (OLS), the Random Effects model implemented with the plm function with model = "random", and the Fixed Effects model implemented with the plm function with model = "within"?
Answer The differences are as follows:
- Pooled OLS: This approach does not consider the panel data structure. It treats the data as if it's a simple cross-section. That means it assumes each observation (each row of your data) is independent of the others. This could potentially lead to biased estimates if there are any unobserved factors that differ between entities (firms in your case), are constant over time, and are correlated with the explanatory variables.
- Random Effects (plm function with model = "random"): This model acknowledges that the data is a panel and that each entity could have its own characteristics. It assumes that these individual characteristics (effects) are not correlated with the explanatory variables, and they are captured as part of the error term. This model allows for variation across entities and also between time periods for each entity.
- Fixed Effects (plm function with model = "within"): Like the random effects model, the fixed effects model acknowledges that each entity could have its own characteristics. However, it assumes that these characteristics (effects) could be correlated with the explanatory variables. It allows for these individual characteristics by allowing each entity to have its own intercept. The fixed effects model controls for all time-invariant differences between the individuals, so the estimated coefficients of the fixed-effects models are only due to sources of variations that are within an individual.
The decision to use either Pooled OLS, Random Effects, or Fixed Effects depends on the structure of your data and the nature of your research question. Pooled OLS can be used if there's no reason to suspect unobserved individual-level effects at all. Random effects can be used if you believe there are individual-level effects, but these are uncorrelated with the independent variables. Fixed effects can be used if you believe there are individual-level effects, and these are correlated with the independent variables.
Statistical tests, such as the Breusch-Pagan Lagrange Multiplier (LM) test and the Hausman test, can help guide this decision. Breusch-Pagan LM test can be used to decide whether a random effects model or a simple OLS regression would be more appropriate. The Hausman test helps decide between a Fixed Effects Model vs a Random Effects Model. If the Hausman test is significant, then the individual-level effects are correlated with the independent variables, and a Fixed Effects model is more appropriate. If not, a Random Effects model is more appropriate.
Question: How is the random effects model related to GLS?
Answer The random effects model in the context of panel data analysis is related to generalized least squares (GLS) through the way it estimates parameters.
In a standard ordinary least squares (OLS) model, we assume that the error terms are homoskedastic (have constant variance) and are not correlated with each other. However, in panel data, these assumptions may be violated due to the presence of individual-specific effects and time-specific effects. In the presence of such effects, OLS estimates are no longer the most efficient, even though they are still unbiased as long as the effects are uncorrelated with the predictors.
To tackle this, the random effects model is estimated using a technique called feasible generalized least squares (FGLS), which is a form of GLS. The GLS approach allows for heteroskedasticity and correlation in the error terms, leading to more efficient parameter estimates compared to OLS when these issues are present.
In the context of a random effects model, we assume that the individual-specific effects are random and uncorrelated with the predictors. This effect is integrated into the error term. Because every observation of a given individual shares the same individual effect, the resulting composite error is serially correlated over time for that individual, even if its variance is constant. The FGLS method used in a random effects model takes into account this form of the error term and estimates the model parameters accordingly.
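The error structure described above can be written out explicitly. The following is a sketch under the standard random effects assumptions; the symbols $\mu_i$ (individual effect), $\varepsilon_{it}$ (idiosyncratic error), $\sigma_\mu^2$ and $\sigma_\varepsilon^2$ are notation introduced here for illustration.

```latex
\begin{align*}
y_{it} &= \alpha + x_{it}'\beta + v_{it}, \qquad v_{it} = \mu_i + \varepsilon_{it} \\
\operatorname{Var}(v_{it}) &= \sigma_\mu^2 + \sigma_\varepsilon^2 \\
\operatorname{Cov}(v_{it}, v_{is}) &= \sigma_\mu^2 \quad (t \neq s) \\
\operatorname{Cov}(v_{it}, v_{js}) &= 0 \quad (i \neq j)
\end{align*}
```

Two errors of the same individual share $\mu_i$, so their correlation is $\sigma_\mu^2 / (\sigma_\mu^2 + \sigma_\varepsilon^2)$. This within-individual correlation is what GLS exploits and OLS ignores.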
Question: How does REM GLS work?
Answer In panel data analysis, the Random Effects Model (REM) is a method that tries to handle the unobserved heterogeneity when this heterogeneity, or differences across entities (like firms, countries, etc.), is believed to be random and uncorrelated with the predictors or independent variables.
In Random Effects Model, the unobserved heterogeneity is assumed to be a random error term. This random error term and the remainder error term are assumed to be normally distributed. Here, both within-entity (i.e., time-series) correlation and between-entity (i.e., cross-sectional) variation are considered.
When we talk about Random Effects GLS (Generalized Least Squares), it’s a specific way of estimating the Random Effects Model. The standard Random Effects Model can be estimated using feasible GLS (FGLS).
In FGLS, we first estimate the variances of the entity-specific effects and of the remaining error, and then use them to transform the data. The transformation subtracts only a fraction (theta) of each entity's mean from its observations, so the entity-specific effects (for example, characteristics specific to each firm that do not change over time) are partially rather than completely removed. After transforming the data, we estimate the model using Ordinary Least Squares (OLS) on the transformed data.
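The transformation can be sketched as follows, assuming the usual one-way random effects setup. The notation is introduced here for illustration: $T_i$ is the number of periods observed for entity $i$, bars denote entity means, $\sigma_\mu^2$ and $\sigma_\varepsilon^2$ are the variances of the entity effect and the idiosyncratic error, and $v_{it}$ is the composite error.

```latex
\begin{align*}
\theta_i &= 1 - \sqrt{\frac{\sigma_\varepsilon^2}{T_i\,\sigma_\mu^2 + \sigma_\varepsilon^2}} \\
y_{it} - \theta_i \bar{y}_i &= (1 - \theta_i)\,\alpha + (x_{it} - \theta_i \bar{x}_i)'\beta + (v_{it} - \theta_i \bar{v}_i)
\end{align*}
```

When $\theta_i = 0$ the transformed regression is pooled OLS, and when $\theta_i = 1$ it is the within (fixed effects) regression, so the random effects estimator sits between the two.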
This way, GLS can handle panel data where the error terms are not identically distributed, but instead have some form of serial correlation within entities and/or heteroscedasticity across entities. This makes GLS a more flexible tool for panel data analysis, but at the same time, the assumptions it requires (such as knowledge about the form of serial correlation and heteroscedasticity) make it more complex to use in practice.
In R, as I previously showed, you can use the plm package to estimate a random effects model, and it will automatically use a FGLS estimator. Here's an example:
library(plm)
# pdata should be your panel data with FirmName and Year as indexes
pdata <- pdata.frame(yourdata, index = c("FirmName", "Year"))
# Estimate random effects model
model_re <- plm(ROA ~ FW + MW + IN, data = pdata, model = "random")
# Summary of the model
summary(model_re)
In the summary of the model, you’ll see coefficient estimates, standard errors, z-values, and p-values for the predictors. These can be interpreted similarly as in OLS regression, but keeping in mind that the estimates take into account the random entity-specific effects.
Question: What is a first difference model? How is it different from a Fixed Effect Model with lagged dependent variable?
Answer A first-difference model is a type of panel data model that attempts to eliminate potential issues related to unobserved variables that could be correlated with the predictors.
In a panel data set, you have observations on the same units (like individuals, firms, countries) over multiple time periods. There could be some unobserved, time-invariant characteristics of these units that affect the outcome variable and are correlated with the predictors. This could lead to omitted variable bias in a regular OLS regression.
The first-difference model is a solution to this problem. The idea is to take the difference between each pair of consecutive observations for each unit. By doing this, you eliminate all time-invariant characteristics (because these differences are always zero) and can obtain unbiased estimates of the predictors’ effects.
The first-difference model is usually written as:
ΔY_it = β * ΔX_it + Δu_it
where ΔY_it is the difference in the outcome variable for unit i between time t and t-1, ΔX_it is the corresponding difference in the predictor variable, and Δu_it is the difference in the error term.
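The differenced equation follows from writing the model in levels for two consecutive periods and subtracting. In the sketch below, $\alpha_i$ denotes the time-invariant unit effect; this symbol is an assumption introduced here, since the text does not name it.

```latex
\begin{align*}
Y_{it} &= \alpha_i + \beta X_{it} + u_{it} \\
Y_{i,t-1} &= \alpha_i + \beta X_{i,t-1} + u_{i,t-1} \\
Y_{it} - Y_{i,t-1} &= \beta\,(X_{it} - X_{i,t-1}) + (u_{it} - u_{i,t-1}) \\
\Delta Y_{it} &= \beta\,\Delta X_{it} + \Delta u_{it}
\end{align*}
```

The unit effect $\alpha_i$ appears in both level equations and cancels in the subtraction, which is why it does not need to be observed or estimated.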
It's important to note that while this approach eliminates time-invariant unobserved variables, it does not deal with unobserved variables that change over time. Also, by differencing the data, you lose one time period's worth of observations for each unit. Finally, the first-difference model assumes that the effect of the predictors is constant over time, as do the fixed-effects and random-effects models fitted above. If this is not the case, the specification itself has to be extended, for example by interacting the predictors with time periods.
A Fixed Effects Model with a lagged dependent variable and a First Difference model are both methods to control for time-invariant unobservable effects in panel data, but they do so in different ways and under different assumptions.
- Fixed Effects Model with Lagged Dependent Variable: In this approach, a lagged version of the dependent variable (the value of the dependent variable in the previous time period) is included as an independent variable in the model. The model accounts for unobservable, time-invariant characteristics by using individual-specific dummy variables (or by subtracting the individual’s mean over time from each observation). The lagged dependent variable can help to account for serial correlation in the errors (when the error in one time period is correlated with the error in the previous time period). However, including a lagged dependent variable in a fixed effects model can introduce bias due to correlation between the lagged dependent variable and the error term. This is known as Nickell bias in the context of panel data.
- First Difference Model: In a first-difference model, the change in the dependent variable from one period to the next is modeled as a function of the changes in the independent variables. This removes any time-invariant individual effects because any characteristics that are constant over time will be eliminated in the differencing process. The first difference model can also help to reduce serial correlation in the errors, although it does not eliminate it entirely. Unlike the fixed effects model with a lagged dependent variable, the first difference model without a lagged dependent variable does not suffer from Nickell bias.
In summary, the main difference is how each method deals with time-invariant unobservable effects and serial correlation. The appropriate model to use depends on the specifics of your data and research question. For instance, if the idiosyncratic errors are strongly persistent over time (close to a random walk), a first-difference model may be more appropriate, because differencing removes most of that persistence. If you have reasons to include a lagged dependent variable and your panel is not too short (to mitigate the Nickell bias), you might choose the fixed effects model with a lagged dependent variable.
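The Nickell bias mentioned above can be made visible with a small base R simulation. This is an illustrative sketch on made-up data: the true persistence `rho = 0.5`, the 500 firms, and the 5 usable periods are all assumptions chosen for the example.

```r
set.seed(42)
n     <- 500    # number of firms
t_len <- 6      # periods simulated (5 usable once the lag is taken)
rho   <- 0.5    # true coefficient on the lagged dependent variable

alpha <- rnorm(n)                         # firm fixed effects
y <- matrix(0, n, t_len)
y[, 1] <- alpha / (1 - rho) + rnorm(n)    # start near the long-run mean
for (t in 2:t_len) y[, t] <- alpha + rho * y[, t - 1] + rnorm(n)

y_cur <- y[, 2:t_len]
y_lag <- y[, 1:(t_len - 1)]

# Within transformation: subtract each firm's own mean over time
y_cur_dm <- y_cur - rowMeans(y_cur)
y_lag_dm <- y_lag - rowMeans(y_lag)

# OLS slope on the demeaned data = fixed effects estimate of rho
rho_within <- sum(y_lag_dm * y_cur_dm) / sum(y_lag_dm^2)
c(true = rho, within = rho_within)
```

With only five periods per firm the within estimate falls well below the true value, even with many firms. The panel used in this document has at most five years per firm, so the 0.777 coefficient on lag(roa) should be read with this bias in mind.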
Question: Here is the output from Random Effect Model:
How to interpret the result?
Answer: Your output from the random effects model provides several important pieces of information:
- Model Specification: The first part of the output describes the type of model you estimated, which in this case is a one-way (individual) effect random effects model using the Swamy-Arora transformation. This means the model assumes there are random, firm-specific effects on the dependent variable (ROA) that are not correlated with the independent variables.
- Panel Information: The unbalanced panel information tells you about the structure of your data. ‘n = 332’ means you have data from 332 firms, ‘T = 1-5’ means each firm is observed in between 1 and 5 time periods (which is why the panel is unbalanced), and ‘N = 1439’ is the total number of firm-period observations in the dataset.
- Variance Components: The ‘Effects’ part shows the decomposition of the total variance into idiosyncratic (or within-firm, time-variant) and individual (or between-firm, time-invariant) components. Both are contributing nearly equally to the total variability in your model.
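- Theta: The ‘theta’ is the weight used to partially demean the data in the GLS procedure. It lies between 0 (equivalent to pooled OLS) and 1 (equivalent to the fixed effects estimator) and depends on the two variance components and on the number of observations per firm. Because your panel is unbalanced, theta differs between firms that are observed for different numbers of periods.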
- Residuals: The residuals summary provides statistics about the distribution of the model’s residuals (the differences between the observed and predicted values of ROA).
- Coefficients: This is arguably the most important part of the output, where the estimated coefficients (or effect sizes) for each predictor are given, along with their standard errors, z-values, and p-values. Each coefficient represents the expected change in ROA for a one-unit increase in the corresponding predictor, assuming all other predictors are held constant. All predictors are statistically significant in this model, as their p-values are less than 0.05.
- Family ownership (fw): A one-unit increase in fw is associated with an increase of 0.0069 units in ROA, holding all else constant.
- Institutional Ownership (in): A one-unit increase in institutional ownership (in) is associated with an increase of 0.0061 units in ROA, holding all else constant.
- Managerial Ownership (mw): A one-unit increase in mw is associated with an increase of 0.0061 units in ROA, holding all else constant.
- Model Fit: The R-squared value is a measure of how well the model fits the data. In this case, R-squared is 0.018925, indicating that only about 1.89% of the variation in ROA is explained by your predictors. This might suggest that there are other important variables not included in your model that influence firm performance.
From a practical perspective, you might conclude from this model that all types of ownership structures are positively associated with firm performance, as measured by ROA. However, the effects are relatively small, and a large proportion of the variability in firm performance is not explained by these ownership variables. Therefore, you might want to consider other factors that could be influencing firm performance in addition to ownership structure.
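The way theta varies across firms in an unbalanced panel can be made concrete with the standard random effects formula, theta_i = 1 - sqrt(sigma2_idios / (T_i * sigma2_id + sigma2_idios)). The sketch below is illustrative only: the function name `theta_re` is hypothetical, and the two variance components are set to 1 as a stand-in for "roughly equal" rather than taken from the actual output.

```r
# Quasi-demeaning weight for a firm observed in n_obs periods.
# sigma2_idios: idiosyncratic variance; sigma2_id: individual (firm) variance.
# theta_re is a hypothetical helper, not a plm function.
theta_re <- function(n_obs, sigma2_idios, sigma2_id) {
  1 - sqrt(sigma2_idios / (n_obs * sigma2_id + sigma2_idios))
}

obs_per_firm <- 1:5                              # the panel has T = 1-5
theta <- theta_re(obs_per_firm, sigma2_idios = 1, sigma2_id = 1)  # assumed variances

round(setNames(theta, paste0("T=", obs_per_firm)), 3)
```

Firms with more observations get a larger theta, so their data are demeaned more heavily and the estimator moves closer to fixed effects for them. If the firm-level variance were zero, theta would be 0 and the model would reduce to pooled OLS.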
Question: I got the following output from the fixed effects model with a lagged dependent variable:
How should I interpret the result?
Answer: The model estimated is a dynamic panel model, also called a lagged dependent variable model. This kind of model allows the dependent variable in the previous time period (lag(roa)) to affect the dependent variable in the current time period (roa).
Let’s break down the results:
- Lagged ROA (lag(roa)): The coefficient for lag(roa) is 0.7744631, and it’s highly significant (p < 0.001). This implies that ROA from the previous time period has a significant and positive impact on the current ROA. Specifically, for a one-unit increase in the previous period’s ROA, the current ROA is expected to increase by about 0.774 units, holding all else constant. This is not surprising as firm performance in one year can be highly dependent on the previous year’s performance.
In a practical sense, this could suggest that firms with high performance in the past are likely to continue performing well, and similarly, low-performing firms tend to continue with low performance. This could be due to various factors such as accumulated resources, reputation, or strategic decisions made based on previous years’ performance.
- Family Ownership (fw), Institutional Ownership (in), and Managerial Ownership (mw): All three of these variables have positive and significant relationships with ROA, similar to the random effects model you previously estimated. The interpretation is the same: an increase in each ownership measure is associated with higher firm performance, all else constant.
- Model Fit: The R-squared and adjusted R-squared values are 0.568 and 0.380, respectively. These are substantially higher than in your previous model, which suggests that by including the lagged dependent variable, the model explains a larger portion of the variation in ROA.
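Overall, this model suggests that past performance and the ownership structures are key determinants of a firm’s current performance. Because each firm is observed for at most 5 periods, the coefficient on lag(roa) is subject to the Nickell bias described earlier, so its exact magnitude should be treated with caution.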
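One way to read the lag coefficient is through the persistence it implies. The short calculation below takes the reported coefficient on lag(roa) at face value (ignoring any bias) and assumes it is below 1 in absolute value. The short-run ownership coefficient is a hypothetical placeholder, since the fixed effects estimates for fw, in, and mw are not reproduced in the text above.

```r
rho <- 0.7744631                 # coefficient on lag(roa) reported above

# Number of periods until half of a one-off shock to ROA has faded
half_life <- log(0.5) / log(rho)

# A permanent one-unit change in a predictor accumulates over time;
# the long-run effect is the short-run coefficient times 1 / (1 - rho)
long_run_multiplier <- 1 / (1 - rho)

beta_short_run <- 0.005          # HYPOTHETICAL coefficient, for illustration only
beta_long_run  <- beta_short_run * long_run_multiplier

round(c(half_life = half_life,
        long_run_multiplier = long_run_multiplier,
        beta_long_run = beta_long_run), 3)
```

Under these assumptions, a shock to ROA takes a little under three periods to fade by half, and long-run effects are roughly four to five times the short-run coefficients.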
Things to further explore
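# Clustered standard errors (robust to heteroskedasticity and within-firm serial correlation)
library(lmtest)  # provides coeftest()
coeftest(random_effects, vcovHC(random_effects, type = "HC0", cluster = "group"))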
References:
www.youtube.com/watch?v=2igMNODFypk&t=125s
www.OpenAI.com