FOREIGN TRADE UNIVERSITY
FACULTY OF INTERNATIONAL ECONOMICS
-----------------o0o-----------------
ECONOMETRIC REPORT
Topic: Factors that determine housing prices
Class: KTEE 309.1
Group No.: 7
Student Name - ID:
Nguyen Ha Trang - 1711150066 - 40%
Nguyen Mai Thuy Tien - 1711150064 - 30%
Nguyen Thi Lan Huong - 1715150032 - 30%
Supervisor: Dr. Dinh Thanh Binh
Hanoi, 2018
Group 7
Econometric Report
Table of Contents
II. Introduction
III. Literature overview
1. Questions of interest
2. Procedure and program used
Exhibit 4: Scatterplot of variables in the Housing Price model
Exhibit 5: Regression model
Exhibit 6: Multicollinearity test
Exhibit 7: Heteroskedasticity test
Exhibit 8: Residual-versus-fitted plot of the Housing Price model
Exhibit 9: Correcting heteroskedasticity
Exhibit 10: Hypothesis testing of multiple regression model of neighborhood factors
Exhibit 11: Hypothesis testing of multiple regression model of accessibility factors
I. Introduction
Just as economics is a science concerned with social development in general and national growth in particular, econometrics is the use of statistical techniques to understand those issues and to test theories. Without evidence, economic theories are abstract and might have no bearing on reality (even if they are completely rigorous). Econometrics is a set of tools we can use to confront theory with real-world data.
Given the data set, our group of three members (Nguyen Ha Trang, Nguyen Mai Thuy Tien, and Nguyen Thi Lan Huong) follows an eight-step econometric methodology to analyze the data. Because little information accompanies the data set, all interpretations of abbreviations and related details are based on assumptions and our own research. We therefore aim to present our logic and reasoning clearly throughout the analysis.
Given its limited purpose and resources, this report still has shortcomings, but we hope it gives readers a sound overview of the data set and reflects the knowledge we have gained through Dr. Dinh Thanh Binh’s Econometrics course.
III. Economic model
As the data are provided up front, the economic model used in this report is an empirical one. The underlying model is mathematical; in an empirical model, however, data are gathered for the variables and accepted statistical techniques are used to estimate the model's values.
Empirical model discovery and theory evaluation are usually described as involving five key steps, but given the limited purpose and resources of this report, this part follows only three of them: (1) specifying the object for modeling, (2) defining the target for modeling, and (3) embedding that target in a general unrestricted model.
1. Specifying the object for modeling
$$\text{price} = f(x_i) \qquad (1)$$
Accordingly, this report examines the relationship between housing price, which is the object for modeling, and each group of related factors: structure, neighborhood, accessibility, and air pollution.
2. Defining the target for modeling by the choice of the variables to analyze, denoted $x_i$
As mentioned above, four main categories are expected to affect housing prices: structure, neighborhood, accessibility, and air pollution. The $x_i$ are therefore the variables that make up these categories. After thorough research, the factors have been narrowed down to eight key ones: (structure) number of rooms; (neighborhood) crimes, property tax, the percentage of people of low status, and student-teacher ratio; (accessibility) distance to employment centers and accessibility to radial highways; and (air pollution) nitrous oxide.
3. Embedding that target in a general unrestricted model (GUM)
In its simplest acceptable representation (which will later be specified in the econometric
IV. Econometric model
To demonstrate the relationship between housing price and other factors, the regression
function can be constructed as follows:
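(PRF):
$$\text{lprice} = \beta_0 + \beta_1\,\text{crime} + \beta_2\,\text{nox} + \beta_3\,\text{rooms} + \beta_4\,\text{dist} + \beta_5\,\text{radial} + \beta_6\,\text{proptax} + \beta_7\,\text{stratio} + \beta_8\,\text{lowstat} + u_i$$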
Collinearity, by D.A. Belsley, E. Kuh, and R. Welsch, 1990. New York: Wiley
The structure of Economic data: cross-sectional data
2. Data description
To obtain summary statistics for the variables, the following Stata command is used:
sum lprice crime nox rooms dist radial proptax stratio lowstat
The result is shown in Exhibit 2.
Exhibit 2: Statistic indicators of variables in the Housing Price model
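    Variable |   Obs        Mean   Std. Dev.      Min       Max
-------------+---------------------------------------------------
      lprice |   506    9.941057
       crime |   506                             .006    88.976
         nox |   506                             3.85      8.71
       rooms |   506                             3.56      8.78
        dist |   506               2.106137      1.13     12.13
      radial |          9.549407   8.707259         1        24
     proptax |          40.82372   16.85371      18.7
     stratio |          18.45929    2.16582      12.6
     lowstat |          12.70148   7.238066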
Exhibit 3: Correlation matrix
lprice
crime
lprice
nox
rooms
dist
radial
proptax
stratio lowstat
1.0000
0.2054
-0.2098
-0.2921
-0.3540
-0.6096
1.0000
-0.4951
-0.5344
-0.2293
-0.5597
-0.4976
-0.7914
1.0000
0.4212
-0.2188
-0.3799
0.6254
0.5828
0.2887
0.4470
1.0000
-0.3028
-0.7702
0.6103
0.6670
0.1869
0.5856
From the matrix, it can be inferred that the correlation between lprice and each of the independent variables is sufficient to justify running the regression model. Specifically:
- lprice and crime have a moderate downhill relationship
- lprice and nox have a moderate downhill relationship
- lprice and rooms have a moderate uphill relationship
- lprice and dist have a weak uphill relationship
- lprice and radial have a moderate downhill relationship
- lprice and proptax have a moderate downhill relationship
- lprice and stratio have a moderate downhill relationship
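The verbal labels in the list above can be made reproducible with a small helper. The following is a minimal sketch in Python rather than Stata; the cut-offs (0.3, 0.5, 0.7) are an assumed rule of thumb, not thresholds stated in this report, and `describe_correlation` is a hypothetical name introduced only for illustration.

```python
def describe_correlation(r: float) -> str:
    """Return a verbal label for a Pearson correlation coefficient.

    The cut-offs 0.3 / 0.5 / 0.7 are an assumed convention,
    not thresholds taken from the report.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie in [-1, 1]")
    if r == 0:
        return "no linear relationship"
    size = abs(r)
    if size < 0.3:
        strength = "very weak"
    elif size < 0.5:
        strength = "weak"
    elif size < 0.7:
        strength = "moderate"
    else:
        strength = "strong"
    direction = "uphill" if r > 0 else "downhill"
    return f"{strength} {direction}"


# Illustrative values only (not read from Exhibit 3)
for r in (0.62, -0.45, -0.8):
    print(r, "->", describe_correlation(r))
```

With a different choice of cut-offs the same coefficient can receive a different label, so the thresholds should be stated whenever such labels are reported.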
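      Source |       SS          df       MS
-------------+---------------------------------
       Model |  64.8618936        8
    Residual |  19.7203314      497
-------------+---------------------------------
       Total |

Number of obs =
F(8, 497)     =
Prob > F      =
R-squared     =
Adj R-squared =
Root MSE      =

      lprice |      Coef.   Std. Err.    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------
       crime |                           0.000    -.0138573   -.0085078
         nox |  -.0754564   .0146936                          -.0465873
       rooms |   .0996545   .0167697                           .1326028
        dist |  -.0463708   .0067557                          -.0330975
      radial |   .0133694   .0026525                           .0185808
     proptax |  -.0062133   .0013807                          -.0035006
     stratio |  -.0413327   .0050633                          -.0313846
     lowstat |  -.0280384                                     -.0242752
       _cons |   11.19507                                      11.59535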
From the result, it can be inferred that crime, nox, rooms, dist, radial, proptax, stratio and lowstat all have statistically significant effects on lprice at the 5% significance level (as all p-values are smaller than 0.05). In particular, those effects can be specified by the regression coefficients as follows:
- $\hat{\beta}_0 = 11.1951$: When all the independent variables are zero, the expected value of housing price is $e^{11.1951}$.
- $\hat{\beta}_1 = -0.0112$: When the number of crimes committed per capita increases by one, the expected value of housing price decreases by about 1.12%.
- $\hat{\beta}_2 = -0.0755$: When nox increases by one unit, the expected value of housing price decreases by about 7.55%.
- $\hat{\beta}_7 = -0.0413$: When the student-teacher ratio increases by one, the expected value of housing price decreases by about 4.13%.
- $\hat{\beta}_8 = -0.028$: When the percentage of people of lower status increases by one percentage point, the expected value of housing price decreases by about 2.80%.
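The percentage readings above rely on the usual approximation for a log-level model, in which a one-unit increase in a regressor changes the dependent variable by roughly 100 times the coefficient percent. The sketch below, in Python, compares that approximation with the exact change 100·(exp(β) − 1). It assumes that lprice is the natural logarithm of price, which the data set does not state explicitly; the function names are hypothetical.

```python
import math


def pct_change_approx(beta: float) -> float:
    """Approximate % change in price for a one-unit rise in x: 100 * beta."""
    return 100.0 * beta


def pct_change_exact(beta: float) -> float:
    """Exact % change implied by a log-level model: 100 * (exp(beta) - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)


# Rounded coefficients quoted in the text (assumes lprice = ln(price))
coefficients = {"crime": -0.0112, "lowstat": -0.028}

for name, beta in coefficients.items():
    approx = pct_change_approx(beta)
    exact = pct_change_exact(beta)
    print(f"{name}: approx {approx:.2f}%, exact {exact:.2f}%")
```

For coefficients this small the two figures are close, which is why the simpler approximation is commonly reported.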
The coefficient of determination is R-squared = 0.7669: all independent variables (crime, nox, rooms, dist, radial, proptax, stratio, lowstat) jointly explain 76.69% of the variation in the dependent variable (lprice); factors not included in the model account for the remaining 23.31% of the variation in lprice.
Other indicators:
- Adjusted coefficient of determination adj R-squared = 0.7631
- Total Sum of Squares TSS = 84.5822
- Explained Sum of Squares ESS = 64.8619
- Residual Sum of Squares RSS = 19.7203
- The degrees of freedom of the model Dfm = 8
- The degrees of freedom of the residual Dfr = 497
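These indicators are linked by simple identities, so they can be cross-checked from the sums of squares and degrees of freedom alone. The following Python sketch recomputes them from the figures in the regression output; it is a consistency check under the standard OLS formulas, not a reproduction of Stata's internals. Note that the F statistic computed this way is the conventional one and need not equal the F reported after a robust estimation.

```python
import math

# Figures taken from the regression output
ess = 64.8618936   # explained (model) sum of squares
rss = 19.7203314   # residual sum of squares
df_model = 8       # number of regressors
df_resid = 497     # n - k - 1

tss = ess + rss                       # total sum of squares
n = df_model + df_resid + 1           # number of observations
r2 = ess / tss                        # coefficient of determination
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
f_stat = (ess / df_model) / (rss / df_resid)  # conventional overall F
root_mse = math.sqrt(rss / df_resid)

print(f"n = {n}, TSS = {tss:.4f}")
print(f"R-squared = {r2:.4f}, adj R-squared = {adj_r2:.4f}")
print(f"F({df_model}, {df_resid}) = {f_stat:.2f}, Root MSE = {root_mse:.4f}")
```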
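Based on the estimates in the table, the sample regression function is established:
$$\widehat{\text{lprice}} = 11.1951 - 0.0112\,\text{crime} - 0.0755\,\text{nox} + 0.0997\,\text{rooms} - 0.0464\,\text{dist} + 0.0134\,\text{radial} - 0.0062\,\text{proptax} - 0.0413\,\text{stratio} - 0.0280\,\text{lowstat}$$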
crime
stratio
     VIF      1/VIF
    6.79   0.147301
    3.69   0.271206
    2.58   0.388106
    2.45   0.408804
    1.77   0.565985
    1.74   0.574531
    1.53   0.653369
Mean VIF   3.43
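The two columns of the multicollinearity output are reciprocals of each other: the tolerance 1/VIF equals 1 − R² from regressing one explanatory variable on all the others. The Python sketch below checks that relationship for the values shown and flags any VIF above 10. The threshold of 10 is a commonly used rule of thumb and is an assumption here, not a value taken from this report.

```python
# VIF and 1/VIF pairs as they appear in the output (variable names omitted)
vifs = [6.79, 3.69, 2.58, 2.45, 1.77, 1.74, 1.53]
tolerances = [0.147301, 0.271206, 0.388106, 0.408804,
              0.565985, 0.574531, 0.653369]

VIF_THRESHOLD = 10.0  # assumed rule-of-thumb cut-off


def auxiliary_r2(tolerance: float) -> float:
    """R-squared of the auxiliary regression implied by a tolerance value."""
    return 1.0 - tolerance


# Each reported VIF should match the reciprocal of its tolerance after rounding
consistent = all(abs(1.0 / tol - vif) < 0.005
                 for vif, tol in zip(vifs, tolerances))
flagged = [vif for vif in vifs if vif > VIF_THRESHOLD]

print("columns consistent:", consistent)
print("VIFs above threshold:", flagged)
print("largest auxiliary R-squared:", round(auxiliary_r2(min(tolerances)), 4))
```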
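              Source |     chi2     df        p
---------------------+----------------------------
  Heteroskedasticity |   235.31     44   0.0000
            Skewness |    34.20      8   0.0000
            Kurtosis |    12.38      1   0.0004
---------------------+----------------------------
               Total |   281.89     53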
Number of obs =    506
F(8, 497)     = 179.01
Prob > F      = 0.0000
R-squared     = 0.7669
Root MSE      =  .1992
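             |               Robust
      lprice |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+------------------------------------------------------------------
       crime |  -.0111825
         nox |                          -5.01    0.000    -.1050506
       rooms |              .025796      3.86    0.000     .0489718
        dist |              .0068001    -6.82    0.000    -.0597312
      radial |              .0029003     4.61    0.000     .0076711
     proptax |              .0013641    -4.55    0.000
     stratio |              .0042322    -9.77    0.000
     lowstat |              .003584     -7.82    0.000
       _cons |              .2672806    41.89    0.000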
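Robust estimation leaves the coefficient estimates unchanged and only replaces the standard errors, so each t statistic in the robust output should equal the original coefficient divided by its robust standard error. The Python sketch below checks this for the rows where both figures are available. The pairing of standard errors with variables is assumed from the order in which the values appear in the output.

```python
# Point estimates from the regression output (unchanged by robust standard errors)
coef = {
    "rooms": 0.0996545, "dist": -0.0463708, "radial": 0.0133694,
    "proptax": -0.0062133, "stratio": -0.0413327, "lowstat": -0.0280384,
    "_cons": 11.19507,
}
# Robust standard errors; pairing with variables is assumed from output order
robust_se = {
    "rooms": 0.025796, "dist": 0.0068001, "radial": 0.0029003,
    "proptax": 0.0013641, "stratio": 0.0042322, "lowstat": 0.003584,
    "_cons": 0.2672806,
}
# t statistics as printed in the robust output
reported_t = {
    "rooms": 3.86, "dist": -6.82, "radial": 4.61, "proptax": -4.55,
    "stratio": -9.77, "lowstat": -7.82, "_cons": 41.89,
}


def t_statistic(estimate: float, std_err: float) -> float:
    """t statistic for the null hypothesis that the coefficient is zero."""
    if std_err <= 0:
        raise ValueError("standard error must be positive")
    return estimate / std_err


t_stats = {name: t_statistic(coef[name], robust_se[name]) for name in coef}
for name, t in t_stats.items():
    print(f"{name:>8}: t = {t:6.2f} (reported {reported_t[name]:.2f})")
```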
1. The impact of neighborhood factors
The question of interest: In the multiple regression model:
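$$\text{lprice} = \beta_0 + \beta_1\,\text{crime} + \beta_2\,\text{nox} + \beta_3\,\text{rooms} + \beta_4\,\text{dist} + \beta_5\,\text{radial} + \beta_6\,\text{proptax} + \beta_7\,\text{stratio} + \beta_8\,\text{lowstat} + u_i$$
(full model)
Does the subset of independent variables (crime, proptax, lowstat, stratio) contribute to explaining/predicting lprice? Or, would it do just as well if these variables were dropped and we reduced the model to
$$\text{lprice} = \beta_0 + \beta_2\,\text{nox} + \beta_3\,\text{rooms} + \beta_4\,\text{dist} + \beta_5\,\text{radial} + u_i$$
(reduced model).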
In Stata, the test statistic F is calculated using the command:
test crime proptax lowstat stratio
The result is shown in Exhibit 10.
Exhibit 10: Hypothesis testing of multiple regression model of neighborhood factors
 ( 1)  crime = 0
 ( 2)  proptax = 0
 ( 3)  lowstat = 0
 ( 4)  stratio = 0

       F(  4,   497) =
            Prob > F =
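2. The impact of accessibility factors
The question of interest: In the multiple regression model:
$$\text{lprice} = \beta_0 + \beta_1\,\text{crime} + \beta_2\,\text{nox} + \beta_3\,\text{rooms} + \beta_4\,\text{dist} + \beta_5\,\text{radial} + \beta_6\,\text{proptax} + \beta_7\,\text{stratio} + \beta_8\,\text{lowstat} + u_i$$
(full model)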
Does the subset of independent variables (dist, radial) contribute to explaining/ predicting
lprice? Or, would it do just as well if these variables were dropped and we reduced the
model to
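$$\text{lprice} = \beta_0 + \beta_1\,\text{crime} + \beta_2\,\text{nox} + \beta_3\,\text{rooms} + \beta_6\,\text{proptax} + \beta_7\,\text{stratio} + \beta_8\,\text{lowstat} + u_i$$
(reduced model).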
From this question, the following hypotheses are postulated:
Null Hypothesis: The subset does not contribute to the model's explanatory power
Alternative Hypothesis: At least one of the independent variables in the subset is useful in explaining/predicting lprice
which is expressed as:
$$H_0: \beta_4 = \beta_5 = 0 \qquad H_1: \beta_4 \neq 0 \ \text{or} \ \beta_5 \neq 0$$
The test reports Prob > F = 0.0000.
As a result, there is enough evidence to reject the null hypothesis and conclude that at least one independent variable in the subset (dist, radial) has explanatory or predictive power for lprice, so the model should not be reduced by dropping this subset.
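For reference, the classical F statistic for such an exclusion test compares the residual sums of squares of the reduced and full models. The Python sketch below is a minimal illustration of that formula, which is valid under homoskedasticity; when the test is run after a robust estimation, Stata's `test` reports a Wald statistic based on the robust variance matrix instead. The function name `exclusion_f` and the reduced-model RSS used in the example are hypothetical, since the report does not give the RSS of either reduced model.

```python
def exclusion_f(rss_reduced: float, rss_full: float,
                num_restrictions: int, df_resid_full: int) -> float:
    """Classical F statistic for dropping a subset of regressors.

    F = ((RSS_reduced - RSS_full) / q) / (RSS_full / (n - k - 1))
    """
    if num_restrictions <= 0 or df_resid_full <= 0:
        raise ValueError("degrees of freedom must be positive")
    if rss_reduced < rss_full:
        raise ValueError("the reduced model cannot fit better than the full model")
    numerator = (rss_reduced - rss_full) / num_restrictions
    denominator = rss_full / df_resid_full
    return numerator / denominator


rss_full = 19.7203314     # RSS of the full model, from the regression output
df_resid_full = 497       # residual degrees of freedom of the full model
rss_reduced_example = 22.0  # HYPOTHETICAL value, for illustration only

f_example = exclusion_f(rss_reduced_example, rss_full, 2, df_resid_full)
print(f"F(2, {df_resid_full}) = {f_example:.2f} for the hypothetical reduced RSS")
```

A large F value (equivalently, a small Prob > F) indicates that dropping the subset worsens the fit by more than chance alone would explain.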
IX. Result analysis & Policy implication
From the data analysis in the preceding sections, we have gained an overall view of the data set in terms of the statistical evidence for the relationship between housing prices and each of the proposed factors. As mentioned at the beginning of this report, we aim to learn how structure, neighborhood, accessibility, and air pollution features are associated with housing price. In other words, we are interested in how much buyers are willing to pay for these components.
Following the data analysis, regression estimation and hypothesis testing, it can be concluded that structure, neighborhood, accessibility, and air pollution factors all have statistically significant effects on housing prices. Therefore, tenants, investors and constructors should take all of these factors into account when making deals.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.926.5532&rep=rep1&type=pdf
D.A. Belsley, E. Kuh, and R. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, New York: Wiley (1990).