Applied Econometrics Dummy Variables
Applied Econometrics
Lecture 4: Use of Dummy Variables
‘Pure and complete sorrow is as impossible as pure and complete joy’
1) Introduction
The independent variables used in regression equations are usually quantitative and take values over
some continuous range. Frequently, one may wish to include qualitative independent variables, often
called dummy variables, in the regression model in order to (i) capture the presence or absence of a
‘quality’, such as male or female, poor or rich, urban or rural area, college degree or no college
degree, different stages of development, or different periods of time; (ii) capture the interaction
between such qualities; or (iii) take on one or more distinct values.
2) Intercept Dummy
An intercept dummy is a variable, say D, that takes the value of either 0 or 1. It is normally used as a
regressor in the model.
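The effect of an intercept dummy can be sketched numerically. The example below is only an illustration under stated assumptions: the data, the variable names x, d and y, and the coefficient values are hypothetical, and the data are noiseless so that least squares recovers the coefficients exactly.

```python
import numpy as np

# Hypothetical data: x is a continuous regressor, d is a 0/1 dummy.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
d = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Assumed coefficients, chosen only for illustration.
b0, b1, b2 = 2.0, 0.5, 1.5
y = b0 + b1 * x + b2 * d  # noiseless, so OLS recovers b0, b1, b2

# Design matrix: constant, x, and the dummy entered as an ordinary regressor.
X = np.column_stack([np.ones_like(x), x, d])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

intercept_d0 = coef[0]            # intercept of the group with d = 0
intercept_d1 = coef[0] + coef[2]  # intercept of the group with d = 1
slope = coef[1]                   # slope shared by both groups
print(intercept_d0, intercept_d1, slope)
```

The dummy coefficient is the vertical distance between the two parallel fitted lines.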
For example, the consumption function (C) can be written as follows:
$C = b_0 + b_1 Y + b_2 D$
[Figure: C plotted against Y, showing the parallel lines $C = (b_0 + b_2) + b_1 Y$ and $C = b_0 + b_1 Y$.]
Illustrative example 1 (Maddala, 308)
Suppose that we regress consumption (C) on income (Y) for households. We include the
following qualitative variables in the form of dummy variables:
Written by Nguyen Hoang Bao May 22, 2004
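$D_1 = \begin{cases} 1 & \text{if gender is male} \\ 0 & \text{if gender is female} \end{cases}$
...
$D_5 = \begin{cases} 1 & \text{if high school degree} \le \text{education} < \text{college degree} \\ 0 & \text{otherwise} \end{cases}$
Then we run the following regression equation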
$C = \alpha + \beta Y + \gamma_1 D_1 + \gamma_2 D_2 + \gamma_3 D_3 + \gamma_4 D_4 + \dots$
... $D_2$, and $D_3$ are seasonal dummies defined by:
$D_1 = \begin{cases} 1 & \text{for the first quarter} \\ 0 & \text{for others} \end{cases}$
$D_2 = \begin{cases} 1 & \text{for the second quarter} \\ 0 & \text{for others} \end{cases}$
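Seasonal dummies of this kind can be built directly from a quarter index. The sketch below is a minimal illustration, assuming that a third dummy marks the third quarter and that the fourth quarter serves as the base category; the series y and its coefficient values are made up.

```python
import numpy as np

# Hypothetical quarter index (1..4) for three years of quarterly data.
quarter = np.tile(np.array([1, 2, 3, 4]), 3)

# One 0/1 dummy per quarter except the fourth, which is the base category.
D1 = (quarter == 1).astype(float)
D2 = (quarter == 2).astype(float)
D3 = (quarter == 3).astype(float)  # assumption: D3 marks the third quarter

# Made-up seasonal series: base level 10 plus a shift in quarters 1 to 3.
y = 10.0 + 2.0 * D1 - 1.0 * D2 + 0.5 * D3

X = np.column_stack([np.ones_like(y), D1, D2, D3])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# coef[0] is the fourth-quarter level; coef[1:] are differences from it.
print(np.round(coef, 6))
```

Each dummy coefficient is read as a difference from the omitted quarter, not as a level.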
[Figure: C plotted against Y, showing the lines $C = b_0 + (b_1 + b_2) Y$ and $C = b_0 + b_1 Y$, which share the intercept $b_0$.]
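The two lines $C = b_0 + (b_1 + b_2) Y$ and $C = b_0 + b_1 Y$ can be reproduced with a slope dummy, that is, the product of a 0/1 dummy and income. The following sketch assumes the specification $C = b_0 + b_1 Y + b_2 (D \times Y)$; the data and coefficient values are hypothetical and noiseless.

```python
import numpy as np

# Hypothetical income values for two groups of four observations each.
Y = np.array([10.0, 20.0, 30.0, 40.0, 10.0, 20.0, 30.0, 40.0])
D = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
DY = D * Y  # slope dummy: equals Y when D = 1 and 0 otherwise

# Assumed coefficients, for illustration only.
b0, b1, b2 = 5.0, 0.6, 0.2
C = b0 + b1 * Y + b2 * DY

X = np.column_stack([np.ones_like(Y), Y, DY])
coef, *_ = np.linalg.lstsq(X, C, rcond=None)

slope_d0 = coef[1]            # slope of the group with D = 0
slope_d1 = coef[1] + coef[2]  # slope of the group with D = 1
print(coef[0], slope_d0, slope_d1)
```

Both groups share the intercept; only the slope differs, by the coefficient on DY.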
4) Combination of Slope and Intercept Dummies
We may include both slope and intercept dummies in a regression model, where
DY = D × Y
and D is equal to 1 for developing countries and 0 for developed countries.
The general model can be written as follows:
$C = b_0 + b_1 Y + b_2 D + b_3 DY$
[Figure: C plotted against Y, showing the lines $C = (b_0 + b_2) + (b_1 + b_3) Y$ and $C = b_0 + b_1 Y$.]
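With both an intercept dummy and a slope dummy, the pooled regression reproduces the lines that would be obtained by fitting each group separately. The sketch below checks this on simulated data; the sample size, coefficient values, noise level, and the helper fit_line are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
Y = rng.uniform(10.0, 50.0, size=n)   # hypothetical income
D = (np.arange(n) % 2).astype(float)  # alternating 0/1 group indicator
C = 4.0 + 0.7 * Y + 3.0 * D - 0.2 * D * Y + rng.normal(0.0, 1.0, size=n)

# Pooled model with intercept dummy D and slope dummy D * Y.
X = np.column_stack([np.ones(n), Y, D, D * Y])
b, *_ = np.linalg.lstsq(X, C, rcond=None)

def fit_line(y, x):
    """Simple regression of y on a constant and x (illustrative helper)."""
    A = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(A, y, rcond=None)[0]

line_d0 = fit_line(C[D == 0], Y[D == 0])  # group with D = 0
line_d1 = fit_line(C[D == 1], Y[D == 1])  # group with D = 1

# Implied lines from the pooled fit: (b0, b1) and (b0 + b2, b1 + b3).
print(line_d0, b[:2])
print(line_d1, [b[0] + b[2], b[1] + b[3]])
```

The advantage of the pooled form is that the coefficients on D and D × Y measure the group differences directly, so they can be tested.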
5) Piecewise Linear Regression Model
Most of the econometric models we have studied have been continuous, with small changes in one
variable having a measurable effect on another variable.
If we want to explain investment (I) as a function of the interest rate (r), the two segments of the
piecewise linear regression are as shown in the figure below.
2
)r
where $r^*$ is obtained by plotting the dependent variable against the explanatory variable and
observing whether there seems to be a sharp change in the relation after a given value $r^*$.
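One common way to estimate a two-segment relation is to add a "hinge" regressor that is zero up to the break point and grows linearly after it. The sketch below assumes the specification $I = a_0 + a_1 r + a_2 (r - r^*) D$ with $D = 1$ when $r > r^*$; this form, the symbols $a_0, a_1, a_2$, the break point, and the data are assumptions made for illustration and are not taken from the text above.

```python
import numpy as np

r = np.linspace(1.0, 10.0, 19)  # hypothetical interest rates 1.0, 1.5, ..., 10.0
r_star = 5.0                    # assumed break point, e.g. read off a scatter plot
D = (r > r_star).astype(float)  # 1 above the break point, 0 otherwise
hinge = (r - r_star) * D        # zero up to r_star, then r - r_star

# Assumed coefficients, for illustration only (noiseless data).
a0, a1, a2 = 100.0, -4.0, -6.0
invest = a0 + a1 * r + a2 * hinge

X = np.column_stack([np.ones_like(r), r, hinge])
coef, *_ = np.linalg.lstsq(X, invest, rcond=None)

slope_below = coef[1]            # slope for r <= r_star
slope_above = coef[1] + coef[2]  # slope for r > r_star
print(slope_below, slope_above)
```

Because the hinge term is zero at $r^*$, the two segments join at the break point instead of jumping.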
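[Figure: investment I plotted against the interest rate r, with the break point $r^*$ marked on the r axis.]
6) Summary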
If a qualitative variable has m categories, we include (m – 1) dummy variables in the model. The
coefficients attached to the dummy variables must always be interpreted in relation to the base
group, that is, the group that gets the value zero.
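The reason for stopping at (m – 1) dummies can be checked numerically: with a constant term, a full set of m dummies sums to the constant column, so the design matrix loses rank. The sketch below uses a hypothetical three-category variable; the labels and the choice of "rural" as base group are assumptions.

```python
import numpy as np

# Hypothetical categorical variable with m = 3 categories.
labels = np.array(["urban", "rural", "suburban", "urban",
                   "rural", "suburban", "urban", "rural"])
categories = ["rural", "suburban", "urban"]  # "rural" will be the base group

# One 0/1 column per category.
all_dummies = np.column_stack([(labels == c).astype(float) for c in categories])
const = np.ones(len(labels))

X_trap = np.column_stack([const, all_dummies])       # constant + m dummies
X_ok = np.column_stack([const, all_dummies[:, 1:]])  # constant + (m - 1) dummies

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)
print(X_trap.shape[1], rank_trap)  # more columns than rank: collinear
print(X_ok.shape[1], rank_ok)      # full column rank
```

Dropping the "rural" column makes rural the group with all dummies equal to zero, so the remaining coefficients are differences from it.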
The use of dummy variables associated with two or more categorical variables allows us to study
partial association and interaction effects in the context of multiple regression. Interactive dummies
are obtained by multiplying dummies corresponding to the different categorical variables. This
allows us to test formally whether interaction is present or not.
References
actual salary is much lower or higher, it can be reviewed to see whether it is appropriate.
Fred Kopp, for example, is a 32-year-old vice president of a large restaurant chain. He
has been with the firm since he obtained a 2-year MBA at age 25, following a 4-year
degree in economics. He now earns $126,000 annually.
1.1.1) What is Fred’s fitted salary?
1.1.2) How many standard deviations is his actual salary away from his fitted salary?
Would you therefore call his salary exceptional?
1.1.3) Closer inspection of Fred’s record showed that he had spent two years studying
at Oxford as a Rhodes Scholar before obtaining his MBA. In light of this
information, recalculate your answers to 1.1.1) and 1.1.2).
1.2) In addition to identifying unusual salaries in specific firms, the regression can be used to
answer questions about the economy-wide structure of executive salaries in all firms.
For example,
1.2.1) Is there evidence of sex discrimination?
1.2.2) Is it fair to say that each year’s education (beyond high school) increases the
income of the average executive by $3,600 a year?
2) In an environmental study of 1072 men, a multiple regression was calculated to show how lung
function was related to several factors, including some hazardous occupations (Lefcoe and
Wonnacott, 1974):
AIRCAP = 4500 – 39 AGE – 9.0 SMOK – 350 CHEMW – 380 FARMW – 180 FIREW
(SE) (1.8) (2.2) (46) (53) (54)
where
3) In an observational study to determine the effect of a drug on blood pressure, it was noticed that
the treated group (taking the drug) tended to weigh more than the control group. Thus, when the
treated group had higher blood pressure on average, was it because of the treatment or their
weight? To untangle this knot, some regressions were computed, using the following variables:
BP = blood pressure
WEIGHT = weight
D = 1 if taking the drug, 0 otherwise
The data set is given by:
D WEIGHT BP
D: 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1
WEIGHT: 180, 150, 210, ...
a) For someone of the same weight who is on the drug?
b) For someone on the same treatment who is 10 lbs. heavier?
3.2) How would the simple regression coefficient compare to the multiple regression
coefficient for weight? Why?
4) Use data file SRINA
4.1) Regress Ip on Ig
4.2) Repeat the regression using (i) an intercept dummy; (ii) a slope dummy; and (iii) both
slope and intercept dummies. Select the break point by looking at the scatter plot of Ip against
Ig.
4.3) Draw the scatter plot and fitted line for each regression
4.4) Comment on your results
5) Use data file LEACCESS
5.1) Regress LE on Y
5.2) Repeat the regression using (i) an intercept dummy; (ii) a slope dummy; and (iii) both
slope and intercept dummies. Use a t test to check whether they are significant or not. Select
the break point by looking at the scatter plot of LE against Y.
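The t test on a dummy coefficient divides the estimate by its standard error. The sketch below shows one way to compute these quantities with the classical OLS formulas; the function name ols_t_stats, the simulated data, and the break point are hypothetical and do not come from the LEACCESS file.

```python
import numpy as np

def ols_t_stats(X, y):
    """Return OLS coefficients, standard errors, and t ratios."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)       # unbiased error variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)  # classical OLS covariance matrix
    se = np.sqrt(np.diag(cov))
    return coef, se, coef / se

# Simulated stand-in data (not the course data set).
rng = np.random.default_rng(1)
n = 40
x = rng.uniform(0.0, 10.0, size=n)
break_point = 5.0                  # hypothetical, as if read off a scatter plot
D = (x > break_point).astype(float)
y = 1.0 + 0.5 * x + 10.0 * D + rng.normal(0.0, 1.0, size=n)

X = np.column_stack([np.ones(n), x, D])
coef, se, t = ols_t_stats(X, y)
print(np.round(coef, 3), np.round(t, 2))
```

An absolute t ratio well above roughly 2 on the dummy suggests, at the conventional 5% level for a sample of this size, that the shift at the break point is statistically significant.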
8) Use data file INDIA
8.1) Does your conclusion confirm that gender matters in explaining earning
differences?
8.2) Does your conclusion confirm that educational level matters in explaining earning
differences?
8.3) Regress ln(WI) on gender, education, and age using the appropriate dummy variables.