CHAPTER 9
Random Matrices
The step from random vectors to random matrices (and higher order random
arrays) is not as big as the step from individual random variables to random vectors.
We will first give a few quite trivial verifications that the expected value operator
is indeed a linear operator, and them make some not quite as trivial observations
about the expected values and higher moments of quadratic forms.
9.1. Linearity of Expected Values
Definition 9.1.1. Let Z be a random matrix with elements z
ij
. Then
E
[Z] is
the matrix with elements E[z
ij
].
245
246 9. RANDOM MATRICES
Theorem 9.1.2. If A, B, and C are constant matrices, then
E
[AZB + C] =
A
E
[Z]B + C.
Proof by multiplying out.
Theorem 9.1.3.
E
[Z
] = (
E
E
[Y ], and
E
[a · X + b · Y ] = a ·
E
[X] + b ·
E
[Y ].
If X and Y are random matrices, then the covariance of these two matrices is
a four-way array containing the covariances of all elements of X with all elements
of Y . Certain conventions are necessary to arrange this four-way array in a two-
dimensional scheme that can be written on a sheet of paper. Before we develop
those, we will first define the covariance matrix for two random vectors.
Definition 9.1.5. The covariance matrix of two random vectors is defined as
(9.1.1)
C
[x, y] =
E
[(x −
E
[x])(y −
E
[y])
].
Theorem 9.1.6.
C
[x, y] =
E
[xy
C
[x, u]
C
[x, v]
C
[y, u]
C
[y, v]
.
Special case:
C
[Ax+By, Cu+Dv] = A
C
[x, u]C
+A
C
[x, v]D
+B
C
[y, u]C
+
B
C
[y, v]D
1
] var[x
2
] ··· cov[x
2
, x
n
]
.
.
.
.
.
.
.
.
.
.
.
.
cov[x
n
, x
1
] cov[x
n
, x
2
] ··· var[x
n
x
is almost surely a constant.
Proof: Call
V
[x] = Σ
Σ
Σ. Then Σ
Σ
Σ singular iff an a exists with Σ
Σ
Σa = o iff an a exists
with a
Σ
Σ
Σa = var[a
x] = 0 iff an a exists so that a
x is almost surely a constant.
This means, singular random variables have a restricted range, their values are
contained in a linear subspace. This has relevance for estimators involving singular
random variables: two such estimators (i.e., functions of a singular random variable)
should still be considered the same if their values coincide in that subspace in which
the values of the random variable is concentrated—even if elsewhere their values
differ.
Problem 154. [Seb77, exercise 1a–3 on p. 13] Let x = [x
1
, . . . , x
n
1 0 0 0 0
−1 1 0 0 0
0 −1 1 0 0
0 0 −1 1 0
0 0 0 −1 1
A
−1
=
1 0 0 0 0
1 1 0 0 0
1 1 1 0 0
1 1 1 1 0
1 1 1 1 1
and A
−1
(A
−1
Then
(9.2.1) E[y
Ay] = σ
2
tr(AΨ) + η
Aη.
250 9. RANDOM MATRICES
Proof. Write y as the sum of η and ε
ε
ε = y − η; then
y
Ay = (ε
ε
ε + η)
A(ε
ε
ε + η)(9.2.2)
= ε
ε
ε
Aε
ε
ε + ε
ε
ε
Aε
ε
ε)] = E[tr(Aε
ε
εε
ε
ε
)] = tr(A
E
[ε
ε
εε
ε
ε
])(9.2.5)
= σ
2
tr(AΨ).(9.2.6)
Here we used that tr(AB) = tr(BA) and, if c is a scalar, i.e., a 1 ×1 matrix, then
tr(c) = c.
In tile notation (see Appendix B), the proof of theorem 9.2.1 is much more
straightforward and no longer seems to rely on “tricks.” From y ∼ (η,Σ
Σ
Σ), i.e., we
9.2. MEANS AND VARIANCES OF QUADRATIC FORMS 251
are writing now σ
2
Ψ = Σ
E
y
A
y
=
η
A
η
+
Σ
Σ
Σ A
.
Problem 155. [Seb77, Exercise 1b–2 on p. 16] If y
1
, y
2
, . . . , y
n
are mutually in-
dependent random variables with commom mean η, and with variances σ
2
1
, σ
2
2
, . . . , σ
2
1
σ
2
2
. . . σ
2
n
). Then the
vector
y
1
− ¯y y
2
− ¯y . . . y
n
− ¯y
can be written as (I −
1
n
ιι
)y.
1
n
ιι
+ ··· + σ
2
n
) −
1
n
tr[ι
Σ
Σ
Σι](9.2.11)
=
n − 1
n
(σ
2
1
+ ··· + σ
2
n
).(9.2.12)
Divide this by n(n −1) to get (σ
2
1
+ ··· + σ
2
n
)/n
2
, which is var[¯y], as claimed.
(9.2.13) γ
2
1
≤ γ
2
+ 2.
9.2. MEANS AND VARIANCES OF QUADRATIC FORMS 253
Answer. Define ε = y − µ, and apply Cauchy-Schwartz for the variables ε and ε
2
:
(9.2.14) (σ
3
γ
1
)
2
= (E[ε
3
])
2
=
cov[ε, ε
2
]
2
≤ var[ε] var[ε
2
] = σ
0 with probability 1/(γ
2
+ 3 −γ
2
1
),
−b with probability 1/2br
This variable has expected value zero, variance 1, its third moment is γ
1
, and its fourth moment
γ
2
+ 3.
Theorem 9.2.2. Given a random vector ε
ε
ε of independent variables ε
i
with zero
expected value E[ε
i
] = 0, and whose second and third moments are identical. Call
var[ε
i
] = σ
2
, and E[ε
3
i
] = σ
ε
ε)(ε
ε
ε
C ε
ε
ε)] = σ
3
γ
1
a
c.
Proof. If i = j = k = i, then E[ε
i
ε
j
ε
k
] = 0 · 0 · 0 = 0; if i = j = k then
E[ε
i
ε
j
ε
k
] = σ
2
· 0 = 0, same for i = j = k and j = i = k. Therefore only E[ε
ε
ε
εε
ε
ε
C ε
ε
ε] = σ
3
γ
1
i
a
i
c
ii
= σ
3
γ
1
a
c.(9.2.19)
One would like to have a matrix notation for (9.2.16) from which (9.2.17) follows by
a trivial operation. This is not easily possible in the usual notation, but it is possible
9.2. MEANS AND VARIANCES OF QUADRATIC FORMS 255
in tile notation:
E
= γ
1
σ
3
a
∆
C
(9.2.21)
256 9. RANDOM MATRICES
Since n ∆ C is the vector of diagonal elements of C, called c, the last term
in equation (9.2.21) is the scalar product a
c.
Given a random vector ε
ε
ε of independent variables ε
i
with zero expected value
E[ε
i
] = 0 and identical second and fourth moments. Call var[ε
i
] = σ
2
and E[ε
4
i
] =
σ
4
if i = j = k = l or i = k = j = l
or i = l = j = k
0 otherwise.
It is not an accident that (9.2.22) is given element by element and not in matrix
notation. It is not possible to do this, not even with the Kronecker product. But it
9.2. MEANS AND VARIANCES OF QUADRATIC FORMS 257
is easy in tile notation:
(9.2.23)
E
ε
ε
ε ε
ε
ε
ε
ε
ε ε
ε
ε
= σ
4
+ σ
4
+ σ
4
+ γ
2
σ
A
ε
ε
ε ε
ε
ε
ε
ε
ε ε
ε
ε
B
= σ
4
A
B
+ σ
4
A
B
+ σ
4
A
B
+ γ
2
σ
4
ε
ε] = σ
4
γ
2
c
d + 2σ
4
tr(CD).
Answer. Use cov[ε
ε
ε
Cε
ε
ε, ε
ε
ε
Dε
ε
ε] = E[(ε
ε
ε
Cε
ε
ε)(ε
ε
2
θ
A
2
θ + 4σ
3
γ
1
θ
Aa + σ
4
γ
2
a
a + 2 tr(A
2
)
.
Answer. Proof: var[x
Ax] = E[(x
Ax)
2
] − (E[x
+ 4(θ
Aε
ε
ε)
2
+ (θ
Aθ )
2
(9.2.28)
+ 4ε
ε
ε
Aε
ε
ε θ
Aε
ε
ε + 2ε
ε
ε
Aε
ε
ε θ
Aθ + 4 θ
To deal with the second term in (9.2.29) define b = Aθ; then
(θ
Aε
ε
ε)
2
= (b
ε
ε
ε)
2
= b
ε
ε
εε
ε
ε
b = tr(b
ε
ε
εε
ε
ε
b) = tr(ε
ε
ε
ε
Aε
ε
ε θ
Aε
ε
ε = ε
ε
ε
Aε
ε
ε b
ε
ε
ε(9.2.33)
E[ε
ε
ε
Aε
ε
ε θ
Aε
2π
e
−
z
2
2
.
To verify that this is a density function we have to check two conditions. (1) It is
everywhere nonnegative. (2) Its integral from −∞to ∞ is 1. In order to evaluate this
integral, it is easier to work with the independent product of two standard normal
variables x and y; their joint density function is f
x,y
(x, y) =
1
2π
e
−
x
2
+y
2
2
. In order to
261
262 10. MULTIVARIATE NORMAL
see that this joint density integrates to 1, go over to polar coordinates x = r cos φ,
y = r sin φ, i.e., compute the joint distribution of r and φ from that of x and y: the
absolute value of the Jacobian determinant is r, i.e., dx dy = r dr dφ, therefore
(10.1.2)
2
/2, therefore dt = r dr, the inner integral becomes −
1
2π
e
−t
∞
0
=
1
2π
; therefore the whole integral is 1. Therefore the product of the integrals of the
marginal densities is 1, and since each such marginal integral is positive and they are
equal, each of the marginal integrals is 1 too.
Problem 161. 6 points The Gamma function can be defined as Γ(r) =
∞
0
x
r−1
e
−x
dx.
Show that Γ(
1
2
) =
√
2
/2
dz =
1
√
2
∞
−∞
e
−z
2
/2
dz =
√
2π
√
2
=
√
π.
10.1. MORE ABOUT THE UNIVARIATE CASE 263
A univariate normal variable with mean µ and variance σ
2
is a variable x whose
standardized version z =
x−µ
σ
∼ N(0, 1). In this transformation from x to z, the
2
/2σ
2
.
Problem 162. 3 points Given n independent observations of a Normally dis-
tributed variable y ∼ N(µ, 1). Show that the sample mean ¯y is a sufficient statis-
tic for µ. Here is a formulation of the factorization theorem for sufficient statis-
tics, which you will need for this question: Given a family of probability densities
f
y
(y
1
, . . . , y
n
; θ) defined on R
n
, which depend on a parameter θ ∈ Θ. The statistic
T : R
n
→ R, y
1
, . . . , y
n
→ T(y
1
, . . . , y
n
) is sufficient for parameter θ if and only if
there exists a function of two variables g : R ×Θ → R, t, θ → g(t; θ), and a function
Answer. The joint density function can be written (factorization indicated by ·):
(10.1.6)
(2π)
−n/2
exp
−
1
2
n
i=1
(y
i
−µ)
2
= (2π)
−n/2
exp
−
1
2
n
i=1
(y
i
−¯y)
Here is a recursive definition from which one gets all multivariate normal distri-
butions:
(1) The univariate standard normal z, considered as a vector with one compo-
nent, is multivariate normal.
(2) If x and y are multivariate normal and they are independent, then u =
x
y
is multivariate normal.
(3) If y is multivariate normal, and A a matrix of constants (which need not
be square and is allowed to be singular), and b a vector of constants, then Ay + b
10.3. BIVARIATE NORMAL 265
is multivariate normal. In words: A vector consisting of linear combinations of the
same set of multivariate normal variables is again multivariate normal.
For simplicity we will go over now to the bivariate Normal distribution.
10.3. Special Case: Bivariate Normal
The following two simple rules allow to obtain all bivariate Normal random
variables:
(1) If x and y are independent and each of them has a (univariate) normal
distribution with mean 0 and the same variance σ
2
, then they are bivariate normal.
(They would be bivariate normal even if their variances were different and their
means not zero, but for the calculations below we will use only this special case, which
together with principle (2) is sufficient to get all bivariate normal distributions.)
(2) If x =
x
y
therefore we can apply the transformation theorem for densities. Let us first write
down the density function of x which we know:
(10.3.1) f
x,y
(x, y) =
1
2πσ
2
exp
−
1
2σ
2
(x
2
+ y
2
)
.
For the next step, remember that we have to express the old variable in terms
of the new one: x = P
−1
(u − µ). The Jacobian determinant is therefore J =
det(P
−1
). Also notice that, after the substitution
x
=
−
1
2σ
2
u − µ
v −ν
P
−1
P
−1
u − µ
v −ν
. Therefore the transformation theorem of density
10.3. BIVARIATE NORMAL 267
functions gives
(10.3.2) f
u,v
(u, v) =
1
2πσ
2
[
u
v
] = σ
2
P P
= σ
2
Ψ, say. Since P
−1
P
−1
P P
= I,
it follows P
−1
P
−1
= Ψ
−1
and
det(P
u − µ
v −ν
.
This is the general formula for the density function of a bivariate normal with non-
singular covariance matrix σ
2
Ψ and mean vector µ. One can also use the following
notation which is valid for the multivariate Normal variable with n dimensions, with
mean vector µ and nonsingular covariance matrix σ
2
Ψ:
(10.3.4) f
x
(x) = (2πσ
2
)
−n/2
(det Ψ)
−1/2
exp
−
1
2σ
2
(x − µ)
Ψ
ε
ε is σ
2
times the
n × n unit matrix, i.e.,
(10.3.5)
V
[Tε
ε
ε] = σ
2
I
n
.
Show that in this case the density function of y is
(10.3.6) f
y
(y) = (2πσ
2
)
−n/2
|det(T )|exp
−
1
2σ
2
T (y −α)
11
(y
1
− α
1
) + t
12
(y
2
− α
2
) + ··· + t
1n
(y
n
− α
n
)(10.3.7)
.
.
.(10.3.8)
z
n
= t
n1
(y
1
− α
1
) + t
] in terms of the standard
deviations σ
u
and σ
v
and the correlation coefficient ρ.
• b. 1 point Show that the inverse of a 2 × 2 matrix has the following form:
(10.3.10)
a b
c d
−1
=
1
ad − bc
d −b
−c a
.
• c. 2 points Show that
q
2
=
u − µ v − ν
Ψ
−1