We will now deal with eigenvalues, eigenvectors, and eigenspaces of matrices. From the previous chapter, we know that matrices can be viewed as linear transformations. Eigenvalues, eigenvectors, and eigenspaces deal with special cases of transformations where the direction of the vector is preserved after the operation.
Eigenvalues and eigenvectors are important because they let us quickly find the steady state of a system, i.e. how the system looks in the long run. Eigenvalues are also invariant under changes of basis.
Fun fact: Google’s PageRank algorithm is based on eigenvalues and eigenvectors!
(Figure: quick overview of eigen-everything.)
Definitions
An eigenvector of a linear transformation is a non-trivial vector that is only scaled when the transformation is applied to it. This means that the direction of the eigenvector is preserved after the operation.
In mathematical notation: Let v be an eigenvector of the matrix A. Then Av=λv for a unique λ∈R.
This equation says that the transformed vector Av is just a scaled version of the original vector v.
We also require an eigenvector to be non-trivial (non-zero) because A0=0=λ0 for every matrix A and every scalar λ, so the zero vector would satisfy the equation trivially.
The eigenvalue associated with this eigenvector is the scalar quantity by which the eigenvector is scaled. In the above equation, λ would be the eigenvalue associated with the vector v. Note that the eigenvalue associated with an eigenvector can be 0.
The eigenspace of an eigenvalue is the span of all eigenvectors with this eigenvalue because any linear combination of the eigenvectors is still an eigenvector with the same eigenvalue. In the above equation the eigenspace Eλ=span{v} because there is one eigenvector v corresponding to λ.
Proof: Let v be an eigenvector of the matrix A. Then Av=λv for a unique λ. Now, let's scale v by a non-zero constant c: A(cv)=c(Av)=c(λv)=λ(cv). Therefore, λ is still the associated eigenvalue of cv. Likewise, if u and v are both eigenvectors with eigenvalue λ, then A(u+v)=Au+Av=λu+λv=λ(u+v), so any linear combination of eigenvectors stays in the eigenspace.
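To make this concrete, here is a minimal numpy sketch (the matrix and eigenpair below are illustrative choices, not anything derived in this section):

```python
import numpy as np

# Illustrative 2x2 matrix with a known eigenpair:
# v = (1, 1) is an eigenvector of A with eigenvalue 3.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = np.array([1.0, 1.0])
lam = 3.0

# A v = lambda v: the direction of v is preserved, only its length changes.
print(np.allclose(A @ v, lam * v))              # True

# Scaling v by any non-zero c yields another eigenvector with the same eigenvalue.
c = -4.2
print(np.allclose(A @ (c * v), lam * (c * v)))  # True
```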
Calculation
Let us derive a method for solving for the eigenvalues, eigenvectors, and eigenspaces of a given matrix.
We know that for a given eigenvector v, Av=λv
However, we have 2 unknowns: v and λ, so we try to eliminate one of them.
We can re-write the equation as Av=λIv
where I is the identity matrix. We then subtract Av from both sides.
λIv−Av=0
Factoring out the v, we get
(λI−A)v=0
We observe that v is in the null space of the matrix λI−A. Since we know that any eigenvector v is non-trivial, we have to make sure that the null space of λI−A is non-trivial as well.
Recall the Invertible Matrix Theorem, which states that if the null space of a matrix is non-trivial, its determinant must be 0. Therefore, to solve for the eigenvalues, we use the equation det(λI−A)=0
det(λI−A) is called the characteristic polynomial of A because it is a polynomial in λ. To find the eigenvalues of a matrix, we find the roots of the characteristic equation det(λI−A)=0.
Sometimes, an eigenvalue λi can appear multiple times as a root. We define the algebraic multiplicity of the eigenvalue λi to be the number of times λi appears as a root in the characteristic polynomial.
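As a quick numerical sanity check, numpy can produce both the characteristic polynomial and its roots: np.poly(A) returns the coefficients of det(λI−A), and np.roots recovers the eigenvalues (the matrix below is an illustrative choice):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Coefficients of det(lambda*I - A), highest degree first:
# here lambda^2 - 4*lambda + 3.
coeffs = np.poly(A)
print(coeffs)                # [ 1. -4.  3.]

# The eigenvalues are the roots of the characteristic polynomial...
print(np.roots(coeffs))      # [3. 1.]

# ...and they match what np.linalg.eigvals computes directly.
print(np.linalg.eigvals(A))  # [3. 1.]
```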
Notes:
Observe what happens if an eigenvalue of A is 0. Then det(0I−A)=det(−A)=(−1)^n det(A)=0 ⟹ det(A)=0. Therefore, a matrix A has an eigenvalue of 0 if and only if A is non-invertible.
If A is a triangular matrix, we can easily find the eigenvalues of A. Assume that d1,d2,…,dn are the entries along the diagonal of such a matrix. From cofactor expansion, we know that det(A)=d1d2⋯dn. Since λI−A is also triangular, with diagonal entries λ−d1,λ−d2,…,λ−dn, we get det(λI−A)=(λ−d1)(λ−d2)⋯(λ−dn)=0, so d1,d2,…,dn are the roots of the characteristic polynomial, i.e. the eigenvalues. In short: the eigenvalues of a triangular matrix are its diagonal entries.
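A short numpy check of this note (the triangular matrix is an arbitrary example):

```python
import numpy as np

# Upper-triangular matrix: its eigenvalues should be exactly the diagonal entries.
T = np.array([[4.0, 7.0, -1.0],
              [0.0, 2.0,  5.0],
              [0.0, 0.0, -3.0]])

print(np.sort(np.linalg.eigvals(T)))  # [-3.  2.  4.]
print(np.sort(np.diag(T)))            # [-3.  2.  4.]
```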
Once we have all of the eigenvalues λ1,λ2,…,λn of A, we can find the eigenvectors associated with each eigenvalue by using the equation (λiI−A)v=0. In fact, we are simply finding basis vectors for the null space of λiI−A for i=1,2,…,n.
If the dimension of the null space of λiI−A for an eigenvalue λi is larger than 1, we expect multiple linearly independent eigenvectors associated with the same eigenvalue λi.
We therefore define the geometric multiplicity of an eigenvalue to be the dimension of the associated null space of λiI−A.
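In code, the geometric multiplicity is just the dimension of that null space. Here is a sketch using scipy.linalg.null_space, which returns an orthonormal basis of the null space (the matrix is an illustrative choice):

```python
import numpy as np
from scipy.linalg import null_space

# Illustrative matrix: the eigenvalue 2 should have a 2-dimensional eigenspace.
A = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])
lam = 2.0

# Columns of `basis` span Null(lambda*I - A), i.e. the eigenspace of lambda;
# the number of columns is the geometric multiplicity.
basis = null_space(lam * np.eye(3) - A)
print(basis.shape[1])  # 2
```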
Example
Let us try to find the eigenvalues, eigenvectors, and eigenspaces of the matrix $A=\begin{bmatrix}1&-1&2\\0&0&2\\0&-1&3\end{bmatrix}$.
We first find the roots of the characteristic polynomial. Expanding det(λI−A) along the first column gives
$$\det(\lambda I-A)=\det\begin{bmatrix}\lambda-1&1&-2\\0&\lambda&-2\\0&1&\lambda-3\end{bmatrix}=(\lambda-1)\left(\lambda(\lambda-3)+2\right)=(\lambda-1)^2(\lambda-2)=0,$$
so λ1=1 has an algebraic multiplicity of 2 and λ2=2 has an algebraic multiplicity of 1. We then solve (λiI−A)v=0 for each eigenvalue.
Solving (1⋅I−A)v=0, all three rows give the same equation y−2z=0, leaving x and z free. Therefore, the eigenvectors associated with λ1=1 are $\begin{bmatrix}1\\0\\0\end{bmatrix}$ and $\begin{bmatrix}1\\2\\1\end{bmatrix}$. Since there are 2 linearly independent eigenvectors associated with λ1=1, we say that the eigenvalue λ1=1 has a geometric multiplicity of 2. The eigenspace corresponding to λ1=1 is $E_1=\operatorname{span}\left\{\begin{bmatrix}1\\0\\0\end{bmatrix},\begin{bmatrix}1\\2\\1\end{bmatrix}\right\}$.
Similarly, $\begin{bmatrix}1\\1\\1\end{bmatrix}$ is the eigenvector associated with λ2=2, which has a geometric multiplicity of 1. $E_2=\operatorname{span}\left\{\begin{bmatrix}1\\1\\1\end{bmatrix}\right\}$.
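We can double-check the whole example numerically (using the matrix A reconstructed above):

```python
import numpy as np

A = np.array([[1.0, -1.0, 2.0],
              [0.0,  0.0, 2.0],
              [0.0, -1.0, 3.0]])

# Eigenvalues should be 1 (twice) and 2.
print(np.sort(np.linalg.eigvals(A)))  # approximately [1. 1. 2.]

# Verify each eigenpair found by hand: A v = lambda v.
for lam, v in [(1, [1, 0, 0]), (1, [1, 2, 1]), (2, [1, 1, 1])]:
    v = np.array(v, dtype=float)
    print(np.allclose(A @ v, lam * v))  # True, True, True
```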
Diagonalization
What and why
Imagine that a matrix A represents a state transition matrix and we want to find the steady state of the system. We would have to calculate limk→∞ Ak, but repeated matrix multiplication is a very tedious process. If we could express A as a product of 3 matrices, PDP−1, where P is an invertible matrix and D is a diagonal matrix, then the limit becomes easy: the inner factors cancel, so Ak=(PDP−1)(PDP−1)⋯(PDP−1)=PDkP−1, and we only ever need powers of D. When such a factorization A=PDP−1 exists, we say that A is diagonalizable.
Calculating Dk is very easy; in fact, for any diagonal matrix $D=\begin{bmatrix}d_1&0&\cdots&0\\0&d_2&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&d_n\end{bmatrix}$, we have $D^k=\begin{bmatrix}d_1^k&0&\cdots&0\\0&d_2^k&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&d_n^k\end{bmatrix}$, so we just need to find the k-th power of each diagonal entry of D.
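With numpy, for example, powering a diagonal matrix never requires a matrix multiplication (the entries are illustrative):

```python
import numpy as np

d = np.array([0.5, 1.0, 2.0])  # diagonal entries of D
k = 10

# D^k is just the elementwise k-th power of the diagonal.
Dk = np.diag(d ** k)

# Same result as repeated matrix multiplication, but far cheaper.
print(np.allclose(Dk, np.linalg.matrix_power(np.diag(d), k)))  # True
```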
Calculation
Let $A\in\mathbb{R}^{n\times n}$, $D=\begin{bmatrix}d_1&0&\cdots&0\\0&d_2&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&d_n\end{bmatrix}$, and $P=\begin{bmatrix}p_1&p_2&\cdots&p_n\end{bmatrix}$. If A=PDP−1, then multiplying both sides by P on the right gives AP=PD, i.e. $\begin{bmatrix}Ap_1&Ap_2&\cdots&Ap_n\end{bmatrix}=\begin{bmatrix}d_1p_1&d_2p_2&\cdots&d_np_n\end{bmatrix}$, so Ap_i=d_i p_i for every column i.
We recognize these equations as the definitions of eigenvectors and eigenvalues. Therefore, the diagonal matrix D is just a matrix with the eigenvalues of A along its diagonal, and the invertible matrix P is just a matrix with the eigenvectors of A as column vectors.
If (λ1,v1),(λ2,v2),…,(λn,vn) are the eigenpairs of the matrix A, then $D=\begin{bmatrix}\lambda_1&0&\cdots&0\\0&\lambda_2&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&\lambda_n\end{bmatrix}$ and $P=\begin{bmatrix}v_1&v_2&\cdots&v_n\end{bmatrix}$.
Note: Order matters! The eigenvector vi in the i-th column of P must correspond to the eigenvalue λi in the i-th column of D.
We then need to compute P−1 by inverting P.
Let's now see what happens when we calculate Ak. For this, we will use the row vectors of P−1, which we define as $P^{-1}=\begin{bmatrix}q_1^T\\q_2^T\\\vdots\\q_n^T\end{bmatrix}$. We know that
$$A^k=PD^kP^{-1}=\lambda_1^k\,p_1q_1^T+\lambda_2^k\,p_2q_2^T+\cdots+\lambda_n^k\,p_nq_n^T.$$
This expresses Ak as a sum of matrices p_iq_i^T, each scaled by its eigenvalue raised to the k-th power, λ_i^k. If |λ_i|<1, then limk→∞ λ_i^k=0, so we can essentially drop the term associated with that λ_i from the above sum because its scaling factor λ_i^k approaches 0.
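Here is a small sketch of this limit argument (the matrix and its eigenstructure are illustrative choices):

```python
import numpy as np

# Illustrative diagonalizable matrix with eigenvalues 1 and 0.5:
# the columns of P are eigenvectors; D holds the matching eigenvalues.
P = np.array([[1.0,  1.0],
              [1.0, -1.0]])
D = np.diag([1.0, 0.5])
P_inv = np.linalg.inv(P)
A = P @ D @ P_inv

k = 50
# A^k = P D^k P^{-1}: only the diagonal of D is powered, entrywise.
Ak = P @ np.diag(np.diag(D) ** k) @ P_inv

# Since |0.5| < 1, its rank-one term 0.5^k * p2 q2^T vanishes as k grows,
# leaving only the steady-state term 1^k * p1 q1^T.
steady = np.outer(P[:, 0], P_inv[0, :])
print(np.allclose(Ak, steady))  # True
```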
Example
Taking our previous example, we have the following eigenvalues and eigenvectors.
λ1=1: $\begin{bmatrix}1\\0\\0\end{bmatrix},\begin{bmatrix}1\\2\\1\end{bmatrix}$
λ2=2: $\begin{bmatrix}1\\1\\1\end{bmatrix}$
We can therefore diagonalize $\begin{bmatrix}1&-1&2\\0&0&2\\0&-1&3\end{bmatrix}$ as PDP−1, where $D=\begin{bmatrix}1&0&0\\0&1&0\\0&0&2\end{bmatrix}$ and $P=\begin{bmatrix}1&1&1\\0&2&1\\0&1&1\end{bmatrix}$. Finally, we have to calculate P−1.
Also note that there are multiple ways to diagonalize the above matrix. We could have just as well diagonalized it with $D=\begin{bmatrix}1&0&0\\0&2&0\\0&0&1\end{bmatrix}$ and $P=\begin{bmatrix}1&1&1\\2&1&0\\1&1&0\end{bmatrix}$. Just make sure that the eigenvector in the first column of P is associated with the eigenvalue in the first column of D, that the eigenvector in the second column of P is associated with the eigenvalue in the second column of D, and so on.
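As a check, the factorization reassembles A (using the example matrix and the first ordering from above):

```python
import numpy as np

A = np.array([[1.0, -1.0, 2.0],
              [0.0,  0.0, 2.0],
              [0.0, -1.0, 3.0]])

D = np.diag([1.0, 1.0, 2.0])
P = np.array([[1.0, 1.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0]])

# P D P^{-1} reproduces A (up to floating-point error).
print(np.allclose(P @ D @ np.linalg.inv(P), A))  # True
```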
Diagonalizability
We note that a matrix A∈Rn×n is diagonalizable only if it is square and has n linearly independent eigenvectors (otherwise P would not be invertible). This leads to two conclusions about the diagonalizability of a matrix A:
If A has n unique eigenvalues, then it must have n linearly independent eigenvectors, so it is automatically diagonalizable.
If the sum of the geometric multiplicities of all eigenvalues of A equals n, i.e. A has n linearly independent eigenvectors, then A is diagonalizable.
Note: Even if the sum of the algebraic multiplicities equals n, this does not mean that A has n linearly independent eigenvectors. Take $A=\begin{bmatrix}1&1\\0&1\end{bmatrix}$ for example. The characteristic equation is (λ−1)^2=0, so λ1=1 has an algebraic multiplicity of 2. However, $\operatorname{Null}\left(\begin{bmatrix}0&-1\\0&0\end{bmatrix}\right)=\operatorname{span}\left\{\begin{bmatrix}1\\0\end{bmatrix}\right\}$, so the geometric multiplicity of λ1=1 is only 1. In general, number of columns in A ≥ sum of algebraic multiplicities ≥ sum of geometric multiplicities.
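The shear example above can be verified the same way (a sketch reusing scipy.linalg.null_space from earlier):

```python
import numpy as np
from scipy.linalg import null_space

# The shear matrix from the note: eigenvalue 1 with algebraic multiplicity 2.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Geometric multiplicity = dim Null(1*I - A).
geo = null_space(np.eye(2) - A).shape[1]
print(geo)  # 1 -> only one independent eigenvector, so A is not diagonalizable
```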