We will now deal with eigenvalues, eigenvectors, and eigenspaces of matrices. From the previous chapter, we know that matrices can be viewed as linear transformations. Eigenvalues, eigenvectors, and eigenspaces deal with special cases of transformations where the direction of the vector is preserved after the operation.
Eigenvalues and eigenvectors are important because they let us quickly find the steady state of a system, i.e. how the system looks in the long run. Eigenvalues are also invariant under changes of basis.
Fun fact: Google’s PageRank algorithm is based on eigenvalues and eigenvectors!
(Figure: quick overview of eigen-everything.)
Definitions
An eigenvector of a linear transformation is a non-trivial vector that is only scaled when the transformation is applied to it. This means that the direction of the eigenvector is preserved after the operation.
In mathematical notation: Let v be an eigenvector of the matrix A. Then Av=λv for a unique λ∈R.
This equation says that the transformed vector Av is just a scaled version of the original vector v.
We also require an eigenvector to be non-trivial (non-zero) because A0=0=λ0 for every matrix A and every scalar λ, so the zero vector would satisfy the equation trivially.
The eigenvalue associated with this eigenvector is the scalar quantity by which the eigenvector is scaled. In the above equation, λ would be the eigenvalue associated with the vector v. Note that the eigenvalue associated with an eigenvector can be 0.
The eigenspace of an eigenvalue is the span of all eigenvectors with this eigenvalue because any linear combination of the eigenvectors is still an eigenvector with the same eigenvalue. In the above equation the eigenspace Eλ=span{v} because there is one eigenvector v corresponding to λ.
Proof: Let v be an eigenvector of the matrix A. Then Av=λv for a unique λ. Now, let's scale v by a non-zero constant c: A(cv)=c(Av)=c(λv)=λ(cv). Therefore, λ is still the associated eigenvalue of cv. Likewise, if u and v are both eigenvectors with eigenvalue λ, then A(u+v)=Au+Av=λu+λv=λ(u+v), so any linear combination of eigenvectors stays in the eigenspace.
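To make this concrete, here is a minimal numpy sketch (the matrix and eigenpair below are illustrative choices, not anything derived in this section):

```python
import numpy as np

# Illustrative 2x2 matrix with a known eigenpair:
# v = (1, 1) is an eigenvector of A with eigenvalue 3.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = np.array([1.0, 1.0])
lam = 3.0

# A v = lambda v: the direction of v is preserved, only its length changes.
print(np.allclose(A @ v, lam * v))              # True

# Scaling v by any non-zero c yields another eigenvector with the same eigenvalue.
c = -4.2
print(np.allclose(A @ (c * v), lam * (c * v)))  # True
```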
Calculation
Let us derive a method for solving for the eigenvalues, eigenvectors, and eigenspaces of a given matrix.
We know that for a given eigenvector v, Av=λv
However, we have 2 unknowns: v and λ, so we try to eliminate one of them.
We can re-write the equation as Av=λIv
where I is the identity matrix. We then subtract Av from both sides.
λIv−Av=0
Factoring out the v, we get
(λI−A)v=0
We observe that v is in the null space of the matrix λI−A. Since we know that any eigenvector v is non-trivial, we have to make sure that the null space of λI−A is non-trivial as well.
Recall the Invertible Matrix Theorem, which states that if the null space of a matrix is non-trivial, its determinant must be 0. Therefore, to solve for the eigenvalues, we use the equation det(λI−A)=0
det(λI−A) is called the characteristic polynomial of A because it is a polynomial in λ. To find the eigenvalues of a matrix, we find the roots of the characteristic equation det(λI−A)=0.
Sometimes, an eigenvalue λi can appear multiple times as a root. We define the algebraic multiplicity of the eigenvalue λi to be the number of times λi appears as a root in the characteristic polynomial.
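As a quick numerical sanity check, numpy can produce both the characteristic polynomial and its roots: np.poly(A) returns the coefficients of det(λI−A), and np.roots recovers the eigenvalues (the matrix below is an illustrative choice):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Coefficients of det(lambda*I - A), highest degree first:
# here lambda^2 - 4*lambda + 3.
coeffs = np.poly(A)
print(coeffs)                # [ 1. -4.  3.]

# The eigenvalues are the roots of the characteristic polynomial...
print(np.roots(coeffs))      # [3. 1.]

# ...and they match what np.linalg.eigvals computes directly.
print(np.linalg.eigvals(A))  # [3. 1.]
```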
Notes:
Observe what happens if an eigenvalue of A is 0. Then det(0I−A)=det(−A)=(−1)^n det(A)=0 ⟹ det(A)=0. Therefore, a matrix A has an eigenvalue of 0 if and only if A is non-invertible.
If A is a triangular matrix, we can easily find the eigenvalues of A. Assume that d1,d2,…,dn are the entries along the diagonal of such a matrix. From cofactor expansion, we know that det(A)=d1d2⋯dn. Since λI−A is also triangular, with diagonal entries λ−d1,λ−d2,…,λ−dn, we get det(λI−A)=(λ−d1)(λ−d2)⋯(λ−dn)=0, so d1,d2,…,dn are the roots of the characteristic polynomial, i.e. the eigenvalues. In short: the eigenvalues of a triangular matrix are its diagonal entries.
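A short numpy check of this note (the triangular matrix is an arbitrary example):

```python
import numpy as np

# Upper-triangular matrix: its eigenvalues should be exactly the diagonal entries.
T = np.array([[4.0, 7.0, -1.0],
              [0.0, 2.0,  5.0],
              [0.0, 0.0, -3.0]])

print(np.sort(np.linalg.eigvals(T)))  # [-3.  2.  4.]
print(np.sort(np.diag(T)))            # [-3.  2.  4.]
```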
Once we have all of the eigenvalues λ1,λ2,…,λn of A, we can find the eigenvectors associated with each eigenvalue by using the equation (λiI−A)v=0. In fact, we are simply finding basis vectors for the null space of λiI−A for i=1,2,…,n.
If the dimension of the null space of λiI−A for an eigenvalue λi is larger than 1, we expect multiple linearly independent eigenvectors associated with the same eigenvalue λi.
We therefore define the geometric multiplicity of an eigenvalue to be the dimension of the associated null space of λiI−A.
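In code, the geometric multiplicity is just the dimension of that null space. Here is a sketch using scipy.linalg.null_space, which returns an orthonormal basis of the null space (the matrix is an illustrative choice):

```python
import numpy as np
from scipy.linalg import null_space

# Illustrative matrix: the eigenvalue 2 should have a 2-dimensional eigenspace.
A = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])
lam = 2.0

# Columns of `basis` span Null(lambda*I - A), i.e. the eigenspace of lambda;
# the number of columns is the geometric multiplicity.
basis = null_space(lam * np.eye(3) - A)
print(basis.shape[1])  # 2
```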
Example
Let us try to find the eigenvalues, eigenvectors, and eigenspaces of the matrix $A=\begin{bmatrix}1&-1&2\\0&0&2\\0&-1&3\end{bmatrix}$.
We first find the roots of the characteristic polynomial. Expanding det(λI−A) along the first column gives
$$\det(\lambda I-A)=\det\begin{bmatrix}\lambda-1&1&-2\\0&\lambda&-2\\0&1&\lambda-3\end{bmatrix}=(\lambda-1)\left(\lambda(\lambda-3)+2\right)=(\lambda-1)^2(\lambda-2)=0,$$
so λ1=1 has an algebraic multiplicity of 2 and λ2=2 has an algebraic multiplicity of 1. We then solve (λiI−A)v=0 for each eigenvalue.
Solving (1⋅I−A)v=0, all three rows give the same equation y−2z=0, leaving x and z free. Therefore, the eigenvectors associated with λ1=1 are $\begin{bmatrix}1\\0\\0\end{bmatrix}$ and $\begin{bmatrix}1\\2\\1\end{bmatrix}$. Since there are 2 linearly independent eigenvectors associated with λ1=1, we say that the eigenvalue λ1=1 has a geometric multiplicity of 2. The eigenspace corresponding to λ1=1 is $E_1=\operatorname{span}\left\{\begin{bmatrix}1\\0\\0\end{bmatrix},\begin{bmatrix}1\\2\\1\end{bmatrix}\right\}$.
Similarly, $\begin{bmatrix}1\\1\\1\end{bmatrix}$ is the eigenvector associated with λ2=2, which has a geometric multiplicity of 1. $E_2=\operatorname{span}\left\{\begin{bmatrix}1\\1\\1\end{bmatrix}\right\}$.
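We can double-check the whole example numerically (using the matrix A reconstructed above):

```python
import numpy as np

A = np.array([[1.0, -1.0, 2.0],
              [0.0,  0.0, 2.0],
              [0.0, -1.0, 3.0]])

# Eigenvalues should be 1 (twice) and 2.
print(np.sort(np.linalg.eigvals(A)))  # approximately [1. 1. 2.]

# Verify each eigenpair found by hand: A v = lambda v.
for lam, v in [(1, [1, 0, 0]), (1, [1, 2, 1]), (2, [1, 1, 1])]:
    v = np.array(v, dtype=float)
    print(np.allclose(A @ v, lam * v))  # True, True, True
```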
Diagonalization
What and why
Imagine that a matrix A represents a state transition matrix and we want to find the steady state of the system. We would have to calculate limk→∞ Ak, but repeated matrix multiplication is a very tedious process. If we could express A as a product of 3 matrices, PDP−1, where P is an invertible matrix and D is a diagonal matrix, then the limit becomes easy: the inner factors cancel, so Ak=(PDP−1)(PDP−1)⋯(PDP−1)=PDkP−1, and we only ever need powers of D. When such a factorization A=PDP−1 exists, we say that A is diagonalizable.
Calculating Dk is very easy; in fact, for any diagonal matrix $D=\begin{bmatrix}d_1&0&\cdots&0\\0&d_2&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&d_n\end{bmatrix}$, we have $D^k=\begin{bmatrix}d_1^k&0&\cdots&0\\0&d_2^k&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&d_n^k\end{bmatrix}$, so we just need to find the k-th power of each diagonal entry of D.
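With numpy, for example, powering a diagonal matrix never requires a matrix multiplication (the entries are illustrative):

```python
import numpy as np

d = np.array([0.5, 1.0, 2.0])  # diagonal entries of D
k = 10

# D^k is just the elementwise k-th power of the diagonal.
Dk = np.diag(d ** k)

# Same result as repeated matrix multiplication, but far cheaper.
print(np.allclose(Dk, np.linalg.matrix_power(np.diag(d), k)))  # True
```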
Calculation
Let $A\in\mathbb{R}^{n\times n}$, $D=\begin{bmatrix}d_1&0&\cdots&0\\0&d_2&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&d_n\end{bmatrix}$, and $P=\begin{bmatrix}p_1&p_2&\cdots&p_n\end{bmatrix}$. If A=PDP−1, then multiplying both sides by P on the right gives AP=PD, i.e. $\begin{bmatrix}Ap_1&Ap_2&\cdots&Ap_n\end{bmatrix}=\begin{bmatrix}d_1p_1&d_2p_2&\cdots&d_np_n\end{bmatrix}$, so Ap_i=d_i p_i for every column i.
We recognize these equations as the definitions of eigenvectors and eigenvalues. Therefore, the diagonal matrix D is just a matrix with the eigenvalues of A along its diagonal, and the invertible matrix P is just a matrix with the eigenvectors of A as column vectors.
If (λ1,v1),(λ2,v2),…,(λn,vn) are the eigenpairs of the matrix A, then $D=\begin{bmatrix}\lambda_1&0&\cdots&0\\0&\lambda_2&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&\lambda_n\end{bmatrix}$ and $P=\begin{bmatrix}v_1&v_2&\cdots&v_n\end{bmatrix}$.
Note: Order matters! The eigenvector vi in the i-th column of P must correspond to the eigenvalue λi in the i-th column of D.
We then need to compute P−1 by inverting P.
Let's now see what happens when we calculate Ak. For this, we will use the row vectors of P−1, which we define as $P^{-1}=\begin{bmatrix}q_1^T\\q_2^T\\\vdots\\q_n^T\end{bmatrix}$. We know that
$$A^k=PD^kP^{-1}=\lambda_1^k\,p_1q_1^T+\lambda_2^k\,p_2q_2^T+\cdots+\lambda_n^k\,p_nq_n^T.$$
This expresses Ak as a sum of matrices p_iq_i^T, each scaled by its eigenvalue raised to the k-th power, λ_i^k. If |λ_i|<1, then limk→∞ λ_i^k=0, so we can essentially drop the term associated with that λ_i from the above sum because its scaling factor λ_i^k approaches 0.
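Here is a small sketch of this limit argument (the matrix and its eigenstructure are illustrative choices):

```python
import numpy as np

# Illustrative diagonalizable matrix with eigenvalues 1 and 0.5:
# the columns of P are eigenvectors; D holds the matching eigenvalues.
P = np.array([[1.0,  1.0],
              [1.0, -1.0]])
D = np.diag([1.0, 0.5])
P_inv = np.linalg.inv(P)
A = P @ D @ P_inv

k = 50
# A^k = P D^k P^{-1}: only the diagonal of D is powered, entrywise.
Ak = P @ np.diag(np.diag(D) ** k) @ P_inv

# Since |0.5| < 1, its rank-one term 0.5^k * p2 q2^T vanishes as k grows,
# leaving only the steady-state term 1^k * p1 q1^T.
steady = np.outer(P[:, 0], P_inv[0, :])
print(np.allclose(Ak, steady))  # True
```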
Example
Taking our previous example, we have the following eigenvalues and eigenvectors.
λ1=1: $\begin{bmatrix}1\\0\\0\end{bmatrix},\begin{bmatrix}1\\2\\1\end{bmatrix}$
λ2=2: $\begin{bmatrix}1\\1\\1\end{bmatrix}$
We can therefore diagonalize $\begin{bmatrix}1&-1&2\\0&0&2\\0&-1&3\end{bmatrix}$ as PDP−1, where $D=\begin{bmatrix}1&0&0\\0&1&0\\0&0&2\end{bmatrix}$ and $P=\begin{bmatrix}1&1&1\\0&2&1\\0&1&1\end{bmatrix}$. Finally, we have to calculate P−1.
Also note that there are multiple ways to diagonalize the above matrix. We could have just as well diagonalized it with $D=\begin{bmatrix}1&0&0\\0&2&0\\0&0&1\end{bmatrix}$ and $P=\begin{bmatrix}1&1&1\\2&1&0\\1&1&0\end{bmatrix}$. Just make sure that the eigenvector in the first column of P is associated with the eigenvalue in the first column of D, that the eigenvector in the second column of P is associated with the eigenvalue in the second column of D, and so on.
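As a check, the factorization reassembles A (using the example matrix and the first ordering from above):

```python
import numpy as np

A = np.array([[1.0, -1.0, 2.0],
              [0.0,  0.0, 2.0],
              [0.0, -1.0, 3.0]])

D = np.diag([1.0, 1.0, 2.0])
P = np.array([[1.0, 1.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0]])

# P D P^{-1} reproduces A (up to floating-point error).
print(np.allclose(P @ D @ np.linalg.inv(P), A))  # True
```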
Diagonalizability
We note that a matrix A∈Rn×n is diagonalizable only if it is square and has n linearly independent eigenvectors (otherwise P would not be invertible). This leads to two conclusions about the diagonalizability of a matrix A:
If A has n unique eigenvalues, then it must have n linearly independent eigenvectors, so it is automatically diagonalizable.
If the sum of the geometric multiplicities of all eigenvalues of A equals n, i.e. A has n linearly independent eigenvectors, then A is diagonalizable.
Note: Even if the sum of the algebraic multiplicities equals n, this does not mean that A has n linearly independent eigenvectors. Take $A=\begin{bmatrix}1&1\\0&1\end{bmatrix}$ for example. The characteristic equation is (λ−1)^2=0, so λ1=1 has an algebraic multiplicity of 2. However, $\operatorname{Null}\left(\begin{bmatrix}0&-1\\0&0\end{bmatrix}\right)=\operatorname{span}\left\{\begin{bmatrix}1\\0\end{bmatrix}\right\}$, so the geometric multiplicity of λ1=1 is only 1. In general, number of columns in A ≥ sum of algebraic multiplicities ≥ sum of geometric multiplicities.
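The shear example above can be verified the same way (a sketch reusing scipy.linalg.null_space from earlier):

```python
import numpy as np
from scipy.linalg import null_space

# The shear matrix from the note: eigenvalue 1 with algebraic multiplicity 2.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Geometric multiplicity = dim Null(1*I - A).
geo = null_space(np.eye(2) - A).shape[1]
print(geo)  # 1 -> only one independent eigenvector, so A is not diagonalizable
```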