Description
Let's try to understand how least squares works in one dimension.

In this example, we're trying to predict b based off of a. However, b has components that are not in span{a}, making it impossible to match b exactly. Our best estimate is $b_1$, the component of b that lies in span{a}. The error between b and $b_1$ is denoted $b_2$, which is perpendicular to a; this choice of $b_1$ minimizes the norm of the error.
Since $b_1$ lies in span{a}, it must be a multiple of a, i.e. $b_1 = xa$ for some scalar x. Let's solve for x. Because $b_2$ is orthogonal to a, their dot product is 0. Hence,
$$b_2 \cdot a = (b - b_1) \cdot a = (b - xa) \cdot a = 0$$
Expanding gives $a^T b - x\,a^T a = 0$; rearranging to solve for x yields $x = (a^T a)^{-1} a^T b$.
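As a quick numerical check, here is a minimal sketch (using NumPy, with made-up example vectors) of this one-dimensional case: it computes $x = (a^T a)^{-1} a^T b$, the projection of b onto span{a}, and verifies that the leftover error component is orthogonal to a.

```python
import numpy as np

# Made-up example vectors (any vectors with a != 0 work).
a = np.array([2.0, 1.0])
b = np.array([3.0, 4.0])

# Scalar coefficient x = (a^T a)^{-1} a^T b
x = (a @ b) / (a @ a)

# Component of b in span{a}, and the error component
b1 = x * a
b2 = b - b1

print(x)             # 2.0 for these vectors
print(np.dot(b2, a)) # ~0: the error is orthogonal to a
```

Note that in one dimension the "matrix inverse" $(a^T a)^{-1}$ is just division by the scalar $a^T a$.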
The same concept applies in higher dimensions: simply replace the vector a by a matrix A whose columns are the vectors we project onto,
$$A = \begin{bmatrix} a_1 & \cdots & a_n \end{bmatrix}$$
The general form of least squares is therefore
$$A^T A x = A^T b \quad \text{or} \quad x = (A^T A)^{-1} A^T b,$$
where the inverse exists whenever the columns of A are linearly independent.
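A short sketch of the general case, again with made-up data: we solve the normal equations $A^T A x = A^T b$ directly with NumPy and check that the residual $b - Ax$ is orthogonal to every column of A, just as $b_2$ was orthogonal to a in one dimension.

```python
import numpy as np

# Made-up data: A has two columns, and b is not in col(A).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 4.0])

# Solve the normal equations A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)         # least-squares coefficients

# The residual b - Ax is orthogonal to every column of A
r = b - A @ x
print(A.T @ r)   # ~[0, 0]
```

In practice, `np.linalg.lstsq(A, b, rcond=None)` solves the same problem more stably than forming $A^T A$ explicitly.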
Note: Refer to the lecture notes to see the general derivation.