Linear Algebra: Invariant Subspaces and Why I Think They Are Pret Cool.

July 7, 2010

We’ve been talkin’ a lot about group theory and the beginnings of topology, but we really should go back to linear algebra for just a bit and talk about some important concepts!  Secretly, we will be applying these when we go over representation theory, but, for now, they’re just going to be points of interest.

Little Review.

First, let’s review: a vector space is this really nice space that has a whole bunch of nice additive and scalar multiplicative properties, as well as a few other coo’ things.  We usually call our vector spaces V or W or U.  Pret much any capital italics letter around the end of the alphabet will do.  A subspace, U of the vector space V is a subset of V which is also a vector space.  We don’t have very good notation, so I usually write “U\subseteq V is a subspace.”

So, now we have maps between vector spaces.  If you forget what all of this is about, go back to here and read up!  These maps are called linear maps, and they have a really nice property: if T is a linear map and x,y\in V and a,b\in F (where F is our underlying field), then we have:

  • T(ax) = aT(x)
  • T(x + y) = T(x) + T(y)

or, more concisely, T(ax + by) = aT(x) + bT(y).

Of course, T needs to go from a vector space to another space, and we write that like this: T:V\rightarrow W meaning T goes from the vector space V to the vector space W.

Invariant Subspaces.

Now, what happens if we have an element x\in V and we want to apply a linear map T to it?  Actually, let’s given an example of this: let’s have our vector space be {\mathbb R}^3 and let’s let our linear map T:{\mathbb R}^3\rightarrow {\mathbb R}^3 and be defined by

T(x_1, x_2, x_3) = (x_{2}, x_{1}, x_{3})

So, T switches the first and second coordinate.  You can check that this is actually linear.  What if we want to apply this to the element (2,3,4)?  Well, this is relatively easy: we have T((2,3,4)) = (3,2,4).

But we can think of this some other way: we can say that

(2,3,4) = (2,0,0) + (0,3,0) + (0,0,4)

which is, in turn,

(2,3,4) = 2(1,0,0) + 3(0,1,0) + 4(0,0,1)

This is now written in terms of a basis, and we can compute this by applying our linear map to the basis.  Yeah, this makes it way more complicated looking, but now we can easily express every element in terms of something like this.

Now, how can we talk about this?  We can say that, applying T to {\mathbb R}^3 gives us something that “switches” the first and second coordinate.  What about the third coordinate?  It stays the same!  So, we could really break up this vector space into two parts:

{\mathbb R^{2}} \oplus {\mathbb R}

where this last {\mathbb R} isn’t really going to be affected by the transform.  Therefore, we can really just define this transform on the first two parts of the vector space and just kind of translate the last part over via the identity map.

Now, this seems really trivial, right?  But what if we had a general vector space and a really complicated map?  Wouldn’t you want to break it down as much as possible?  Of course you would!  In the previous example, we noticed that T was kind of “closed” in the first two coordinates — yes, they swapped with each other, but the third coordinate didn’t even come into play!  It’s almost like having two mixing bowls and putting eggs and flour into one and water into the other: the water doesn’t go into the eggs and flour, and the eggs and flour keep to themselves as well.  This is the notion of an invariant subspace.

Specifically, an invariant subspace (also called an invariant subspace with respect to the linear map T) is a subspace U\subseteq V such that T(U) \subseteq U.  In other words, after applying a linear map to U, every element stays in U.

Now, wouldn’t it look really nice if we had, for our vector space V, some kind of partition

V = U_{1}\oplus U_{2}\oplus \dots\oplus U_{n}

such that

T(V) = T(U_{1})\oplus T(U_{2})\oplus \dots\oplus T(U_{n})

Wouldn’t that be cute?  Because the map would act on the spaces in the same sort of way that it would act on the elements?  Yeah, well, sorry to rain on your parade, but it actually doesn’t work as nicely as that all the time.

So, let’s dry our eyes and think about what we can do in general.  Let’s think about the simplest kind of invariant subspace, shall we?  If a vector space is, say, 5 dimensions, what would we ideally want to break it up into?  Probably five 1-dimensional subspaces, right?  In this way, we’d be able to operate on each one individually, and our lives would be so damn good.  We’d live like kings.  One-dimensional subspaces (like one-dimensional men) are extremely easy to work with, and, therefore, they are the topic of the first part of our investigation of invariant subspaces.

One-Dimensional Invariant Subspaces!

But what would this mean?  Suppose we had a one-dimensional subspace.  We could obviously find a basis for it: it’s simply some element u\in U where U is the subspace (since our basis must consist of one element, as U is one-dimensional).  Now, we have that U = \{au\, |\, a\in F\} for F is our field.  What does this say?  It says that U is generated by all scalar multiples of u.  This says exactly the same thing as “\{u\} is a basis for U“, but this actually gives us a nice way to manipulate elements with a linear map.

Suppose, then, that we have some map T:V\rightarrow V, and we have that U is invariant under T.  Good.  This means that T(au) = aT(u)\in U.  But, wait, U is generated by u, so, in fact, T(au) = aT(u) = bu for some b\in F (Why is this true?).  Cool.  So, T(u) = a^{-1}bu.  Wait, hold on a second.  This looks familliar.  Let’s just let \lambda = a^{-1}b for convenience, and then we have T(u) = \lambda u.  WAIT.  So, this means that u is an eigenvector with corresponding eigenvalue \lambda!

Okay, okay, don’t get too excited.  We have just figured out that if there is a one-dimensional invariant subspace in V, then it is generated by an eigenvalue of V.  That’s pretty important.  But, let’s try this the other way around.

Suppose that u\in V is an eigenvalue of V with respect to the linear map T:V\rightarrow V.  This means that T(u) = \lambda u.  So consider the subspace of V defined by \{au\, |\, a\in F\}.  Is this invariant under T?  You bet your sweet meats it is!  Why?  Take an arbitrary element, au, and apply T to it: T(au) = aT(u) = a(\lambda u) = (a\lambda)u\in \{au\, |\, a\in F\}.  Therefore, this subspace is invariant.  We have proved something pretty damn sweet.

Theorem:  If V is a finite-dimensional vector space and T:V\rightarrow V is a linear map, then \{au\, |\, a\in F\} is an invariant subspace under T if and only if u is an eigenvector of V.

Proof: See above.

Well, this is pretty damn impressive.  We know how to find eigenvectors of a map.  From this, we can directly decompose V into little pieces.  In fact, suppose we have a bunch of invariant subspaces generated by the eigenvectors v_{1}, v_{2}, \dots, v_{n}.  Do any of these “overlap”?  In other words, is the intersection ever nontrivial?  This is the same as asking if the set of eigenvectors is linearly independent in V (why?), and so this is exactly what we’re going to prove right now!

Theorem:  For V is a finite-dimensional vector space and T:V\rightarrow V is a linear map, then if \{\lambda_{1}, \lambda_{2}, \dots, \lambda_{n}\} if the set of distinct eigenvalues corresponding to the set of (non-zero) eigenvectors \{v_{1}, v_{2}, \dots, v_{n}\}, then this set of eigenvectors is linearly independent in V.

Proof: This clever proof is characteristic of theorems involving linear independence.  The plan is to assume linear dependence, get some element as a combo of the others, and then apply some stuff to get a contradiction somehow.  Let’s begin.

Let’s assume that \{v_{1}, \dots, v_{n}\} is linearly dependent.  Then let’s let k be the smallest integer such that v_{k} \in Span(v_{1}, \dots, v_{k-1}).  In other words, we have that v_{k} is the smallest indexed eigenvector such that it’s a linear combination of all the eigenvectors with indexes smaller than it.  Okay.  Okay.  This means, in particular, that v_{k} = a_{1}v_{1} + a_{2}v_{2} + \dots + a_{k-1}v_{k-1}.  Okay, now, (here’s the clever step) apply T to both sides.  We get:

T(v_{k}) = T(a_{1}v_{1} + a_{2}v_{2} + \dots + a_{k-1}v_{k-1})\\ \lambda_{k} v_{k} = a_{1}\lambda_{1} v_{1} + a_{2}\lambda_{2} v_{2} + \dots + a_{k-1}\lambda_{k-1} v_{k-1}

Make sure you see how we got this!  That’s an important part.  Okay, now we have this equation that we just derived:

\lambda_{k} v_{k} = a_{1}\lambda_{1} v_{1} + a_{2}\lambda_{2} v_{2} + \dots + a_{k-1}\lambda_{k-1} v_{k-1}

and we also have the original equation:

v_{k} = a_{1}v_{1} + a_{2}v_{2} + \dots + a_{k-1}v_{k-1}

but, if we take the original equation and multiply \lambda_{k} to both sides, we get

\lambda_{k} v_{k} = a_{1}\lambda_{k} v_{1} + a_{2}\lambda_{k} v_{2} + \dots + a_{k-1}\lambda_{k} v_{k-1}

which is similar to the equation above!  Okay, okay, calm down.  Now, subtract this equation from the one we derived right before it (so, both of those equations that have \lambda_{k} v_{k} on the left side.) and we’ll get:

0 = a_{1}(\lambda_{1} - \lambda_{k})v_{1} + \dots + a_{k-1}(\lambda_{k-1} - \lambda_{k})v_{k-1}

Make sure you see where we got this!  This is an important step.  Now, look at this: it’s a linear combination of \{v_{1}, \dots, v_{k-1}\} which is linearly independent (by assumption; otherwise, there’d be a smaller k such that v_{k} \in Span(v_{1}, \dots, v_{k-1}) and so, if this linear combination is equal to zero, it implies that all of the coefficients are equal to 0.  Cool.  so, for all i<k we have:

(\lambda_{k} - \lambda{i})a_{i} = 0

and note that, by assumption, \lambda_{i} \neq \lambda_{j} for i\neq j, so we have that every a_{i} = 0, which implies that (using our original equation) v_{k} = 0.  But all of the eigenvectors, by assumption, are non-zero!  We have reached a contradiction.  \Rightarrow\Leftarrow\Box.

This means, in particular, if we have V and we have \{v_{1}, v_{2}, \dots, v_{r}\} for some r\leq dim(V), then by using that theorem above, we have a whole bunch of subspaces that are invariant!  In particular, let’s call U_{i} = \{av_{i} | a\in F\}, and then we have that each of these eigenvectors creates some one-dimensional invariant subspace.  Meaning that,

V = (U_{1} \oplus U_{2} \oplus \dots \oplus U_{r}) \oplus W

for some W.  Let’s take a breather here and just think about this.  Okay, cool, we partitioned this space up into one-dimensional spaces with eigenvector bases.  Nice.  Only one thing kind of bugs me though…

…what’s that W on the end?  Well, it isn’t anything particularly nice.  We just sort of build it.  This is a normal kind of thing that we  do in linear algebra, so let’s do it explicitly:

We wanna express V as a direct sum of these things, so since the eigenvectors are linearly independent and span exactly U_{1}\oplus U_{2} \oplus\dots\oplus U_{r}, we can let these be the beginnings of a basis for V.  So, now, consider V without all of the elements in Span(v_{1}, \dots, v_{r}).  If there’s some non-zero element left, call it w, pick it, and make it a new basis element, in addition to all of the eigenvectors.  Take out all vectors in the space Span(v_{1}, \dots, v_{n}, w), and then if there’s anything left in V minus all this, pick that element out and do the same thing as we did the first time.  Eventually, we will span V since it is finite dimensional.  We get a basis that looks like \{v_{1}, \dots, v_{r}, w_{1}, \dots, w_{n-r}\}.  It follows that the subspace W above is exactly the subspace of V spanned by \{w_{1}, \dots, w_{n-r}\}.  It’s up to you to show that this is disjoint (except for the 0 element) from every one of the U_{i}‘s.  Cute.

So, What Did We Just Do?

So, given any random finite-dimensional space V and a linear map T:V\rightarrow V, we can find the eigenvectors, right?  Yeah.  Now, we’ve taken that one step farther, and partitioned V into a space that looks like U_{1} \oplus U_{2} \oplus \dots \oplus U_{r} \oplus W with the eigenvectors as basis elements each generating (spanning) each of the U_{i} spaces.  In addition, as a bonus, each of those one-dimensional U_{i} spaces is invariant under T.  It is, therefore, the case that we have the very cute and very familiar relation:

T(V) = T(U_{1} \oplus U_{2} \oplus \dots \oplus U_{r} \oplus W)

= T(U_{1}) \oplus \dots \oplus T(U_{r}) \oplus T(W).

which is significantly easier to work with.

What Now?

After this, we work with spaces that don’t have nice eigenvectors, and we show that some spaces must have one- or two-dimensional invariant subspaces.  Eventually, we’re gonna do some examples of such spaces, and we’ll even show some cool stuff with this.  For now, you shouldn’t really have too good’a grip on what this “says” — it’s somewhat abstract and kind of confusing.  On the other hand, this gives us a really good reason to study eigenvectors and gives us a nice way of thinking about linear transforms.  In particular, some linear transforms look very strange if we think about the basis being the standard basis, but, on the other hand, if we switch to an eigenvector basis, it may not look too strange after all!  Examples to come.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: