Recently, I’ve been learning to program in a new language and have been doing Project Euler problems for practice — of course, as the name suggests, most of these deal with problems which must be solved (efficiently) using mathematical techniques.  Two of the most common algorithms I’ve used are prime testing and GCD finding.  I’ll post about the former later, but the latter is an interesting problem in its own right:


Initial Problem.  Given two natural (positive whole) numbers called m, n, can we find some other natural number that divides both of them?


This problem is a first step.  It’s nice to be able to write numbers as a multiple of some other number; for example, if we have 16 and 18, we may write them as 8\times 2 and 9\times 2, thus giving us some insight as to the relationship between these two numbers. In that case it may be easy to see, but perhaps if you’re given the numbers 46629 and 47100, you may not realize right away that these numbers are 99\times 471 and 100\times 471 respectively.  This kind of factorization will reveal "hidden" relationships between numbers.  

So, given two numbers, how do we find if something divides both of them — in other words, how do we find the common divisors of two numbers?  If we think back to when we first began working with numbers (in elementary school, perhaps) the first thing to do would be to note that 1 divides every number.  But that doesn’t help us all that much, as it turns out, so we go to the next number: if both numbers are even, then they have 2 as a common factor.  Then we "factor" both numbers by writing them as 2\times\mbox{ something} and then attempt to keep dividing things out of the something.  We then move on to 3, skip 4 (since a factor of 4 is just a factor of 2, twice), go on to 5, then 7, and so on through the primes.  This gives a prime factorization, but we have to note that if, say, 2 and 5 divide some number, then so does 10.  These latter divisors are the composite factors.
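In code, this elementary-school search is short; here is a brute-force sketch in Python (the function name is my own):

```python
def common_divisors(m, n):
    """All natural numbers dividing both m and n, found by testing each candidate."""
    return [d for d in range(1, min(m, n) + 1) if m % d == 0 and n % d == 0]
```

For 16 and 18 this returns [1, 2], recovering the common factor 2 from above.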

This seems excessive, but it is sometimes the only way one can do it. 

Anecdote!: On my algebra qualifying exam, there was a question regarding a group of order 289 which required us to see if 289 was prime or not; if not, we were to factor it.  We were not allowed calculators, so what could we do?  Try everything.  Note that we only need to try up to the square root of the number (which we could estimate in other ways), but it’s still a number of cases.  If you check, none of the following numbers divide into 289: 2, 3, 5, 7, 11, 13.  At this point, I was about to give up and call it a prime, but, for whatever reason, I decided to try 17.  Of course, as the clever reader will have pointed out, 289 = 17\times 17.  It is not prime.  There was, luckily, only one student who thought it was prime, but it points out how the algorithm above is not entirely trivial if one does not have access to a computer or calculator. 


Once we have a common divisor, or a set of common divisors, a natural thing to want to do is to find the biggest (we already have the smallest, 1) since in this way we can write our numbers with the largest common factor multiplied by some other number.  It will, in effect, make things prettier.


Real Problem.  Find the greatest divisor which is common to two natural numbers, m, n.


If you were just learning about this kind of thing, you may spout out the following solution: find all of the common divisors, then pick the greatest.  While this is not especially efficient, it is a solution.  Unfortunately, even for small numbers, this gets out of hand quickly.  For example, 60 and  420 have the following common divisors: 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60.  This takes a while to compute by hand. 

Even if we were to find prime factors, this would be 60 = 2^2 \times 3\times 5 and 420 = 2^2 \times 3 \times 5\times 7, which gives us that they share a number of prime factors.  A bit of thinking gives us that we take all of the prime factors they "share" and multiply them together to get the greatest common divisor.  This is another potential solution which is much faster than simply listing out all of the common divisors.  Unfortunately, this falls prey to the same kind of trap that other prime-related problems do: it is, at times, especially difficult to factor large composite numbers.  For example, the "reasonably small" number 49740376105597 has a prime factorization of 2741 \times 37813 \times 479909; this is not at all efficient to factor if one does not have a computer or a specialized calculator with a factoring algorithm on it.  As a mean joke, you may ask your friend to factor something like 1689259081189, which is actually the product of the 100,000th and 100,001st prime — that is, they would need to test 99,999 primes before getting to the one which divides the number.  If they divided by one prime per second (which is quite fast!) this would take them 1 day, 3 hours, and 46 minutes.  Not especially effective, but it will eventually get the job done.
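To make the factoring approach concrete, here is a Python sketch (function names are my own) which factors by trial division — precisely the step that becomes hopeless for large numbers:

```python
from collections import Counter

def prime_factors(n):
    """Prime factorization of n by trial division, returned as {prime: exponent}."""
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:          # whatever remains is itself a prime factor
        factors[n] += 1
    return factors

def gcd_by_factoring(m, n):
    """Multiply each shared prime, raised to the smaller of its two exponents."""
    fm, fn = prime_factors(m), prime_factors(n)
    g = 1
    for p in fm.keys() & fn.keys():
        g *= p ** min(fm[p], fn[p])
    return g
```

For 60 and 420 this gives 60, matching the factorizations above; for 289 the factorizer dutifully reports 17^2.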


Real Problem, With Efficiency: Find the greatest divisor which is common to two natural numbers, m,n, but do so in an efficient manner (we’ve all got deadlines!).


We need to sit down and think about this now.  We need an entirely new idea.  We note, at least, that one of the two numbers m,n must be larger than the other (or else the problem is trivial).  One thing to try would be to see if the smaller one goes into the larger one (for example, above we had 60 going into 420, which gave us the easy solution that 60 must be the greatest common divisor).  If not, maybe we can see how much is left over.  That is, if m is the larger number,

m = a_{1}n + r_{1}

where here a_{1} is the number of times n goes into m without exceeding it, and r_{1} is the "remainder"; if it’s equal to 0, then n evenly divides into m, and otherwise it is less than n (or else we could divide an additional n into m). 

Using this, if r_{1}\neq 0, we may write m - a_{1}n = r_{1}; this means that, in particular, any number which divides both m and n must also divide r_{1}.  But r_{1} itself may not be a factor of n; so let’s see how many times it goes into n.  Using the same process…

n = a_{2}r_{1} + r_{2}

and by rearranging, we have n - a_{2}r_{1} = r_{2}; so any common divisor of n and r_{1} must also divide r_{2}.  Conversely, if r_{2} divides r_{1}, then this equation shows that r_{2} also divides n, and the first equation then shows that it divides m; that is, r_{2} would be a common divisor of m and n (check this!).  That’s something at least.

The cool thing about our algorithm here is that, because a_{1}n + r_{1} = m, we have that either r_{1} = 0 and we’re done with the algorithm, or r_{1} > 0 and we may form a new equation n = a_{2}r_{1} + r_{2}; this equation has, on the left-hand side, the number n, which is less than the previous equation’s left-hand side, m.  Continuing this process, we will have r_{1}, r_{2}, \dots on the left-hand side, each of which is less than the one which came before it.  Because each r_{i} \geq 0 and the remainders strictly decrease, eventually some remainder will equal 0 (why?) and this algorithm will terminate.  That is, we will have found some r_{i} which is a common divisor for both n, m; specifically, it will be the r_{i}\neq 0 such that r_{i+1} = 0 (or, it may simply be n if n divides m).

This algorithm, called the Euclidean Algorithm, actually does more "automatically": it not only finds a common divisor, but actually finds the greatest common divisor of m,n, which, from now on, we will denote \gcd(m,n).  The "proof" of this is simply noting that \gcd(m,n) = \gcd(n,r_{1}) = \gcd(r_{1},r_{2}) = \cdots = \gcd(r_{i-1},r_{i}) = r_{i} (we noted this above without making reference to the gcd, but the reader should attempt to go through all the same steps using the idea of the gcd). 

So.  If you have two natural numbers, n,m, you divide them, find the remainder, write the equation, then continue as above until you get a 0 remainder.  Then you pick the remainder directly before you got 0 as your gcd (or, you pick the smaller number if one number divides the other).  Pretty simple algorithm, but is it efficient?
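In Python, the whole procedure is a few lines; this is a minimal sketch of the algorithm as just described (Python’s % operator hands us the remainder directly):

```python
def euclid_gcd(m, n):
    """Greatest common divisor of m and n via the Euclidean algorithm."""
    while n != 0:
        m, n = n, m % n   # replace (m, n) with (n, remainder of m divided by n)
    return m
```

Note that if n happens to divide m, the loop stops after a single step and returns n, just as described.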

Without going into formal "efficiency" definitions, "yes", it is quite efficient.  To prove it, let’s take an "average" example using the "large" numbers 1337944608 and 4216212.  We note (by pen and paper, or by using a standard calculator) that

1337944608 = 317(4216212) + 1405404.

Next, we note that

4216212 = 3(1405404) + 0

which instantly gives us the solution \gcd(4216212, 1337944608) = 1405404.  That’s pretty awesome.  Note that this was an especially quick trial, but even the "worst" ones are relatively quick. 


Unexpected Corollary!:  For n,m natural numbers, if \gcd(n,m) = k then there exist integers a,b such that an + bm = k.


This is more useful than you might think at first glance, and we’ll get into why in a later post, but what’s nice about this corollary is that it comes "for free" from the Euclidean algorithm.  Note that, since k divides n, m, it suffices to prove this corollary for an + bm = 1 where n, m have \gcd(n,m) = 1.  The proof uses induction on the number of steps of the Euclidean algorithm for those numbers, but for those of you who are more experienced and know modular arithmetic, you may enjoy the following simple proof:


"Clever" Proof of the Corollary: Let m > n (for equality, the proof is easy).  We will only care about remainders in this proof, so we will look at some numbers modulo m.  Consider


r_{1} = n\mod m

r_{2} = 2n\mod m

\vdots

r_{m-1} = (m-1)n\mod m


Note there are exactly m-1 remainders here and that the remainder 0 never occurs (since m,n are relatively prime).  Suppose that r_{i} \neq 1 for each of the i; that is, the remainder 1 does not ever show up in this list.  By the pigeon-hole principle (as there are m - 1 remainders but only m - 2 possible values for the remainders) we must have that r_{i} = r_{j} for some i\neq j.  That is, we have

in \mod m = jn \mod m

which implies

(i-j)n\mod m = 0

but this is impossible: it says that m divides (i-j)n, and since m,n are relatively prime, m must then divide i - j; but 0 < |i-j| < m, so this cannot happen.  Hence, the remainder 1 must occur.  That is, r_{c} = 1 for some c and

cn \mod m = 1.

But what does this mean?  It means that there is some integer q such that cn - qm = 1 (namely, q is the number of times m goes into cn).  To make this prettier, let a = -q and b = c, and we find that there exist integers a,b such that am + bn = 1, as required.  \Box


Pretty slick, no?
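For the programming-minded, the corollary is also constructive: the usual "extended" version of the Euclidean algorithm (a standard variant, sketched below; not the modular proof above) carries the coefficients along as it runs and produces a and b directly:

```python
def extended_gcd(n, m):
    """Return (g, a, b) with a*n + b*m == g == gcd(n, m)."""
    old_r, r = n, m
    old_a, a = 1, 0        # coefficients of n
    old_b, b = 0, 1        # coefficients of m
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r      # the usual Euclidean step
        old_a, a = a, old_a - q * a      # update the coefficients in lockstep
        old_b, b = b, old_b - q * b
    return old_r, old_a, old_b
```

The invariant old_a*n + old_b*m == old_r holds at every step, which is essentially the induction on the number of steps mentioned above.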


Let’s talk about Burnside’s theorem.  First, let me note that there are two results commonly called "Burnside’s Theorem."  The first one that I learned (which we won’t be discussing in this post) was:


Theorem (Burnside).  If G is a finite group of order p^{a}q^{b} where a,b are non-negative integers and where p,q are primes, then G is solvable.


The second one is also a group theoretical result, but a bit more combinatorial-feeling.  In some books (and, apparently, Wikipedia) this second result is called Burnside’s Lemma.  As noted in the Wikipedia article, this theorem was not even due to Burnside, who quoted the result from Frobenius, who probably got it from Cauchy. 

Let’s get some definitions down.  As usual, we’ll denote the order of the group by |G|, and our groups will all be finite in this post.  If we have a group G which acts on a set X, then given some fixed g\in G we define the set Fix_{X}(g) = \{x\in X\,|\, g\cdot x = x\}; this is, of course, the fixed points in X when acted on by the element g; for the remainder of the post, we will simply write Fix(g) with the set X implied.  Remember, when a group acts on X, the "product" will sit inside of X, and we write the action as g\cdot x for an element g\in G acting on an element x\in X.  The orbit of a point x\in X when acted on by G is given by \{g\cdot x\,|\, g\in G\}; we’ll denote this Orb(x), though this is not standard notation.  The orbit, essentially, is all of the possible values y\in X that you can get to by acting on x with elements in your group.

One thing to note, also, is that orbits are pairwise disjoint.  You should prove this to yourself if you haven’t already, but the idea is like this: if A,B are orbits of elements in X then suppose z\in A\cap B; then there is some x\in A, y\in B, g,h\in G such that g\cdot x = h\cdot y = z, but this implies (h^{-1}g)\cdot x = h^{-1}\cdot z = y, which implies the orbits are identical (why?).  Hence, each element of X is in exactly one orbit.


We need one more result before we can sink our teeth into Burnside.  Remember the fixed point set above?  For a fixed g, this was all of the elements x such that g\cdot x = x.  There’s a similar notion called a Stabilizer, denoted G_{x} = \{g\in G\,|\, g\cdot x = x\}; this is saying that we first fix x\in X, and then look at all the elements g\in G which stabilize it.  These definitions are pretty similar feeling (almost like family!) and, in fact, there is a nice relation between the two:


Notation.  Let X/G denote the set of orbits of X when acted on by G; when X is a group and G is a subgroup this is the same as a quotient.

Theorem (Orbit-Stabilizer Theorem).  There is a bijection between G/G_{x} and Orb(x).


That is, if we quotient G by the subgroup of elements which fix x, the resulting collection of cosets has the same number of elements as the orbit of x.  This might seem a little confusing at first, but if you work through it, it’s not so weird.


Sketch of the Proof.  (Skip this if you’re not comfortable with all this notation above; just go down to the next theorem.)  Here, we want to show a bijection.  Notice that G/G_{x} is the set of cosets hG_{x} for h\in G.  We claim that the mapping \phi which sends \phi: hG_{x} \mapsto h\cdot x is well-defined, injective and surjective (but not a homomorphism).  First, well-defined: if hG_{x} = kG_{x} then k^{-1}h\in G_{x}, which means that (k^{-1}h)\cdot x = x.  This implies, after some manipulation, that h\cdot x = k\cdot x, which means these elements are identical in Orb(x).  Second, surjectivity is clear.  Last, if h\cdot x = g\cdot x in the orbit, then (g^{-1}h)\cdot x = x which implies g^{-1}h\in G_{x} which gives gG_{x} = hG_{x}; hence this map is injective.  This gives us that our map \phi is bijective.  \diamond


One immediate corollary is that |Orb(x)| = \dfrac{|G|}{|G_{x}|}; that is, the number of elements in the orbit of x is the same as the number of elements in G divided by the number of elements in G which fix x.  Think about this for a minute.
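Here is a small numerical sanity check of this corollary, using a toy action I’ve chosen for illustration: the four rotations of a square acting on 2-colorings of its corners.

```python
from itertools import product

def rotate(coloring, k):
    """Act on a tuple of 4 corner colors by k quarter-turn rotations."""
    return coloring[-k:] + coloring[:-k] if k else coloring

X = list(product([0, 1], repeat=4))   # all 16 colorings of the corners
G = range(4)                          # rotations by 0, 1, 2, 3 quarter-turns

for x in X:
    orbit = {rotate(x, g) for g in G}
    stabilizer = [g for g in G if rotate(x, g) == x]
    assert len(orbit) * len(stabilizer) == 4   # |Orb(x)| * |G_x| = |G|
```

For instance, the alternating coloring (1, 0, 1, 0) has an orbit of size 2 and a stabilizer of size 2 (the identity and the half-turn).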



Road to the Proof.

Okay.  Now, let’s think about something for a second.  What is the sum

\displaystyle \sum_{g\in G}|Fix(g)|

telling us?  This is the number of elements in X which are fixed by some g\in G; but there might be some overlap, since if g\cdot x = x and h\cdot x = x, then x will be counted twice: once as an element of Fix(g) and once as an element of Fix(h).  But how much overlap is there?  This is an innocent seeming question, and you might think something like, "Well, depends on how much stuff stabilizes each x.", and this is pretty close to the point.

First, note that

\displaystyle \sum_{g\in G}|Fix(g)| = |\{(g,x)\in G\times X\,|\,g\cdot x = x\}|

which is just the long way to write out this sum; but, the nice part about that is, we can now count these pairs the other way: for each x\in X, count the elements of G which stabilize it (why is this the same count?).  Then,

\displaystyle \sum_{g\in G}|Fix(g)| = \sum_{x\in X}|G_{x}|.

If you don’t see this, you should prove to yourself why they’re the same sum (why is each element counted in the left-hand side also counted in the right-hand side?).  Now, by the Orbit-Stabilizer theorem above, this right-hand sum becomes pretty nice.  Specifically,

\displaystyle \sum_{x\in X}|G_{x}| = \sum_{x\in X}\frac{|G|}{|Orb(x)|} = |G|\sum_{x\in X}\frac{1}{|Orb(x)|}

where we noted in the last equality that |G| is a constant, so we may pull it out of the sum. 

Recalling that X/G denotes the set of orbits, we have that if we take a single orbit (call it A) we will be adding \frac{1}{|A|} up exactly |A| times (since the sum is taken over each x\in X so, in particular, over each x\in A); hence, we will add \frac{1}{|A|}\cdot |A| = 1 for each orbit we have in X/G.  That is;

\displaystyle = |G|\sum_{A\in X/G}1 = |G||X/G|.

Putting this all together, we have

\displaystyle \sum_{g\in G}|Fix(g)| = |G||X/G|.


We clean it up a bit, and state the following:


Theorem (Burnside’s).  For a finite group G acting on a finite set X, with notation as above, we have \displaystyle |X/G| = \frac{1}{|G|}\sum_{g\in G}|Fix(g)|.


That is, the number of orbits is equal to the sum, over g\in G, of the number of elements of X fixed by g, divided by the number of elements in G.  Kind of neat.
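As a quick numerical check of the formula, with an action of my own choosing (the four rotations of a square acting on the sixteen 2-colorings of its corners), we can count orbits both directly and via Burnside:

```python
from itertools import product

def rotate(coloring, k):
    """Act on a tuple of 4 corner colors by k quarter-turn rotations."""
    return coloring[-k:] + coloring[:-k] if k else coloring

X = list(product([0, 1], repeat=4))   # all 16 colorings
G = range(4)                          # the four rotations

# Count orbits directly, using a canonical form (the sorted orbit) per orbit...
orbits = {tuple(sorted(rotate(x, g) for g in G)) for x in X}

# ...and via Burnside: average |Fix(g)| over the group.
fix_sum = sum(sum(1 for x in X if rotate(x, g) == x) for g in G)
assert len(orbits) == fix_sum // len(G)   # both count 6 distinct "necklaces"
```

Here |Fix(g)| is 16, 2, 4, 2 for the four rotations, and (16 + 2 + 4 + 2)/4 = 6 orbits.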


Next time, we’ll talk about applications!

Pick’s Theorem.

November 22, 2012


In order to teach kids about the concept of area, math teachers sometimes showed pictures which looked like



and we would be asked to "guess the area."  If this was being taught to younger students, the way to do it would be to draw little boxes inside and then try to see how many "half-boxes" were left, how many "quarter-boxes" were left, and so forth.  If this was a more sophisticated lesson, our teacher would instruct us to cut up the picture into rectangles and triangles — the one above, for example, has a right triangle on the left-hand side and a 4×2 rectangle on the right-hand side.  From this, we could easily use the formula for area for each, and we could deduce the total area from these. 

So far nothing is too difficult to solve.  But, if we adjust the picture slightly (still using only straight lines to construct our figure) we may get something like




Here, it is difficult to tell what to do.  We could try to find "easier" triangles to break this into, but none are immediately obvious.  It’s not easy to tell what the angles are (except by using some algebraic methods, finding slopes, etc.).  The best one could do would be to "guess" what the lengths of the sides were (or the lengths of the base and height of the triangle). 


Motivating Pick’s Theorem.

Before we begin, let me define some of the things we’ll be working with.   A polygon is a familiar concept, but we ought to say some things about it:

A lattice polygon is a figure on the two-dimensional lattice (as above) such that all of its edges begin and end on a lattice point, no edges overlap except at a lattice point, and its boundary is a closed path (informally, this means that if we started at a lattice point and "traveled" along an edge and kept going in the same direction, we’d eventually come back to where we started).  Moreover, the polygon has to have a distinct interior and exterior (as in, it cannot look like a donut with a hole in the middle).  [Topologists, note: this means that the boundary of our figure is homeomorphic to a circle and the interior is simply connected.]


Easy Polygons.

The "easiest" polygons to work with in terms of area are rectangles: once we find the side lengths, we’re done.  Let’s look at the rectangle below, ignoring the stuff in the lower-left for now.




We can easily find the area here: it is a 7×3 rectangle, giving us an area of 21.  Notice something kind of neat, though; if we look at each interior point of the rectangle, we can associate a unit square with it (In the picture above, I’ve identified the point "a" with the first unit square "a", and the lattice point "b" with the square "b").  Using just the interior points, we can fill up most of this rectangle with unit squares, as below:




We see that we get "almost" all of the area accounted for if we only identify interior points like this; we miss the "¬" shape on the top.  But besides this weird shape, we learned that if we have a rectangle with n interior points, we will be able to fill up n square units of area in our rectangle.

If we do the same process (which I mark with a different color) for the boundary points on the top of the rectangle by identifying them with a unit square to their lower-left, we notice that we have to "skip" one on the far-left.




[For clarification, the second lattice point on the top of the rectangle corresponds to the first box, the third point corresponds with the second box, and so forth.  We are forced to skip one lattice point (the first one) on the top.]

Notice that if we have \ell interior points and k points on the top boundary of the rectangle, then there must be k-2 points in each row of interior points; hence, there must be \dfrac{\ell}{k-2} points in each column of interior points. 

Notice, last, that in our picture we need only fill in the remaining few parts on the right side.  We notice that the number of squares needed to fill in this side is exactly the same as the number of points in each column of interior points (check this out on a few different rectangles if you don’t believe it).


Whew.  Okay.  So.  We have \ell interior points, each corresponding to a unit square.  All but one of the top boundary points corresponds to a square; this is k - 1 unit squares.  Last, we have the number of points in each column of the interior points corresponding to the remaining unit squares; this is \dfrac{\ell}{k-2} unit squares.

At this point we want to write this nicely, so we’ll denote the TOTAL number of boundary points b, and note that b = 2k + 2\dfrac{\ell}{k-2}.

The total area we found was \ell + k-1 + \dfrac{\ell}{k-2}.  If we rearrange this a bit, we notice that 

\mbox{AREA} = \ell + \left(k + \dfrac{\ell}{k-2}\right) - 1 = \ell + \frac{1}{2}b - 1.

This is exactly the statement of Pick’s theorem, and it is true more generally, as we’ll see.
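The rectangle argument above is easy to check mechanically; here is a tiny sketch (names are mine) that counts the interior and boundary points of a w-by-h lattice rectangle and evaluates \ell + \frac{1}{2}b - 1:

```python
def pick_area_rectangle(w, h):
    """Evaluate Pick's formula for a w-by-h lattice rectangle."""
    interior = (w - 1) * (h - 1)      # interior points form a (w-1)-by-(h-1) grid
    boundary = 2 * w + 2 * h          # lattice points along the perimeter
    return interior + boundary / 2 - 1

assert pick_area_rectangle(7, 3) == 21   # the 7x3 rectangle from the text
```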



We’ll briefly cover one more example.  Triangles are a natural figure to think about, so let’s see if Pick’s formula works here.  First, right triangles are easy to work with, so let’s look at this one (ignore the blue part for now):




Notice that this triangle is exactly half of a rectangle (this is completed by the blue part), and it just so happens to have no lattice points on the diagonal.  This last part is important so look again: none of the dots touch the diagonal of the red triangle (here, we are excluding the vertices of the triangle, which we don’t count as being on the diagonal).  Some come close, but none lie on it.  Of course, this is not true in general, but for now let’s just look at this triangle. 

If we use the formula above for the rectangle, we get that the area is \ell + \dfrac{b}{2} - 1 for the rectangle (in this specific case, \ell = 30 and b = 26), and half of this will be \dfrac{\ell}{2} + \dfrac{b}{4} - \dfrac{1}{2}.

On the other hand, if we look at the interior points of the triangle, if none of them lie on the diagonal (like above) then we have exactly half of what the rectangle had, so our triangle has \dfrac{\ell}{2} interior points; the number of boundary points will be half of the number of boundary points of the rectangle, plus 1.  This can be seen as follows: if we consider all the points on the bottom boundary and all the points on the left boundary except for the top-most point, then this is exactly half of the boundary points of the rectangle.  Hence, the number of boundary points we have for our triangle is \dfrac{b}{2} + 1.

Plugging this information into Pick’s formula (which, at this point, we only know is valid for the rectangle!) we obtain: \frac{\ell}{2} + \dfrac{b}{4} - \frac{1}{2}.  This is exactly the area we calculated before, giving us a verification that Pick’s formula works for right triangles with no lattice points on the diagonal. 


How do we get around the condition that no lattice points should be on the diagonal?  There is a relatively easy way to break up right triangles into other right triangles, none of which will have points on their diagonals.  I’ll demonstrate with this picture below:


The idea is to just take the big triangle, draw some vertical and horizontal lines from the lattice points which lie on the diagonal, until you get smaller triangles (which will have no lattice points on the diagonal) and a bunch of rectangles.  In this case, I first got two triangles (a small one on top, and a small one on the bottom right) and one little 4×3 rectangle in the lower-left.  You then split the rectangle in half, which gives you some more triangles; if these triangles had lattice points on the diagonal, I would have repeated the process, getting even smaller triangles and rectangles.  Because everything is nice and finite here, we don’t get infinite numbers of rectangles and triangles and such: this process will eventually stop.  We apply the above to each and note that this verifies Pick’s formula for any right triangle.


But even right triangles are a bit restrictive.  It would be nice if it were true for any triangle.  Indeed, there is a similar process which decomposes triangles into right triangles.  In fact, there are a number of such processes: for example, prove to yourself that every triangle has at least one altitude which is entirely contained in the triangle, and note that this splits the triangle into two right triangles.  However you show it, this verifies Pick’s formula for any triangle.



The Theorem.

Theorem (Pick).  Given a lattice polygon with \ell interior points and b boundary points, the total area enclosed by the figure is given by \ell + \dfrac{b}{2} - 1.


The proof of this is done by induction.  This is a more unusual type of induction since it requires us to induct on the number of triangles a figure is made up of.  The difficult part has already been completed: we have shown that the formula holds for any triangle.  We need a fact which I will not prove here:


Fact: Every polygon can be decomposed into a collection of triangles which only intersect at their boundaries.  Moreover, lattice polygons can be decomposed into a collection of lattice triangles (triangles whose vertices are lattice points) which only intersect at their boundaries.


This process is called polygon triangulation.  In layman’s terms, it means that you can take a polygon and cut it up into a bunch of triangles.  Try it yourself for a few polygons!


Given all of this, let’s jump into the proof.


Proof.  By induction.  We have already proved the formula holds for triangles, so suppose the formula holds for all polygons which are able to be decomposed into k or fewer triangles. Take such a polygon P which is able to be decomposed into exactly k triangles and "attach" a triangle T to the boundary of P such that the resulting figure is still a lattice polygon; call this new polygon Q.





For the triangle, denote its boundary points by b_{T} and its interior points by \ell_{T}; similarly, for P denote its boundary points by b_{P} and its interior points by \ell_{P}.  Denote the common points that T and P share by c.

For interior points, note that we have added the interior points of T and P together, but we also gain the points which they share on their common boundary, except for the two shared vertices of the triangle, which remain on the boundary of Q; that is, \ell_{Q} = \ell_{T} + \ell_{P} + (c-2).


For boundary points, we have to subtract c-2 points from P’s boundary and c - 2 points from T’s boundary (for the same reason as in the previous paragraph).  If we add together the boundary points of T minus the common points and the boundary points of P minus the common points, we will be counting the two shared vertices of the triangle twice (why?), so we need to subtract 2 so that we only count these points once.  Hence, b_{Q} = b_{P} + b_{T} - 2(c - 2) - 2 = b_{P} + b_{T} - 2c + 2.


At this point, let A_{P} be the area of P and A_{T} be the area of T; we have that:


\displaystyle A_{P} + A_{T} = \left(\ell_{P} + \frac{b_{P}}{2} - 1\right) + \left(\ell_{T} + \frac{b_{T}}{2} - 1\right)

\displaystyle = (\ell_{P} + \ell_{T}) + \frac{b_{P} + b_{T}}{2} - 2


We note now that \ell_{P} + \ell_{T} = \ell_{Q} - (c - 2) and b_{P} + b_{T} = b_{Q} + 2c - 2 from above, which gives us

\displaystyle = \ell_{Q} - (c - 2) + \frac{b_{Q}}{2} + (c - 1) - 2 = \ell_{Q} + \frac{b_{Q}}{2} - 1.

This verifies Pick’s formula for our lattice polygon Q, and since any lattice polygon can be constructed this way (from finitely many triangles) this shows that Pick’s formula holds for any lattice polygon.  \Box
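For the computationally inclined, here is a sketch checking the theorem on the right triangle with legs 7 and 6 from earlier.  The area comes from the shoelace formula, the boundary count uses the standard fact that a lattice-polygon edge contains gcd(|dx|, |dy|) boundary points (counting one endpoint per edge), and the interior count is brute force.

```python
from math import gcd

def shoelace_area(verts):
    """Area of a simple polygon from its vertex list, via the shoelace formula."""
    s = sum(x1 * y2 - x2 * y1
            for (x1, y1), (x2, y2) in zip(verts, verts[1:] + verts[:1]))
    return abs(s) / 2

def boundary_points(verts):
    """Lattice points on the boundary: one gcd(|dx|, |dy|) per edge."""
    return sum(gcd(abs(x2 - x1), abs(y2 - y1))
               for (x1, y1), (x2, y2) in zip(verts, verts[1:] + verts[:1]))

tri = [(0, 0), (7, 0), (0, 6)]
interior = sum(1 for x in range(1, 7) for y in range(1, 6)
               if 6 * x + 7 * y < 42)   # strictly inside: below the hypotenuse

# Pick's theorem: area = interior + boundary/2 - 1, here 21 = 15 + 14/2 - 1.
assert shoelace_area(tri) == interior + boundary_points(tri) / 2 - 1
```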

This post has a pretty weird title, but the problem is easy to state and uses a few interesting mathematical concepts.  It’s worth going through.  Let’s start with the basics.


Problem 1.  Let S_{\alpha}(n) = 1^{\alpha} + 2^{\alpha} + \dots + n^{\alpha}.  Show that S_{\alpha}(n) is a polynomial in n for each \alpha\in {\mathbb N} and that the degree of the polynomial is \alpha + 1.


Indeed, for example, we have  that 1 + 2 + \dots + n = \frac{n(n+1)}{2}, as we learned in Calculus, and this is a polynomial of degree 2.  Similarly, 1^{2} + 2^{2} + \dots + n^{2} = \frac{n(n+1)(2n+1)}{6}, which is a polynomial of degree 3.  In the same respect, 1^{3} + 2^{3} + \dots + n^{3} = \frac{(n(n+1))^{2}}{4}, which is a polynomial of degree 4.

The associated polynomials in this case are given by Faulhaber’s formula:


Theorem (Faulhaber).  For \alpha\in {\mathbb N} we have \displaystyle \sum_{k=1}^{n}k^{\alpha} = \frac{1}{1 + \alpha}\sum_{j=0}^{\alpha}(-1)^{j}\binom{\alpha + 1}{j}B_{j}n^{\alpha + 1 - j}.


This formula looks terrifying, but it is not hard to apply in practice.  You may be wondering, though, what the B_{j}’s in this formula stand for.  These are the strange and wonderful Bernoulli numbers, of course!  I always enjoy seeing these creatures, because they unexpectedly pop up in the strangest problems.  There are a number of ways to define these numbers, one of which is to just write them out sequentially, starting with B_{0}:


1, -\frac{1}{2}, \frac{1}{6}, 0, -\frac{1}{30}, 0, \frac{1}{42}, 0, -\frac{1}{30}, 0, \dots


But in this case it is not so easy to guess the next value.  The clever reader will notice that all of the odd numbered Bernoulli numbers (except the first) are zero, but other than that there does not seem to be a clear pattern.  Fortunately, we can construct a function which generates the values as coefficients; we’ll call this function (surprise!) a generating function.


Definition.  We define the sequence \{B_{j}\}_{j=0}^{\infty} by

\displaystyle \frac{t}{e^{t} - 1} = \sum_{j=0}^{\infty}B_{j}\frac{t^j}{j!}.


Notice that this will, in fact, generate the B_{j} as coefficients times \frac{1}{j!}.  Neat.  In practice, you can use a program like Mathematica to compute B_{j} for pretty large values of j; but, of course, there are lists available.  We can now use Faulhaber’s formula above, which gives us (assuming we have proven that the formula holds!) that the sums of powers of natural numbers form polynomials of degree \alpha + 1.
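As a sketch (with exact rational arithmetic), one can generate the B_{j} from the recurrence obtained by multiplying both sides of the definition by e^{t} - 1 and comparing coefficients, and then evaluate Faulhaber’s formula directly:

```python
from fractions import Fraction
from math import comb

def bernoulli(m):
    """[B_0, ..., B_m] via the recurrence sum_{j=0}^{k} C(k+1, j) B_j = 0 for k >= 1."""
    B = [Fraction(1)]
    for k in range(1, m + 1):
        B.append(-sum(comb(k + 1, j) * B[j] for j in range(k)) / (k + 1))
    return B

def faulhaber(alpha, n):
    """1^alpha + 2^alpha + ... + n^alpha, via Faulhaber's formula."""
    B = bernoulli(alpha)
    return sum((-1) ** j * comb(alpha + 1, j) * B[j] * n ** (alpha + 1 - j)
               for j in range(alpha + 1)) / (alpha + 1)
```

For example, faulhaber(2, 10) returns 385, the sum of the first ten squares.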


But something else happens that’s pretty interesting.  Let’s look at some of the functions.


\begin{array}{|c|c|}\hline \alpha & \mbox{Polynomial}\\\hline &\\ 1 & \frac{1}{2}n^{2} + \frac{1}{2}n\\ &\\ 2 & \frac{1}{3}n^{3} + \frac{1}{2}n^{2} + \frac{1}{6}n\\ & \\ 3 & \frac{1}{4}n^{4} + \frac{1}{2}n^{3} + \frac{1}{4}n^{2} \\ & \\ 4 & \frac{1}{5}n^{5} + \frac{1}{2}n^{4} + \frac{1}{3}n^{3} - \frac{1}{30}n\\ & \\ \hline \end{array}


Look at the coefficients in each of these polynomials.  Anything strange about them?  Consider them for a bit.


Problem.  Look at the coefficients.  What do you find interesting about them?  Note that, in particular, for a fixed \alpha, the coefficients of the associated polynomial sum to 1.  Convince yourself that this is probably true (do some examples!) and then prove that it is true.  Do this before reading the statements below.


Anecdote.  I spent quite a while trying to write down the "general form" of a polynomial with elementary symmetric polynomials and roots to try to see if I could prove this fact using some complex analysis and a lot of terms canceling out.  This morning, I went into the office of the professor to ask him about what it means that these coefficients sum up to 1.  He then gave me a one-line (maybe a half-line) proof of why this is the case. 


Hint.  What value would we plug in to a polynomial to find the sum of the coefficients?  What does plugging in this value mean in terms of the sum?

Question:  Consider the subspace E of L^{2}([-1,1]) consisting of even functions (that is, functions with f(x) = f(-x)).  Find the orthogonal complement of E.


One Solution.  It’s easy to prove E is a subspace.  Next, every function in this space can be written as the sum of an even function and an odd function; more precisely, given f\in L^{2}([-1,1]) we have that \frac{f(x) + f(-x)}{2} is even, \frac{f(x) - f(-x)}{2} is odd, and f(x) = \frac{f(x) + f(-x)}{2} + \frac{f(x) - f(-x)}{2}.  For uniqueness, note that if f is both even and odd, then f(x) = f(-x) = -f(x), so f(x) = -f(x) for each x, giving us that f(x) = 0.  Since every even function is orthogonal to every odd function (their product is odd, so it integrates to zero over [-1,1]), the orthogonal complement of E is the set of odd functions.  \diamond
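If you like numerical sanity checks, here's a quick NumPy sketch of the decomposition; since the sample grid is symmetric about 0, reversing the sampled array (f[::-1]) samples f(-x):

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 20001)   # symmetric grid, so f[::-1] samples f(-x)
f = np.exp(x) + x**3                # an arbitrary test function
even = (f + f[::-1]) / 2
odd = (f - f[::-1]) / 2
dx = x[1] - x[0]

assert np.allclose(f, even + odd)            # decomposition recovers f
assert abs(np.sum(even * odd) * dx) < 1e-8   # <even, odd> is (numerically) 0
```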


Here’s another solution that "gets your hands dirty" by manipulating the integral.


Another Solution.  We want to find all g\in L^{2}([-1,1]) such that \langle f, g \rangle_{2} = 0 for every even function f\in L^{2}([-1,1]).  This is equivalent to wanting to find all such g with \displaystyle \int_{-1}^{1}f(x)g(x)\, dx = 0.  Assume g is in the orthogonal complement.  That is,

\displaystyle 0 = \int_{-1}^{1}f(x)g(x)\, dx = \int_{-1}^{0}f(x)g(x)\, dx + \int_{0}^{1}f(x)g(x)\, dx

\displaystyle = -\int_{1}^{0}f(-x)g(-x)\, dx + \int_{0}^{1}f(x)g(x)\, dx

The last equality here re-parameterizes the first integral by letting x\mapsto -x, but note that our new dx gives us the negative sign.

\displaystyle = \int_{0}^{1}f(-x)g(-x)\, dx + \int_{0}^{1}f(x)g(x)\, dx

\displaystyle = \int_{0}^{1}f(x)g(-x)\, dx + \int_{0}^{1}f(x)g(x)\, dx

\displaystyle = \int_{0}^{1}f(x)g(-x) + f(x)g(x)\, dx = \int_{0}^{1}f(x)(g(x) + g(-x))\, dx.

We may choose f(x) = g(x) + g(-x) since this is an even function, and we note that this gives us

\displaystyle 0 = \int_{0}^{1}(g(x) + g(-x))^{2}\, dx.

Since (g(x) + g(-x))^{2}\geq 0, it must be the case that (g(x) + g(-x))^{2} = 0.  [Note: The fact that this is only true "almost everywhere" is implicit in the definition of L^{2}.]  Hence, g(x) + g(-x) = 0, giving us that -g(x) = g(-x).

We now have one direction: that if g is in the orthogonal complement, then it will be odd.  Now we need to show that if g is any odd function, it is in the orthogonal complement.  To this end, suppose g is an odd function.  Then by the above, we have

\displaystyle \langle f, g \rangle_{2} = \int_{-1}^{1}f(x)g(x)\, dx = \int_{0}^{1}f(x)(g(x) + g(-x))\, dx = 0

where the last equality comes from the fact that g is odd.  \diamond

[Note: It’s been a while!  I’ve now completed most of my pre-research stuff for my degree, so now I can relax a bit and write up some topics.  This post will be relatively short just to “get back in the swing of things.”]


In Group Theory, the big question used to be, “Given such-and-such is a group, how can we tell which group it is?”


The Sylow Theorems (proved by Ludwig Sylow) provide a really nice way to do this for finite groups using prime decomposition.  In most cases, the process is quite easy.  We’ll state the theorems here in a slightly shortened form, but you can read about them here.  Note that a subgroup of order p^{\beta} for some \beta is, unsurprisingly, called a p-subgroup.  A p-subgroup of maximal order in G is called a Sylow p-subgroup.


Theorem (Sylow).  Let G be a group such that |G| = p^{\alpha}m for p\not| m.  Then,

  1. There exists at least one subgroup of order p^{\alpha}.
  2. The Sylow p-subgroups are conjugate to one-another; that is, if P,Q are Sylow p-subgroups, then there is some g\in G such that gPg^{-1} = Q.  Moreover, for all g\in G, we have that gPg^{-1} is a Sylow p-subgroup.
  3. The number of Sylow p-subgroups of G, denoted by n_{p}, satisfies n_{p} \equiv 1\mbox{ mod } p.  Moreover, n_{p} divides m.


This first part says that the set of Sylow p-subgroups of G is not empty if p divides the order of G.  Note that this is slightly abbreviated (the second part is actually more general, and the third part has a few extra parts) but this will give us enough to work with.


Problem: Given a group |G| = pq for p,q prime and p < q, is G ever simple (that is, does it ever have no nontrivial proper normal subgroups)?  Can we say explicitly what G is?


We use the third part of the Sylow theorems above.  We note that n_{q} | p and n_{q} \equiv 1\mbox{ mod } q, but p < q so this immediately implies that n_{q} = 1 (why?).  So we have one Sylow q-subgroup; let’s call it Q.  Once we have this, we can use the second part of the Sylow theorems: for each g\in G, we have that gQg^{-1} is a Sylow q-subgroup, but we’ve shown that Q is the only one there is!  That means that gQg^{-1} = Q for every g; this says Q is normal in G.  We have, then, that G isn’t simple.  Bummer.

On the other hand, we can actually say what this group is.  So let’s try that.  We know the Sylow q-subgroup, but we don’t know anything about the Sylow p-subgroups.  We know that n_{p} \equiv 1\mbox{ mod }p and n_{p} | q, but that’s about it.  There are two possibilities: either n_{p} = 1 or n_{p} = q.

For the first case, by using the modular relation, if p does not divide q-1 then this forces n_{p} = 1; this gives us a unique normal Sylow p-subgroup P.  Note that since the orders of our normal subgroups multiply up to the order of the group, we have PQ \cong P\times Q \cong G; in other words, G \cong {\mathbb Z}_{p}\times {\mathbb Z}_{q} \cong {\mathbb Z}_{pq}, the cyclic group of order pq (since p and q are coprime).
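Since p and q are coprime, {\mathbb Z}_{p}\times{\mathbb Z}_{q} is itself cyclic of order pq; this is just the Chinese Remainder Theorem.  A toy check in Python, with p = 3 and q = 5 standing in: the element (1,1) generates the whole group.

```python
# repeatedly add (1,1) in Z_3 x Z_5 and watch it hit every element
p, q = 3, 5
elem, seen = (0, 0), set()
for _ in range(p * q):
    elem = ((elem[0] + 1) % p, (elem[1] + 1) % q)
    seen.add(elem)

assert len(seen) == p * q   # (1,1) has order pq, so the group is cyclic
```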

For the second case, n_{p} = q (which, by the modular relation, can only happen when p divides q-1).  We will have a total of q subgroups of order p and none of these are normal.  This part is a bit more involved (for example, see this post on it), but the punch line is that G will be the nonabelian semidirect product {\mathbb Z}_{q}\rtimes {\mathbb Z}_{p}.


I’ll admit that the last part is a bit hand-wavy, but this should at least show you the relative power of the Sylow theorems.  They also come in handy when trying to show something either does or does not have a normal subgroup.  Recall that a simple group has no nontrivial proper normal subgroups.


Question.  Is there any simple group with |G| = 165?


I just picked this number randomly, but it works pretty well for this example.  We note that |G| = 165 = 3\cdot 5\cdot 11.  Let’s consider, for kicks, n_{11}.  We know n_{11} must divide 3\cdot 5 = 15 and it must be the case that n_{11} \equiv 1\mbox{ mod } 11; putting these two facts together, we get n_{11} = 1.  This immediately gives us a normal subgroup of order 11, which implies there are no simple groups of order 165.


Question.  Is there any simple group with |G| = 777?


Alas, alack, you may say that 777 is too big of a number to do, but you’d be dead wrong.  Of course, 777 = 3\cdot 7\cdot 37.  Use the same argument as above to show there are no simple groups of this order.


Question.  Is there any simple group with |G| = 28?


Note that 28 = 2^{2}\cdot 7 so we need to do a little work, but not much.  Just for fun, let’s look at n_{7}.  We must have that it is 1 modulo 7 and it must divide 2^{2} = 4.  Hm.  A bit of thinking will give you that n_{7} = 1, which gives us the same conclusion as above.
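These divisibility-and-congruence arguments are mechanical enough to automate.  Here's a Python sketch (the helper names are my own) that lists the candidates for n_{p} allowed by the third part of the Sylow theorems; whenever the only candidate is 1, the Sylow p-subgroup is normal and the group cannot be simple:

```python
def prime_power(n, p):
    # largest a with p**a dividing n
    a = 0
    while n % p == 0:
        n //= p
        a += 1
    return a

def possible_np(order, p):
    # candidates for the number of Sylow p-subgroups: divisors d of
    # m = order / p**a with d congruent to 1 mod p
    m = order // p ** prime_power(order, p)
    return [d for d in range(1, m + 1) if m % d == 0 and d % p == 1]

assert possible_np(165, 11) == [1]   # forced: the Sylow 11-subgroup is normal
assert possible_np(777, 37) == [1]
assert possible_np(28, 7) == [1]
```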


Of course, there are examples where this doesn’t work so nicely.  Think about groups of order 56, for example.  In order to work with these kinds of groups, one must do a bit more digging.  We will look into more of this later.

[Note: It’s been too long, mathblog!  I’ve been busy at school, but I have no plans to discontinue writing in this thing.  For longer posts, you’ll have to wait until I have some free time!]


You’ve probably seen a graph before: they’re collections of vertices and edges pieced together in some nice way.  Here are some nice examples from Wikipedia:

[Images: the McGee graph and the Moser spindle.]



One theorem that I constantly run into (mainly because it’s relevant to homology) is the following:


Theorem.  Given some connected finite graph G, there exists a spanning tree of G.


Note that such a tree is not generally unique.  You might want to find one or two in the graphs above to prove this to yourself!  Nonetheless, even though this proof seems like it would be a bit intimidating (or, at least, by some sort of tedious construction), it’s actually quite nice.  Let’s go through it.


Proof. We’ll induct on the number of cycles in G.  If G has no cycles then it is, itself, a tree; hence, G is its own spanning tree.  Suppose now that any connected graph with at most n cycles has a spanning tree.  Given some connected graph with n+1 cycles, take any one of these cycles and delete one of its edges (but not its vertices!).  The resulting graph is still connected (the rest of the cycle provides a detour around the deleted edge) and has at most n cycles, so it has a spanning tree by induction.  Since we deleted no vertices, this tree still contains every vertex of the original graph, so it is a spanning tree of G as well.  \Box
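By the way, actually finding a spanning tree is easy in practice: breadth-first search does it, since the edges used to first discover each new vertex can never close a cycle.  A sketch in Python (the example graph is made up):

```python
from collections import deque

def spanning_tree(adj):
    # BFS from an arbitrary vertex of a connected graph; the edges that
    # first reach each vertex form a spanning tree (n - 1 edges, no cycles)
    start = next(iter(adj))
    seen, tree = {start}, []
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                tree.append((u, v))
                queue.append(v)
    return tree

# a 4-cycle with one chord: several cycles, several possible spanning trees
adj = {1: [2, 4], 2: [1, 3, 4], 3: [2, 4], 4: [1, 2, 3]}
tree = spanning_tree(adj)
assert len(tree) == len(adj) - 1   # a spanning tree on n vertices has n-1 edges
```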


A couple of questions about this for the eager reader.  Where did we use connectedness?  Where did we use finite-ness?  What if we were given an infinite graph?  Is there a nice notion of cycles for this kind of graph?  Draw some pictures and think about it!

Baire Category Theorem.

January 5, 2012

During one of my previous qualifying exams, one of the questions looked, at first, to be a basic Fourier Analysis question.  The hint, though, was to use the Baire Category Theorem.  The problem and hint looked so unrelated that the phrase has become somewhat of a running joke for some of us in the department (“How are we going to prove this is a cofibration?”  “Oh, easy, just use the Baire Category theorem.”).

The more I read about it, the more I realized that the Baire Category theorem pops up much more than I realized and in a number of surprising and diverse topics.  This post is just to introduce you to the BCT and to show off some of its raw power.  One of the results below is one that I genuinely don’t believe — that is, I know it has to be true (it was proved!) but it just seems so outrageous that I can’t bring myself to believe it!  But, we’ll get to that.


Leading up to the Theorem.

There are a few different formulations of the theorem, so I’ll pick the one that turns out to be most common in what I’ve read.  First, let’s quickly define some terms.


Definition.  A complete metric space is a metric space where each Cauchy sequence converges to a point in the space.


For example, {\mathbb R} is complete, but {\mathbb Q} is not complete since in the latter space we can take a sequence like: 1, 1.4, 1.41, 1.414, … which are all rational numbers that converge to \sqrt{2}, but \sqrt{2} isn’t in the rational numbers.
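To see the incompleteness of {\mathbb Q} a bit more concretely, here's a tiny Python sketch of that sequence of rational approximations (using rounded rather than truncated decimals, which changes nothing essential):

```python
import math

# 1.4, 1.41, 1.414, ...: a Cauchy sequence of rationals whose limit,
# sqrt(2), is not rational -- so Q is missing one of its limit points
approx = [round(math.sqrt(2), k) for k in range(1, 10)]
assert all(abs(a - math.sqrt(2)) < 10.0 ** (-k)
           for k, a in enumerate(approx, 1))
```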


Definition.  A subset of a space is dense if the closure of the subset is the entire space.


For example, {\mathbb Q} is dense in {\mathbb R} since the closure of {\mathbb Q} is {\mathbb R}.  The irrationals are also dense in {\mathbb R}.  The integers, {\mathbb Z}, are not dense in {\mathbb R} because the closure of the integers in {\mathbb R} is just {\mathbb Z} (why?).


Q: What are the dense subsets of a non-empty discrete space?  Does this make sense “visually”?


Definition.  The interior of a subset Y is the largest open set contained in Y.  In metric spaces, it’s the union of all the open balls contained in Y.


The next definition might be a little strange if you haven’t seen it before and, in fact, there are two nice (equivalent) ways of thinking about this concept.  I’ll list both.


Definition.  A set Y is nowhere dense in some space X if either (and hence both):

  1. The closure of Y in X has empty interior.
  2. \overline{Y}^{\,C} is dense in X; that is, the complement of the closure of Y is dense in X.


This last definition might seem a bit strange to you, but the “image” of this is the following: a nowhere dense set is literally a set which is not dense anywhere; but what does that mean?  Density isn’t usually considered a local property.  How do we usually check density, though?  We take the closure of the set and, if it’s the whole space, we say the set is dense.  If we wanted to make this more “local”, we would ask that the closure of our subset contain no open set at all; that is, that it have empty interior.  This gives us the first definition.  The second is a direct consequence (the complement of a closed set with empty interior is dense and open) which is sometimes more useful to think about, depending on your original subset.


Example.  We have {\mathbb Z}\subseteq {\mathbb R} with the standard topology.  Is {\mathbb Z} nowhere dense in {\mathbb R}?  Note that \bar{{\mathbb Z}} = {\mathbb Z} and this has empty interior.  By the first definition, {\mathbb Z} must be nowhere dense in {\mathbb R}.  If we wanted to use the second definition, we could say that the complement of \bar{{\mathbb Z}} = {\mathbb Z} in {\mathbb R} is just everything but the integers; this set is dense in {\mathbb R}, giving us the same conclusion.  This is not surprising, since the elements of {\mathbb Z} are “pretty far apart” in {\mathbb R}.


Example.  Our favorite limit-point example, Y = \{\frac{1}{k}\,|\, k\in {\mathbb N}\}, is a subset of {\mathbb R}.  Is it nowhere dense?  Using the first definition, the closure is just Y\cup \{0\}, which has empty interior (why?).  With the second definition, the complement of this closure is dense in {\mathbb R}.  In either case, this proves that Y is nowhere dense in the reals.


Example.  The ball (0,1)\subseteq {\mathbb R}.  The closure of this set is [0,1] which has interior equal to (0,1).  Therefore, this set is not nowhere dense.  That’s a bit sad.  But you can see that this set really is “somewhere dense” visually.


Example.  The Cantor set is closed (there are a number of cute ways to prove this) so to show it is nowhere dense in [0,1] it suffices to show that it has empty interior by the first definition.  Prove it!
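If it helps, here's a small Python sketch of the middle-thirds construction: after k steps, what remains lives in 2^{k} intervals each of length 3^{-k}, so the Cantor set contains no open interval at all and hence has empty interior.

```python
from fractions import Fraction

def cantor_intervals(k):
    # the intervals remaining after k steps of removing middle thirds from [0,1]
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        nxt = []
        for a, b in intervals:
            t = (b - a) / 3
            nxt += [(a, a + t), (b - t, b)]   # keep the outer two thirds
        intervals = nxt
    return intervals

iv = cantor_intervals(8)
assert len(iv) == 2 ** 8
# every surviving interval has length 3^(-k), so no interval survives forever
assert all(b - a == Fraction(1, 3 ** 8) for a, b in iv)
```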


You may have been surprised by the last example, or even the \frac{1}{k} example.  For the latter, it seems like the points are really “getting close” to 0, so maybe some density should occur; it’s not the case, though!  For the Cantor set, many points in this set can be “very close” to one another, so it is a bit surprising that this set is not dense anywhere.  If you go through the construction slowly, you’ll see why the points are “just far enough apart” to reasonably be considered (that is, intuitively considered) nowhere dense.


The Theorem.

Why would we ever care about nowhere dense sets?  Well, they’re kind of crappy in the following sense: even if we unioned a countable number of nowhere dense subsets together, we could never recover the original set.  An analogy from anthropology would go like this: we might be able to find tons of copies of some ancient book, but if each of them are missing a whole lot (nowhere dense) then even combining them all together we can’t recover the original manuscript.


Oh.  And why do jigsaw puzzle makers make each of the puzzle pieces look like that?

Because if they made them nowhere dense, we would never be able to finish the puzzle.  Buh-dum-tsh.


Theorem (Baire Category Theorem).  A non-empty complete metric space is not the countable union of nowhere dense sets.


This shouldn’t be too scary to you at this point.  Let’s do some immediate consequence examples.


Example.  Do you think we could union a countable number of things that look like {\mathbb Z} together and get something like the real numbers?  That is, maybe we could take

{\mathbb Z} = \{\dots, -1,0,1,2,\dots\}

{\mathbb Z} + \frac{1}{2} = \{\dots, -\frac{1}{2}, \frac{1}{2}, \frac{3}{2}, \frac{5}{2},\dots\}

{\mathbb Z} + \frac{1}{3} = \{\dots, -\frac{2}{3}, \frac{1}{3}, \frac{4}{3}, \frac{7}{3},\dots\}

and so on, and union them all together to make {\mathbb R}.

Nope.  The BCT says to not even try.  Each of these translates of {\mathbb Z} is nowhere dense, so a countable union of them would not get you {\mathbb R}.


Non-Example.  What if we did the same thing as above, but we wanted to union them all together to get {\mathbb Q}, the rationals.  What would Baire say about that?  Probably, “Don’t use my theorem with this, because {\mathbb Q} isn’t a complete metric space.”


Non-Example.  What if we tried the same thing with the {\mathbb Z} translates above as subsets of {\mathbb R}, but we union them all and then union that with the irrational numbers?  In this case, too, we can’t use Baire: the irrational numbers are actually dense in the reals (in particular, not nowhere dense), and the BCT only rules out countable unions of nowhere dense sets.


Example.  What if we have a complete metric space with no isolated points?  Can it be countable?

(Recall, an isolated point is one which has a neighborhood not containing any other point of the space.  For example, if we took \{0\}\cup [1,2] in the subspace topology of the reals, then \{0\} is an isolated point of this subspace since it, itself, is open and it contains no other point of the subspace.)

In this case, we can cleverly use the BCT to show that every complete metric space with no isolated points is uncountable.  Suppose it were countable.  Then every point of the space is nowhere dense (!  Why is this the case?  What is the closure of a point?  What is its interior?  Does this argument show why we cannot have isolated points?).  But since our space is countable, it is the countable union of its points, each of which is nowhere dense.  Baire would have a real problem with that.  This contradicts BCT, so such a space must be uncountable.


Why is it called the “Category” theorem?

This “category” isn’t the same category as in category theory.  The term comes from the original language that the theorem was stated in.  It’s a bit ugly, so I’ve yet to mention it, but I might as well spell it out for you now:

  • We say a set is of First Category if it can be written as a countable union of nowhere dense sets.  This is also called a meagre set.  (Amusingly, the meagre is also the name of a fish.)  Sometimes this is Americanized (that is, bastardized) and pronounced as “meager set” which would mean something like, “thin set” or “set having little substance” which is, intuitively, what this set should “look” like.
  • We say a set is of Second Category if it is not of first category.


We also have what’s called a Baire Space, which is just a space where every non-empty open set is of second category.  This means, intuitively, that an open set needs to have some “density” to it.  The original Baire Category theorem stated that every complete metric space is a Baire Space.  This is “almost” equivalent to what we have above; see below.


Other Formulations of the BCT.

There are three distinct formulations of the BCT that are generally given (the texts  I have in my office give at most two, but the third is a consequence of the first so this is perhaps why).  Right above this, we give one of them.


BCT (1).  Every complete metric space is a Baire space; that is, every non-empty open subset of a complete metric space is not the countable union of nowhere dense sets.

BCT (3).  Every non-empty complete metric space is not the countable union of nowhere dense sets.


I list these together because it’s clear that the third is a consequence of the first.  (These numbers are because the second BCT is actually called “the second BCT” in a few places.)


BCT (2).  Every locally compact Hausdorff space is a Baire space.  Recall that locally compact means that every point of our space has a compact neighborhood.


Note that it’s not the case that the first BCT implies the second, and it’s not the case that the second implies the first.  It would be nice if there were some meta-BCT that implied both of them, but when we write down exactly what we need for something to be a Baire space, we’re left with a couple of choices; if you’re interested in this, you should check out the proofs for both of these theorems and see where we’re using locally compact versus where we’re using complete.

A Surprising Result.

One of the coolest results I’ve seen using the BCT (and, indeed, one of the coolest results I’ve seen perhaps in general) is the following.


Maybe, first, it’s nice to ask the following question (that you can even ask to your friends!): what’s your idea of a “general” continuous function?  That is, just think of the most generic continuous function you can.  Visualize it.  Maybe draw it.  What do you have?  Probably some sort of polynomial.  (When I do this, I actually draw fifth degree polynomials most of the time.)  Is this really so generic?


First, let’s just define exactly what we mean by “generic.”


Definition.  If A is a subset of some space X, the complement of A is nowhere dense, and all points in A share some property, then we call this property generic in X.


This should be reasonable: something generic should be representative of the whole.  That is, if something is not generic, then it should not happen very often; in our language, the set of non-generic things should be nowhere dense.


Theorem (Banach, Mazurkiewicz, 1931).  The generic (in the sense of the above definition) continuous function on a compact interval (that is, f\in C^{0}([a,b],{\mathbb R})) is nowhere differentiable, and not monotone on any subinterval of [a,b].


Just think about that for a second.  Let it sink in.  Nowhere differentiable is strange enough; we can (and will) use the BCT to talk about a nowhere differentiable function.  But, what’s more, it’s not monotone on any subinterval, no matter how small.  And, yes, this function is generic.  What.
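If you want to see an explicit (if not generic-flavored) instance of such a pathology, Weierstrass' function \sum a^{n}\cos(b^{n}\pi x) is continuous but nowhere differentiable when 0 < a < 1, b is an odd integer, and ab > 1 + \frac{3\pi}{2}.  Here's a Python sketch (the parameters and sample grid are my own choices) watching its difference quotients grow as the step shrinks:

```python
import math

def W(x, a=0.5, b=13, terms=60):
    # Weierstrass' function: a uniformly convergent series of cosines,
    # hence continuous, yet differentiable nowhere for these parameters
    return sum(a**n * math.cos(b**n * math.pi * x) for n in range(terms))

def max_quotient(h, pts=200):
    # largest difference quotient |W(x+h)-W(x)|/h seen over a grid of x values
    return max(abs(W(x + h) - W(x)) / h for x in (i / pts for i in range(pts)))

q1, q2, q3 = (max_quotient(h) for h in (1e-1, 1e-3, 1e-5))
assert q1 < q2 < q3   # the quotients keep growing as h shrinks
```

This is evidence, not a proof; the actual nowhere-differentiability argument is a delicate estimate on those cosine sums.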


As a note, somewhat strangely, because of the above definition of “generic” it is entirely possible to have two different kinds of “generic” things representing some space.  The definition doesn’t say anything about the generic property being unique and, in general, it isn’t.



BCT in Functional Analysis.

to do.

I’m going through a few books so that I can start doing lots and lots of problems to prepare for my quals.  I’ll be posting some of the “cuter” problems.


Here’s one that, on the surface, looks strange.  But ultimately, the solution is straightforward.


Problem.  Find an uncountable subset A\subseteq [0,1] such that A - A = \{a - b\,\mid\, a,b\in A\} has empty interior.


Read the rest of this entry »

I just saw this problem in a book of algebra problems, and I thought it was nice given one of the previous posts I had up here.


Problem.  Show that 11, 111, 1111, 11111, … are not squares.


You ought to think about this for a bit.  I started by supposing they were squares and attempting to work it out like that; unfortunately, there are some strange things that happen when we get bigger numbers.  But.  You should see a nice pattern with the last two digits.  Click below for the solution, but only after you’ve tried it!
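No spoilers here, but while you think, a quick empirical Python check that the pattern holds for the first several of these "repunits":

```python
import math

# none of 11, 111, ..., up to twenty 1s is a perfect square
repunits = []
r = 1
for _ in range(19):
    r = 10 * r + 1
    repunits.append(r)

assert all(math.isqrt(r) ** 2 != r for r in repunits)
```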

Read the rest of this entry »