Recently, I’ve been learning to program in a new language and have been doing Project Euler problems for practice — of course, as the name suggests, most of these problems deal explicitly with problems which must be solved (efficiently) with mathematical techniques.  Two of the most common algorithms I’ve used are: Prime Testing, GCD Finder.  I’ll post about the former later, but the latter is an interesting problem in its own right:


Initial Problem.  Given two natural (positive whole) numbers called m, n, can we find some other natural number that divides both of them?


This problem is a first step.  It’s nice to be able to write numbers as a multiple of some other number; for example, if we have 16 and 18, we may write them as 8\times 2 and 9\times 2, thus giving us some insight as to the relationship between these two numbers. In that case it may be easy to see, but perhaps if you’re given the numbers 46629 and 47100, you may not realize right away that these numbers are 99\times 471 and 100\times 471 respectively.  This kind of factorization will reveal "hidden" relationships between numbers.  

So, given two numbers, how do we find if something divides both of them — in other words, how do we find the common divisors of two numbers?  If we think back to when we first began working with numbers (in elementary school, perhaps) the first thing to do would be to note that 1 divides every number.  But that doesn’t help us all that much, as it turns out, so we go to the next number: if both numbers are even, then they have 2 as a common factor.  Then we "factor" both numbers by writing them as 2\times\mbox{ something} and then attempt to keep dividing things out of the something.  We then move onto 3, skip 4 (since this would just be divisible by 2 twice), go onto 5, then 7, then…and continue for the primes.  This gives a prime factorization, but we have to note that if, say, 2 and 5 divide some number, then so does 10.  These latter divisors are the composite factors.

This seems excessive, but it is sometimes the only way one can do it. 

Anecdote!: On my algebra qualifying exam, there was a question regarding a group of order 289 which required us to see if 289 was prime or not; if not, we were to factor it.  We were not allowed calculators, so what could we do?  Try everything.  Note that we only need to try up to the square root of the number (which we could estimate in other ways), but it’s still a number of cases.  If you check, none of the following numbers divide into 289: 2, 3, 5, 7, 11, 13.  At this point, I was about to give up and call it a prime, but, for whatever reason, I decided to try 17.  Of course, as the clever reader will have pointed out, 289 = 17\times 17.  It is not prime.  There was, luckily, only one student who thought it was prime, but it points out how the algorithm above is not entirely trivial if one does not have access to a computer or calculator. 


Once we have a common divisor, or a set of common divisors, a natural thing to want to do is to find the biggest (we already have the smallest, 1) since in this way we can write our numbers with the largest common factor multiplied by some other number.  It will, in effect, make things prettier.


Real Problem.  Find the greatest divisor which is common to two natural numbers, m, n.


If you were just learning about this kind of thing, you may spout out the following solution: find all of the common divisors, then pick the greatest.  While this is not especially efficient, it is a solution.  Unfortunately, even for small numbers, this gets out of hand quickly.  For example, 60 and  420 have the following common divisors: 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60.  This takes a while to compute by hand. 

Even if we were to find prime factors, this would be 60 = 2^2 \times 3\times 5 and 420 = 2^2 \times 3 \times 5\times 7, which gives us that they share a number of prime factors.  A bit of thinking gives us that we take all of the prime factors they "share" and multiply them together to get the greatest common divisor.  This is another potential solution which is much faster than simply listing out all of the common divisors.  Unfortunately, this falls prey to the same kind of trap that other prime-related problems do: it is, at times, especially difficult to factor large composite numbers.  For example, the "reasonably small" number 49740376105597 has a prime factorization of 2741 \times 37813 \times 479909; this is not at all efficient to factor if one does not have a computer or a specialized calculator with a factoring algorithm on it.  As a mean joke, you may ask your friend to factor something like 1689259081189, which is actually the product of the 100,000th and 100,001st prime — that is, they would need to test 99,999 primes before getting to the one which divides the number.  If they divided by one prime per second (which is quite fast!) this would take them 1 day, 3 hours, and 46 minutes.  Not especially effective, but it will eventually get the job done.


Real Problem, With Efficiency: Find the greatest divisor which is common to two natural numbers, m,n, but do so in an efficient manner (we’ve all got deadlines!).


We need to sit down and think about this now.  We need an entirely new idea.  We note, at least, that for the two numbers m,n that one of them must be larger than the other (or else the problem is trivial).  One thing to try would be to see if the smaller one goes into the larger one (for example, above we had 60 going into 420, which gave us the easy solution that 60 must be the greatest common divisor).  If not, maybe we can see how much is left over.  That is, if m is the larger number,

m = a_{1}n + r_{1}

where here a_{1} is the number of times n goes into m without exceeding it, and r_{1} is the "remainder"; if it’s equal to 0, then n evenly divides into m, and otherwise it is less than n (or else we could divide an additional n into m). 

Using this, if r_{1}\neq 0, we may write m - a_{1}n = r_{1}; this means that, in particular, r_{1} divides m and a_{1}n, so it is a factor of m and of a_{1}n.  But it may not actually be a factor of n; so let’s see how many times it goes into n.  Using the same process…

n = a_{2}r_{1} + r_{2}

and by rearranging, we have that n - a_{2}r_{1} is divisible by r_{2}.  So, n is divisible by r_{2}, but we aren’t sure if r_{1} is divisible by r_{2}…if it were, we would be able to say that r_{2} was a common divisor of m and n (why?).  That’s something at least. 

The cool thing about our algorithm here is that, because a_{1}n + r_{1} = m we have that either r_{1} = 0 and we’re done with the algorithm, or r_{1} > 0 and we may form a new equation n = a_{2}r_{1} + r_{2}; this equation has, on the left-hand side, the number n which is less than the previous equation’s left-hand side, which was m.  Continuing this process, we will have r_{1}, r_{2}, \dots on the left-hand side, each of which is less than the one which came before it.  Because r_{i} \geq 0 for any of the remainders, eventually it will become 0 (why?) and this algorithm will terminate.  That is, we will have found some r_{i} which is a common divisor for both n, m; specifically, it will be the r_{i}\neq 0 such that r_{i+1} = 0 (or, it may simply be n if n divides m).

This algorithm, called the Euclidean Algorithm, actually does more "automatically": it not only finds a common divisor, but actually finds the greatest common divisor of m,n, which, from now on, we will denote \gcd(m,n).  The "proof" of this is simply noting that \gcd(m,n) = \gcd(n,r_{1}) = \gcd(r_{1},r_{2}) = \cdots = \gcd(r_{i-1},r_{i}) = r_{i} (we noted this above without making reference to the gcd, but the reader should attempt to go through all the same steps using the idea of the gcd). 

So.  If you have two natural numbers, n,m, you divide them, find the remainder, write the equation, then continue as above until you get a 0 remainder.  Then you pick the remainder directly before you got 0 as your gcd (or, you pick the smaller number if one number divides the other).  Pretty simple algorithm, but is it efficient?

Without going into formal "efficiency" definitions, "yes", it is quite efficient.  To prove it, let’s take an "average" example using the "large" numbers 1337944608 and 4216212.  We note that (by pen and paper, or by using a standard calculator) that

1337944608 = 317(4216212) + 1405404.

Next, we note that

4216212 = 3(1405404) + 0

which instantly gives us the solution \gcd(4216212, 1337944608) = 1405404.  That’s pretty awesome.  Note that this was an especially quick trial, but even the "worst" ones are relatively quick. 


Unexpected Corollary!:  For n,m natural numbers, if \gcd(n,m) = k then there exists integers a,b such that an + bm = k.


This is more useful than you might think at first glance, and we’ll get into why in a later post, but what’s nice about this corollary is that it comes "for free" from the Euclidean algorithm.  Note that, since k divides n, m, it suffices to prove this corollary for an + bm = 1 where n, m have \gcd(n,m) = 1.  The proof uses induction on the number of steps of the Euclidean algorithm for those numbers, but for those of you who are more experienced and know modular arithmetic, you may enjoy the following simple proof:


"Clever" Proof of the Corollary: Let m > n (for equality, the proof is easy).  We will only care about remainders in this proof, so we will look at some numbers modulo m.  Consider


r_{1} = n\mod m

r_{2} = 2n\mod m


r_{m-1} = (m-1)n\mod m


Note there are exactly m-1 remainders here and that the remainder 0 never occurs (since m,n are relatively prime).  Suppose that r_{i} \neq 1 for each of the i; that is, the remainder 1 does not ever show up in this list.  By the pigeon-hole principle (as there are m - 1 remainders but only m -2 possible values for the remainders) we must have that r_{i} = r_{j} for some i\neq j.  That is, we have

in \mod m = jn \mod m

which implies

(i-j)n\mod m = 0

but this is impossible, since it implies that either m = 0 or n is some integer multiple of m, but m > 0 and we have assumes m,n are relatively prime.  Hence, the remainder 1 must occur.  That is, r_{c} = 1 for some c and

cn \mod m = 1.

But what does this mean?  It means that there is some integer a such that am - cn = 1.  To make this prettier, let b = -c and we find that there exists a,b integers such that am + bn = 1, as required.  \Box


Pretty slick, no?


Let’s talk about Burnside’s theorem.  First, let me note that there are two results commonly called "Burnside’s Theorem."  The first one that I learned (which we won’t be discussing in this post) was:


Theorem (Burnside).  If G is a finite group of order p^{a}q^{b} where a,b are non-negative integers and where p,q are primes, then G is solvable.


The second one is also a group theoretical result, but a bit more combinatorial-feeling.  In some books (and, apparently, Wikipedia) this second result is called Burnside’s Lemma.  As noted in the Wikipedia article, this theorem was not even due to Burnside, who quoted the result from Frobenius, who probably got it from Cauchy. 

Let’s get some definitions down.  As usual, we’ll denote the order of the group by |G|, and our groups will all be finite in this post.  If we have a group G which acts on a set X, then given some fixed g\in G we define the set Fix_{X}(g) = \{x\in X\,|\, g\cdot x = x\}; this is, of course, the fixed points in X when acted on by the element g; for the remainder of the post, we will simply write Fix(g) with the set X implied.  Remember, when a group acts on X, the "product" will sit inside of X, and we write the action as g\cdot x for an element g\in G acting on an element x\in X.  The orbit of a point x\in X when acted on by G is given by \{g\cdot x\,|\, g\in G\}; we’ll denote this Orb(x), though this is not standard notation.  The orbit, essentially, is all of the possible values y\in X that you can get to by acting on x with elements in your group.

One thing to note, also, is that orbits are pairwise disjoint.  You should prove this to yourself if you haven’t already, but the idea is like this: if A,B are orbits of elements in X then suppose z\in A\cap B; then there is some x\in A, y\in B, g,h\in G such that g\cdot x = h\cdot y = z, but this implies (h^{-1}g)\cdot x = 1\cdot y = y which implies the orbits are identical (why?).  Hence, each element of X is in exactly one orbit


We need one more result before we can sink our teeth into Burnside.  Remember the fixed point set above?  This was all of the elements x such that g\cdot x = x for some g.  There’s a similar notion called a Stabilizer, denoted G_{x} = \{g\in G\,|\, g\cdot x = x\}; this is saying that we first fix x\in X, and then look at all the elements g\in G which stabilize it.  These definitions are pretty similar feeling (almost like family!) and, in fact, there is a nice relation between the two:


Notation.  Let X/G denote the set of orbits of X when acted on by G; when X is a group and G is a subgroup this is the same as a quotient.

Theorem (Orbit-Stabilizer Theorem).  There is a bijection between G/G_{x} and Orb(x)


That is, if we act on G by elements of g which fix x, then we will have the same number of elements as the orbit of x.  This might seem a little confusing at first, but if you work through it, it’s not so weird. 


Sketch of the Proof.  (Skip this if you’re not comfortable with all this notation above; just go down to the next theorem.)  Here, we want to show a bijection.  Notice that $G/G_{x}$ is the set of cosets hG_{x} for h\in G.  We claim that the mapping \phi which sends \phi: hG_{x} \mapsto h\cdot x is well-defined, injective and surjective (but not a homomorphism).  First, well defined: if hG_{x} = kG_{x} then $latex $hk^{-1}\in G_{x}$ which means that (hk^{-1})\cdot x = x.  This implies, after some manipulation, that h\cdot x = k\cdot x, which means these elements are identical in Orb(x).  Second, surjectivity is clear.  Last, if h\cdot x = g\cdot x in the orbit, then (g^{-1}h)\cdot x = x which implies g^{-1}h\in G_{x} which gives gG_{x} = hG_{x}; hence this map is injective.  This gives us that our map \phi is bijective.  \diamond


One immediate corollary is that |Orb(x)| = \dfrac{|G|}{|G_{x}|}; that is, the number of elements in the orbit of x is the same as the number of elements in G divided by the number of elements in g which fix x.  Think about this for a minute. 



Road to the Proof.

Okay.  Now, let’s think about something for a second.  What is the sum

\displaystyle \sum_{g\in G}|Fix(g)|

telling us?  This is the number of elements in x which are fixed by some g\in G; but there might be some overlap, since if g\cdot x = x and h\cdot x = x, then x will be counted twice: one as an element of Fix(g) and once as an element of Fix(h).  But how much overlap is there?  This is an innocent seeming question, and you might think something like, "Well, depends on how much stuff stabilizes each x.", and this is pretty close to the point. 

First, note that

\displaystyle \sum_{g\in G}|Fix(g)| = |\{(g,x)\in G\times X\,|\,g\cdot x = x\}|

which is just the long way to write out this sum; but, the nice part about that is, we can now think about this as all of the elements of G which are stabilized by some x\in X (why?).  Then,

\displaystyle \sum_{g\in G}|Fix(g)| = \sum_{x\in X}|G_{x}|.

If you don’t see this, you should prove to yourself why they’re the same sum (why is each element counted in the left-hand side also counted in the right-hand side?).  Now, by the Orbit-Stabilizer theorem above, this right-hand sum becomes pretty nice.  Specifically,

\displaystyle \sum_{x\in X}|G_{x}| = \sum_{x\in X}\frac{|G|}{|Orb(x)|} = |G|\sum_{x\in X}\frac{1}{|Orb(x)|}

where we noted in the last equality that |G| is a constant, so we may pull it out of the sum. 

Recalling that X/G denotes the number of orbits, we have that if we take a single orbit (call it A) we will be adding \frac{1}{|A|} up exactly |A| times (since the sum is taken over each x\in X so, in particular, over each $x\in A$); hence, we will add \frac{1}{|A|}\cdot |A| = 1 for each orbit we have in X/G.  That is;

\displaystyle = |G|\sum_{A\in X/G}1 = |G||X/G|.

Putting this all together, we have

\displaystyle \sum_{g\in G}|Fix(g)| = |G||X/G|.


We clean it up a bit, and state the following:


Theorem (Burnside’s).  For a finite group G and a set X, with notation as above, we have \displaystyle |X/G| = \frac{1}{|G|}\sum_{g\in G}|Fix(g)|


That is, the number of orbits is equal to the sum, over g\in G, of the elements of x fixed under g, averaged by the number of elements in G.  Kind of neat.


Next time, we’ll talk about applications!

Pick’s Theorem.

November 22, 2012


In order to teach kids about the concept of area, math teachers sometimes showed pictures which looked like



and we would be asked to "guess the area."  If this was being taught to younger students, the way to do it would be to draw little boxes inside and then try to see how many "half-boxes" were left, how many "quarter-boxes" were left, and so forth.  If this was a more sophisticated lesson, our teacher would instruct us to cut up the picture into rectangles and triangles — the one above, for example, has a right triangle on the left-hand side and a 4×2 rectangle on the right-hand side.  From this, we could easily use the formula for area for each, and we could deduce the total area from these. 

So far nothing is too difficult to solve.  But, if we adjust the picture slightly (still using only straight lines to construct our figure) we may get something like




Here, it is difficult to tell what to do.  We could try to find "easier" triangles to break this into, but none are immediately obvious.  It’s not easy to tell what the angles are (except by using some algebraic methods, finding slopes, etc.).  The best one could do would be to "guess" what the lengths of the sides were (or the lengths of the base and height of the triangle). 


Motivating Pick’s Theorem.

Before we begin, let me define some of the things we’ll be working with.   A polygon is a familiar concept, but we ought to say some things about it:

A lattice polygon is a figure on the two-dimension lattice (as above) such that all of its edges begin and end on a lattice point, no edges overlap except at a lattice point, and its boundary is a closed path (informally, this means that if we started at a lattice point and "traveled" along an edge and kept going in the same direction, we’d eventually come back to where we started).  Moreover, the polygon has to have a distinct interior and exterior (as in, it cannot look like a donut with a hole in the middle).  [Topologists, note: this means that the boundary of our figure is homeomorphic to a circle and the interior is simply connected.]


Easy Polygons.

The "easiest" polygons to work with in terms of area are rectangles: once we find the side lengths, we’re done.  Let’s look at the rectangle below, ignoring the stuff in the lower-left for now.




We can easily find the area here: it is a 7×3 rectangle, giving us an area of 21.  Notice something kind of neat, though; if we look at each interior point of the rectangle, we can associate a unit square with it (In the picture above, I’ve identified the point "a" with the first unit square "a", and the lattice point "b" with the square "b").  Using just the interior points, we can fill up most of this rectangle with unit squares, as below:




We see that we get "almost" all of the area accounted for if we only identify interior points like this; we miss the "¬" shape on the top.  But besides this weird shape, we learned that if we have a rectangle with n interior points, we will be able to fill up n units squared of area in our rectangle. 

If we do the same process (which I mark with a different color) for the boundary points on the top of the rectangle by identifying them with a unit square to their lower-left, we notice that we have to "skip" one on the far-left.




[For clarification, the second lattice point on the top of the rectangle corresponds to the first box, the third point corresponds with the second box, and so forth.  We are forced to skip one lattice point (the first one) on the top.]

Notice that if we have \ell interior points and k points on the top boundary of the rectangle, then there must be k-2 points in each row of interior points; hence, there must be \dfrac{\ell}{k-2} points in each column of interior points. 

Notice, last, that in our picture we need only fill in the remaining few parts on the right side.  We notice that the number of squares needed to fill in this side is exactly the same number of points in each column of interior points  (check this out on a few different rectangles if you don’t believe it). 


Whew.  Okay.  So.  We have \ell interior points, each corresponding to a unit square.  All but one of the top boundary points corresponds to a square; this is k - 1 unit squares.  Last, we have the number of points in each columns of the interior points corresponding to the remaining unit squares; this is \dfrac{\ell}{k-2} unit squares. 

At this point we want to write this nicely, so we’ll denote the TOTAL number of boundary points b, and note that b = 2k + 2\dfrac{\ell}{k-2}.

The total area we found was \ell + k-1 + \dfrac{\ell}{k-2}.  If we rearrange this a bit, we notice that 

\mbox{AREA} = \ell + \left(k + \dfrac{\ell}{k-2}\right) - 1 = \ell + \frac{1}{2}b - 1.

This is exactly the statement of Pick’s theorem, and it is true more generally, as we’ll see.



We’ll briefly cover one more example.  Triangles are a natural figure to think about, so let’s see if Pick’s formula works here.  First, right triangles are easy to work with, so let’s look at this one (ignore the blue part for now):




Notice that this triangle is exactly half of a rectangle (this is completed by the blue part), and it just so happens to have no lattice points on the diagonal.  This last part is important so look again: none of the dots touch the diagonal of the red triangle (here, we are excluding the vertices of the triangle, which we don’t count as being on the diagonal).  Some come close, but none lie on it.  Of course, this is not true in general, but for now let’s just look at this triangle. 

If we use the formula above for the rectangle, we get that the area is \ell + \dfrac{b}{2} - 1 for the rectangle (in this specific case, \ell = 30 and b = 26), and half of this will be \dfrac{\ell}{2} + \dfrac{b}{4} - \dfrac{1}{2}

On the other hand, if we look at the interior points of the triangle, if none of them lie on the diagonal (like above) then we have exactly half of what the rectangle had, so our triangle has \dfrac{\ell}{2} interior points; the number of boundary points will be half of the number of boundary points of the rectangle, plus 1.  This can be seen as follows: if we consider all the points on the bottom boundary and all the points on the left boundary except for the top-most point, then this is exactly half of the boundary points of the rectangle.  Hence, the number of boundary points we have for our triangle is \dfrac{b}{2} + 1

Plugging this information into Pick’s formula (which, at this point, we only know is valid for the rectangle!) we obtain: \frac{\ell}{2} + \dfrac{b}{4} - \frac{1}{2}.  This is exactly the area we calculated before, giving us a verification that Pick’s formula works for right triangles with no lattice points on the diagonal. 


How do we get around the condition that no lattice points should be on the diagonal?  There is a relatively easy way to break up right triangles into other right triangles, none of which will have points on their diagonals.  I’ll demonstrate with this picture below:


The idea is to just take the big triangle, draw some vertical and horizontal lines from the lattice points which lie on the diagonal, until you get smaller triangles (which will have no lattice points on the diagonal) and a bunch of rectangles.  In this case, I first got two triangles (a small one on top, and a small one on the bottom right) and one little 4×3 rectangle in the lower-left.  You then split the rectangle in half, which gives you some more triangles; if these triangles had lattice points on the diagonal, I would have repeated the process, getting even smaller triangles and rectangles.  Because everything is nice and finite here, we don’t get infinite numbers of rectangles and triangles and such: this process will eventually stop.  We apply the above to each and note that this verifies Pick’s formula for any right triangle.


But even right triangles are a bit restrictive.  It would be nice if it were true for any triangle.  Indeed, there is a similar process which decomposes triangles into right triangles.  In fact, there are a number of such processes: for example, prove to yourself that every triangle has at least one altitude which is entirely contained in the triangle, and note that this splits the triangle into two right triangles.  However you show it, this verifies Pick’s formula for any triangle.



The Theorem.

Theorem (Pick).  Given a lattice polygon with \ell interior points and b boundary points, the total area enclosed by the figure is given by \ell + \dfrac{b}{2} - 1


The proof of this is done by induction.  This is a more unusual type of induction since it requires us to induct on the number of triangles a figure is made up of.  The difficult part has already been completed: we have shown that the formula holds for any triangle.  We need a fact which I will not prove here:


Fact: Every polygon can be decomposed into a collection of triangles which only intersect at their boundaries.  Moreover, lattice polygons can be decomposed into a collection of lattice triangles (triangles whose vertices are lattice points) which only intersect at their boundaries.


This process is called polygon triangulation.  In layman’s terms, it means that you can take a polygon and cut it up into a bunch of triangles.  Try it yourself for a few polygons!


Given all of this, let’s jump into the proof.


Proof.  By induction.  We have already proved the formula holds for triangles, so suppose the formula holds for all polygons which are able to be decomposed into k or fewer triangles. Take such a polygon P which is able to be decomposed into exactly k triangles and "attach" a triangle T to the boundary of P such that the resulting figure is still a lattice polygon; call this new polygon Q.





For the triangle, denote its boundary points by b_{T} and its interior points by \ell_{T}; similarly, for P denote its boundary points by b_{P} and its interior points by \ell_{P}.  Denote the common points that T and P share by c.

For interior points, note that we have added the interior points of T and P together, but we also obtain those points which they share on their boundary, except for the two points on the vertex of the triangle which are shared; that is,  \ell_{Q} = \ell_{T} + \ell_{P} + (c-2)


For boundary points, we have to subtract c-2 points from P‘s boundary and  c - 2 points from T‘s boundary (for the same reason as in the previous paragraph).  If we add together the boundary points of T minus the common points and the boundary points of P minus the common points, we will be counting those two points on the vertex of the triangle which are shared two times (why?), so we need to subtract 2 so that we only count these points once.  Hence, b_{Q} = b_{P} + b_{T} - 2(c - 2) - 2 = b_{P} + b_{T} - 2c + 2


At this point, let A_{P} be the area of P and A_{T} be the area of T; we have that:


\displaystyle A_{P} + A_{T} = \left(\ell_{P} + \frac{b_{P}}{2} - 1\right) + \left(\ell_{T} + \frac{b_{T}}{2} - 1\right)

\displaystyle = (\ell_{P} + \ell_{T}) + \frac{b_{P} + b_{T}}{2} - 2


We note now that \ell_{P} + \ell_{T} = \ell_{Q} - (c - 2) and b_{P} + b_{T} = b_{Q} + 2c - 2 from above, which gives us

\displaystyle = \ell_{Q} - (c - 2) + \frac{b_{Q}}{2} + (c - 1) - 2 = \ell_{Q} + \frac{b_{Q}}{2} - 1.

This verifies Pick’s formula for our lattice polygon Q, and since any lattice polygon can be constructed this way (from finitely many triangles) this shows that Pick’s formula holds for any lattice polygon.  \Box

This post has a pretty weird title, but the problem is easy to state and uses a few interesting mathematical concepts.  It’s worth going through.  Let’s start with the basics.


Problem 1.  Let S_{\alpha}(n) = 1^{\alpha} + 2^{\alpha} + \dots + n^{\alpha}.  Show that S_{\alpha}(n) is a polynomial for each \alpha\in {\mathbb N} and that the degree of the polynomial is \alpha + 1.


Indeed, for example, we have  that 1 + 2 + \dots + n = \frac{n(n+1)}{2}, as we learned in Calculus, and this is a polynomial of degree 2.  Similarly, 1^{2} + 2^{2} + \dots + n^{2} = \frac{n(n+1)(2n+1)}{6}, which is a polynomial of degree 3.  In the same respect, 1^{3} + 2^{3} + \dots + n^{3} = \frac{(n(n+1))^{2}}{4}, which is a polynomial of degree 4.

The associated polynomials in this case are given by Faulhaber’s formula:


Theorem (Faulhaber).  For \alpha\in {\mathbb N} we have \displaystyle \sum_{k=1}^{n}k^{\alpha} = \frac{1}{1 + \alpha}\sum_{j=0}^{\alpha}(-1)^{j}\binom{\alpha + 1}{j}B_{j}n^{\alpha + 1 - j}.


This formula looks terrifying, but it is not hard to apply in practice.  You may be wondering, though, what the B_{j}‘s in this formula stand for.  These are the strange and wonderful Bernoulli numbers, of course!  I always enjoy seeing these creatures, because they unexpectedly pop up in the strangest problems.  There are a number of ways to define these numbers, one of which is to just write them out sequentially, starting with B_{0}:


1, -\frac{1}{2}, \frac{1}{6}, 0, -\frac{1}{30}, 0, \frac{1}{42}, 0, -\frac{1}{30}, 0, \dots


But in this case it is not so easy to guess the next value.  The clever reader will notice that all of the odd numbered Bernoulli numbers (except the first) are zero, but other than that there does not seem to be a clear pattern.  Fortunately, we can construct a function which generates the values as coefficients; we’ll call this function (surprise!) a generating function.


Definition.  We define the sequence \{B_{j}\}_{j=0}^{\infty} by

\displaystyle \frac{t}{e^{t} - 1} = \sum_{j=0}^{\infty}B_{j}\frac{t^j}{j!}.


Notice that this will, in fact, generate the B_{j} as coefficients times \frac{1}{j!}.  Neat.  In practice, you can use a program like Mathematica to compute B_{j} for pretty large values of j; but, of course, there are lists available.  We can now use Faulhaber’s formula above, which gives us (assuming we have proven that the formula holds!) that the sums of powers of natural numbers form polynomials of degree \alpha + 1.


But something else happens that’s pretty interesting.  Let’s look at some of the functions.


\begin{array}{|c|c|}\hline \alpha & \mbox{Polynomial}\\\hline &\\ 1 & \frac{1}{2}n^{2} + \frac{1}{2}n\\ &\\ 2 & \frac{1}{3}n^{3} + \frac{1}{2}n^{2} + \frac{1}{6}n\\ & \\ 3 & \frac{1}{4}n^{4} + \frac{1}{2}n^{3} + \frac{1}{4}n^{2} \\ & \\ 4 & \frac{1}{5}n^{5} + \frac{1}{2}n^{4} + \frac{1}{3}n^{3} - \frac{1}{30}n\\ & \\ \hline \end{array}


Look at the coefficients in each of these polynomials.  Anything strange about them?  Consider them for a bit.


Problem.  Look at the coefficients.  What do you find interesting about them?  Note that, in particular, for a fixed \alpha, the coefficients of the associated polynomial sum to 1.  Convince yourself that this is probably true (do some examples!) and then prove that it is true.  Do this before reading the statements below.


Anecdote.  I spent quite a while trying to write down the "general form" of a polynomial with elementary symmetric polynomials and roots to try to see if I could prove this fact using some complex analysis and a lot of terms canceling out.  This morning, I went into the office of the professor to ask him about what it means that these coefficients sum up to 1.  He then gave me a one-line (maybe a half-line) proof of why this is the case. 


Hint.  What value would we plug in to a polynomial to find the sum of the coefficients?  What does plugging in this value mean in terms of the sum?

Seemingly unrelated is Gauss’ Mean Value Theorem, which is significantly cooler (in my opinion) than the standard mean value theorem of the reals.  We will define it formally below, but it says the follwing: if f is analytic (equivalent to complex differentiable) on some disk D and a is the center point of this disk, then the average of the values about the boundary of D is equal to f(a).  That is, to find the value of f(a), it suffices to integrate around a circle centered at a and divide by 2\pi (the amount of radians we pass through while integrating).  This is really neat to think about since this tells us not only that, given f there exists some point whose value is equal to the average of the sum of the values of f lying on a circle, but, moreover, that this point is actually the center of the circle.  This is intense stuff.


Theorem (Gauss’ Mean Value Theorem).  Let f(z) be analytic on some closed disk D which has center a and radius r.  Let C denote the boundary of the disk (that is, C is the circle bounding D).  Then we have that \displaystyle f(a) = \frac{1}{2\pi}\int_{0}^{2\pi}f(a+re^{i\theta})\,d\theta.


The proof of this theorem is pretty straight forward and uses the Cauchy integral formula and some easy substitution.


Proof.  Note that we have \displaystyle f(a) = \frac{1}{2\pi i}\oint_{C}\frac{f(z)}{z-a}\,dz.  The equation of a circle with radius r and center a is given by z = a + re^{i\theta} where \theta runs from 0 to 2\pi (if you don’t believe me, plot some points!).  Substituting this value into the integral and noting that dz = ire^{i\theta} we have that

\displaystyle f(a) = \frac{1}{2\pi}\int_{0}^{2\pi}\frac{f(a + re^{i\theta})ire^{i\theta}}{re^{i\theta}}\,d\theta = \frac{1}{2\pi}\int_{0}^{2\pi}f(a+re^{i\theta})\,d\theta

as required.  \diamond


Why bring up this neat little theorem?  Well, by itself it doesn’t seem to be all that useful — when would we be able to calculate and sum up a whole ton of values of an analytic function surrounding a point, but not be able to find the point itself?  But this little theorem packs some punch as a way of bounding certain values.  In particular, it gives a neat proof of the Maximum Modulus Theorem.  You might have guessed this from the title of this post.


First, let’s note something quickly.


Lemma.  Given the assumptions in Gauss’ MVT, we have \displaystyle |f(a)|\leq \frac{1}{2\pi}\int_{0}^{2\pi}|f(a+re^{i\theta})|\,d\theta


Be careful here in thinking that this should be an equality; we are now looking at the modulus of our value, and the modulus of each point on the circle.  But this lemma comes almost for free:


Proof.  We have |f(a)| =\left|\frac{1}{2\pi}\int_{0}^{2\pi}f(a+re^{i\theta})\,d\theta\right| by using Gauss’ MVT and simply taking the norm of both sides.  Note that

\displaystyle\left|\frac{1}{2\pi}\int_{0}^{2\pi}f(a+re^{i\theta})\,d\theta\right| = \frac{1}{2\pi}\left|\int_{0}^{2\pi}f(a+re^{i\theta})\,d\theta\right|

\displaystyle \leq \frac{1}{2\pi}\int_{0}^{2\pi}|f(a+re^{i\theta})|\,d\theta

whence the inequality above.  \diamond


This lemma tells us that the value of the center of any circle is bounded by the sum of the modulus of the values of the points of that circle.  We’ll see why this is the crucial bound we’ll need in the MMT’s proof below. 


Theorem (Maximum Modulus Theorem).  Given f analytic on some domain D, if f is non-constant on D then the maximum value of |f(z)| for z\in D will occur on the boundary of D.  (Alternatively, if |f(z)| is maximized by some value not on the boundary of D, then f is constant on D.)


Proof.  We’ll split this into two steps.  The first step is for the specific case that D is a closed disk and our maximum modulus occurs at the center of this disk.  The second step will be to get some arbitrary space D and construct some closed disks in the interior of D and "piece these together" to show that f is constant on all of D.

Step 1: Let’s suppose that our maximum modulus is at the center point of D, which we will call z_{0}; that is, we are supposing that |f(z)|\leq |f(z_{0})| for every z\in D.  Since z_{0} is an interior point, we have that there is some r-ball about z_{0} (that is, a ball of radius r) which is completely contained in D.  Let the C denote the circle of radius r centered at the point z_{0}.  By our second lemma above we have that

\displaystyle|f(z_{0})|\leq \frac{1}{2\pi}\int_{0}^{2\pi}|f(z_{0}+re^{i\theta})|\,d\theta

BUT, using that |f(z)|\leq |f(z_{0})| for every z\in D we have that

\displaystyle \frac{1}{2\pi}\int_{0}^{2\pi}|f(z_{0}+re^{i\theta})|\,d\theta \leq \frac{1}{2\pi}\int_{0}^{2\pi}|f(z_{0})|\,d\theta = f(z_{0}).

Stringing these inequalities together and suggestively re-writing |f(z_{0})| = \frac{1}{2\pi}\int_{0}^{2\pi}|f(z_{0})|\,dz, we have that

\displaystyle \frac{1}{2\pi}\int_{0}^{2\pi}|f(z_{0})|\,d\theta = \frac{1}{2\pi}\int_{0}^{2\pi}|f(z_{0}+re^{i\theta})|\,d\theta

and by subtracting,

\displaystyle 0 = \frac{1}{2\pi}\int_{0}^{2\pi}|f(z_{0})| -|f(z_{0}+re^{i\theta})| \,dz

but since the integrand is always positive or zero (why?) it must be the case that

\displaystyle |f(z_{0})| -|f(z_{0}+re^{i\theta})| = 0

or, in other words, |f(z_{0})| =|f(z_{0}+re^{i\theta})|.  Since r was arbitrary, we conclude that |f(z_{0})| = |f(z)| for every z\in D.


Step 2: Now suppose we have some arbitrary domain D and f is analytic on all of D.  I will hand-wave a bit here, but you can fill in the details.  Note that a domain (in this context) necessarily means open and path-connected (and, in fact, it usually denotes a simply connected open subset of {\mathbb C}).   Suppose that our maximum modulus occurs at some point on the interior of D which we will call z_{0}.  Now, given any other point w\in D we have some path from z_{0} to w which is completely contained in D.  In fact, we can make this path a finite polygonal path; that is, a path made out of a finite number of straight lines piecewise-connected together; we will denote this z_{0}L_{0}z_{1}L_{1}\cdots L_{n-1}w, where the L_{i} is the line with endpoints z_{i-1} and z_{i}.  I will let you work the details out here, but it can be done. 




Now, the polygonal line might be right next to a boundary, and we don’t want to accidentally hit it when we start making balls around points, so let \epsilon denote whichever is smaller: the distance from the polygonal line to the boundary, or 1.  So, if your polygonal line is right next to the boundary, we might need to make \epsilon pretty small; but if not, we can just let it be whatever we want, so we might as well make it 1.  Note that since D is open, no point on the polygonal path should be on the boundary.  Now, let’s break up our polygonal path into another polygonal path z_{0}L_{0}z_{1}L_{1}\cdots L_{n-1}w where each L_{i} has length less than \frac{\epsilon}{4}.  It is clear we can do this just by partitioning each straight line in our original path so that their lengths are appropriately small; note, we still only have a finite number of endpoints z_{i}.  That’s important.



(In the picture above, I’ve made the original endpoints blue and then partitioned our polygonal path with the new red endpoints to make each line segment less than \frac{\epsilon}{4}.)

Now everything is going to fall pretty quickly, so keep on your toes.  First, make a disk of radius \epsilon (as defined above) around each z_{i} and call it D_{i}.  Now note that, by our previous step, since our maximum modulus occurs at z_{0}, we have f(z) = f(z_{0}) for every point z\in D_{0}.  But z_{1} is in D_{0}


(This picture is not drawn to scale because I am not a good artist; this is illustrating z_{1} being inside the circle D_{0}.)

So now z_{1} is also of maximum modulus (since z_{0} was) and so f(z_{1}) = f(z) for every point in D_{1}.  Continue this and we will obtain f(z_{0}) = f(w).  Since w was an arbitrary point, it follows that f(z_{0}) = f(w)  for every w\in D.  Hence, if f attains a maximum modulus on the interior of some set D, then it is constant.  This implies directly that any non-constant analytic function achieves its maximum modulus on the boundary.  \Box

[Note: It’s been too long, mathblog!  I’ve been busy at school, but I have no plans to discontinue writing in this thing.  For longer posts, you’ll have to wait until I have some free time!]


You’ve probably seen a graph before: they’re collections of vertices and edges pieced together in some nice way.  Here’s some nice examples from wikipedia:

File:McGee graph.svg

File:Moser spindle.svg



One theorem that I constantly run into (mainly because it’s relevant to homology) is the following:


Theorem.  Given some connected finite graph G, there exists a spanning tree of G.


Note that such a tree is not generally unique.  You might want to find one or two in the graphs above to prove this to yourself!  Nonetheless, even though this proof seems like it would be a bit intimidating (or, at least, by some sort of tedious construction), it’s actually quite nice.  Let’s go through it.


Proof. We’ll prove this by the number of cycles in G.  If G has no cycles then it is, itself, a tree; hence, G is its own spanning tree.  Suppose now that any graph with n cycles has a maximum spanning tree.  Given some graph with n+1 cycles, take any one of these cycles and delete any edge (but not deleting the vertices!).  This reduces the number of cycles by (at least) one, so there is a spanning tree by induction.  In fact, since we have not deleted any vertices, we may replace the edge we removed and this tree will still span the graph.  \Box


A couple of questions about this for the eager reader.  Where did we use connectedness?  Where did we use finite-ness?  What if we were given an infinite graph?  Is there a nice notion of cycles for this kind of graph?  Draw some pictures and think about it!

I’m going through a few books so that I can start doing lots and lots of problems to prepare for my quals.  I’ll be posting some of the “cuter” problems.


Here’s one that, on the surface, looks strange.  But ultimately, the solution is straightforward.


Problem.  Find an uncountable subset A\subseteq [0,1] such that A - A = \{a - b\,\mid\,\,a,b\in A\} has empty interior.


Read the rest of this entry »

I just saw this problem in a book of algebra problems, and I thought it was nice given one of the previous posts I had up here.


Problem.  Show that 11, 111, 1111, 11111, … are not squares.


You ought to think about this for a bit.  I started by supposing they were squares and attempting to work it out like that; unfortunately, there are some strange things that happen when we get bigger numbers.  But.  You should see a nice pattern with the last two digits.  Click below for the solution, but only after you’ve tried it!

Read the rest of this entry »

This post is about a question I just thought about that I thought had a pretty neat solution.  The values we consider below are all in the Reals.


So, we’re all familiar with sequences which have subsequences which converge to two different values; for example, take \{1,-1,1,-1,1,-1,\dots\} which has a subsequence converging to 1 and a subsequence converging to -1.  Similarly, we can construct a sequence containing subsequences converging to three different values in a similar way; for example: \{-1,0,1,-1,0,1, \dots\}.  Indeed, for any finite number, we can make a sequence containing subsequences converging to that number of different values.


The obvious next question is: can a sequence have subsequences which converge to a countable number of different values?  What about an uncountable number of different values?


For the former question, I thought of this solution.  First, enumerate your countable list of distinct values (think of the integers or the rationals, for example) as \{q_{1}, q_{2}, \dots\}.  To construct the sequence, we do the following:

  • Let a_{1} = 0.
  • Let a_{b} = 0 if b is not some positive power of a prime (for example, if b is 15, 21, 35, 100, etc.)
  • For a_{k} where k is a positive power of a prime, we do the following: If k = p_{i}^{r} for r\in {\mathbb N} and p_{i} is the i-th prime (so p_{1} = 2, p_{2} = 3, p_{3} = 5, and so on) then set a_{k} = q_{i} where q_{i} is the i-th element from your list above.


How does this actually work?  Let’s take a simple example.  Let’s let our countable set be the set of natural numbers.  This is easily ordered as \{1,2,3,4,5,\dots\} and so, for this example, q_{1} = 1, q_{2} = 2, q_{3} = 3, and so forth.  Our sequence is now:

\{a_{1}, a_{2}, a_{3}, a_{4}, a_{5}, a_{6}, a_{7}, a_{8}, a_{9}, a_{10},\dots\}

= \{a_{1}, a_{2}, a_{3}, a_{2^{2}}, a_{5}, a_{2\cdot 3}, a_{7}, a_{2^{3}}, a_{3^{2}}, a_{2\cdot 5}, \dots\}

= \{0, q_{1}, q_{2}, q_{1}, q_{3}, 0, q_{4}, q_{1}, q_{2}, 0, \dots\}

= \{0, 1, 2, 1, 3, 0, 4, 1, 2, 0, \dots\}

and if we kept going like this,

= \{0, 1, 2, 1, 3, 0, 4, 1, 2, 0, 5, 0, 6, 0, 0, 1, 7, 0, 8, 0, 0, 0, 0, 0, 3, 0,\dots\}


You kinda see the pattern going on here.  The 1 will appear infinitely many times, but the spaces between it grow exponentially.  Same for 2, 3, 4, and so on.  Thus, this sequence has the subsequence \{n, n, n, n, \dots\} for any n\in {\mathbb N} which obviously will converge to n.


Do you have another neat way to do this?  Encoding this in the primes was my first thought, but you never know!


As for uncountable, my guess is no.  Of course, thinking of the sequence as a SET, we would get the reals from the rationals which is uncountable, but as a SEQUENCE it is not so trivial I feel.  I don’t want to spend too much time thinking about this (as analysis is calling me!) but I’m sure the proof isn’t too crazy.


Edit: Of course, a nice example exists where a sequence has subsequences converging to an uncountable number of distinct point.  In fact, two nice examples exist, and both were provided to me by Brooke (as usual!).


Uncountable Example 1:

It was a bit difficult for me to see the first one, but after doing a nice little thought experiment everything became much clearer.  Here it is:

Let \{q_{1}, q_{2}, \dots\} be an enumeration of the rationals.  The claim is that there is actually a subsequence which converges to any real number.  Think about it for a minute.

Here’s a sketch for the proof.  Take your favorite real number.  Let’s start easy and just say we want a subsequence which converges to \pi.  Let a_{1} be the first element in our enumeration of the rationals above which is a distance less than 1 away from \pi; let’s say it’s q_{k}.  Now let a_{2} be the first element in our enumeration which is after q_{k} and which is a distance less than \frac{1}{2} away from \pi

After that, you just keep picking the "next" element from the list that’s \frac{1}{2^{n}} away from \pi.  There must be one due to the density of the rationals in the reals (or, assume not and see what happens!).  We end up with \{a_{1}, a_{2}, \dots\} as a subsequence of our enumeration of the rationals which converges to \pi.  Obviously, replacing \pi with any other real number works exactly the same.


Uncountable example 2:

This was one that Brooke presented to me that I liked quite a bit since it’s really easy to state.  Here’s the sequence (broken up by lines for clarity):


0.0\,,\, 0.1\,,\, 0.2\,,\, \dots\,,\, 0.9,

0.00\,,\, 0.01\,,\, 0.02\,,\, \dots\,, \,0.09\,,\, 0.10\,,\, 0.11\,,\, \dots\,,\, 0.99\,,

0.000\,,\, 0.001\,,\, 0.002\,,\, \dots\,,\, 0.999\,, \dots


This converges to any real number in [0,1].  To see this is easy: decimal expand your chosen real number and pick the element "from each line" (as I’ve written it above) which is closest to it.  They’ll be at most 10^{-j} away (for some suitable j depending on the line you’re on) and thus a subsequence exists which converges to whatever element you picked.

Jordan’s Lemma.

August 9, 2011

[This post is for those of you who are already comfy with doing some basic contour integrals in complex analysis.]


So you’re sitting around, evaluating contour integrals, and everything is fine.  Then something weird comes up.  You’re asked to evaluate an integral that looks like

\displaystyle \int_{-\infty}^{\infty} e^{aix}g(x)\, dx

for g(x) is continuous.  Eek.  Don’t panic though, because Camille Jordan’s gonna help you out.


Read the rest of this entry »