**http://www.dyinglovegrape.com/**

Enjoy! Leave comments! If I make mistakes [which will be often], let me know!

james.

]]>

Loosely, the law of large numbers states that probability only works as the number of times you do an experiment gets "large" (remember these quotes, they will be important!). Specifically, the (strong) law of large numbers states

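**Theorem (Strong Law of Large Numbers).** Let $X_1, X_2, \dots$ be independent and identically distributed integrable random variables (if this doesn’t mean anything to you, think about die rolls) with $E(X_i) = \mu$, and define $\overline{X}_n = \frac{1}{n}(X_1 + \cdots + X_n)$. Then $P\left(\lim_{n \to \infty} \overline{X}_n = \mu\right) = 1$.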

This last line means: with probability 1, the *average value* of $X_1, \dots, X_n$ will approach the expected value of each $X_i$ *as $n \to \infty$*. This last part is important to this post, so we put it in italics.

This is, of course, widely applied. When you flip a fair coin, you expect the probability to get heads to be $\frac{1}{2}$; indeed, this is why coin flipping is so popular: each side has an equal chance of coming up. This idea, combined with human psychology, can get a bit sticky: if heads has come up 20 times in a row, would you expect it to come up again? Would you expect that it was "time for tails to come up" after that long string? This is illustrated in the Gambler’s Fallacy, an interesting topic which we will not get into in this post, but which is important in its own right. The problem is that, with a small number of flips, the observed proportion of heads will generally not equal $\frac{1}{2}$ and, indeed, can be quite far away from $\frac{1}{2}$.

Here is an illustration. I’ve let Mathematica generate 0’s and 1’s at random with equal probability (meaning "tails" and "heads" respectively). If we allow for 5 flips in one "game", the probability of heads (total number of heads over total number of flips) is essentially random; I’ve played ten games and graphed their probabilities below:

As we see, we don’t really get anywhere near one half in some of these games. If we allow for 10 flips for each game, then we get the following:

This is nicer, because it shows that, in each game, we get, on average, pretty close to 0.5. Only a few games are "far away" from 0.5; indeed, we expect such deviation. Let’s look at 100 coin flips for each game:

This shows that most of our games will hover around having an average of 0.5 — what we would expect.
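If you want to reproduce this kind of experiment yourself, here is a minimal sketch in Python (my plots came from Mathematica, so this is a stand-in, not the code behind them). The function name `play_games` and the seed are made up for illustration; each "game" is a batch of fair flips, and we record the fraction of heads per game.

```python
import random

def play_games(num_games, flips_per_game, seed=0):
    """Return the fraction of heads in each of num_games games."""
    rng = random.Random(seed)  # seeded only so that the run is repeatable
    fractions = []
    for _ in range(num_games):
        # 1 means "heads", 0 means "tails", each with probability 1/2
        heads = sum(rng.randint(0, 1) for _ in range(flips_per_game))
        fractions.append(heads / flips_per_game)
    return fractions

for flips in (5, 10, 100):
    results = play_games(10, flips)
    spread = max(results) - min(results)
    print(flips, [round(r, 2) for r in results], "spread:", round(spread, 2))
```

The spread between the best and worst game will typically shrink as the number of flips per game grows, though any single run can misbehave; that is rather the point of this post.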

There is a subtle (well, maybe not so subtle) point in the law of large numbers: we need a *large* number of trials to get close to the *true* expected value. But…how large is large?

Here, we can use a number of inequalities to see what an "optimal" number of trials would be, but I think it would be nice to use some programs and so forth to *show* the reader what *Large* looks like in some cases.

Here is what we will do. I will construct an experiment consisting of a number of coin flips. Then we will measure how long it takes to flip and get within a certain range of the expected value. That is, we will flip a few times, count the number of heads, and see how long it takes us before we get within some $\epsilon$ of 0.5.

We need to be careful here. For example, if we flip heads the first time and tails the second time, this will give us an average of 0.5; but, of course, this is not what we want — we know that this will quickly deviate from 0.5 and then slowly come back to the expected value. For example, here are the first twenty flips for a particular game:

Note that this wildly fluctuates from nearly 100% down to around 40% and back up to around 57% before decreasing towards 45% at the end. Very few times do we get to our expected probability of 0.5. As we increase the number of flips (using the same data as the previous plot),

We see that this goes down for a while, then begins to go back up again. Note that, though we’ve flipped the coin 50 times, we still have more "tails" than "heads", though we’d expect them to be equal "after a while." Here is a chart for the first 1000 flips.

Notice that at the very beginning we have the erratic behavior we’ve seen before: this is, of course, because the law of large numbers hasn’t "kicked in" yet; the average value will fluctuate radically with each coin flip. As we go onwards we see that the behavior stops being so erratic (look how, for example, the first part looks "sharp"; this is because it goes up and down quickly) and begins to smooth out. Not only this, but it begins to approach its limiting probability of 0.5. Notice that after 200 flips we begin to get exceptionally close to the limiting probability: the running average stays in a narrow band around 0.5 after 200 flips. This is what we’d expect.
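To make "how long until we stay within some range of 0.5" concrete, here is a rough sketch. The names `running_averages` and `settle_time` are my own inventions, as are the seed and the band width of 0.05; the idea is just to track the average after each flip and find the last time it leaves the band.

```python
import random

def running_averages(flips):
    """Average of the first k flips, for k = 1, 2, ..., len(flips)."""
    averages, heads = [], 0
    for k, flip in enumerate(flips, start=1):
        heads += flip
        averages.append(heads / k)
    return averages

def settle_time(averages, target, eps):
    """Smallest k such that every average from flip k onward is within eps of target.

    Returns None if even the final average is outside the band.
    """
    last_bad = 0  # 1-based position of the last average outside the band
    for k, avg in enumerate(averages, start=1):
        if abs(avg - target) > eps:
            last_bad = k
    return last_bad + 1 if last_bad < len(averages) else None

rng = random.Random(1)  # hypothetical seed, chosen only for repeatability
flips = [rng.randint(0, 1) for _ in range(1000)]
print(settle_time(running_averages(flips), 0.5, 0.05))
```

Keep in mind that this only tells us about the flips we actually simulated: the average could still wander out of the band after flip 1000, so the number printed is a lower bound on the true "settling" time for that sequence.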

A similar thing happens when we roll dice, though the experiment is slightly more interesting. For a 4-sided die (known henceforth as a "D4") the expected value is equal to $\frac{1 + 2 + 3 + 4}{4} = 2.5$. We do the exact same experiment as before, and record the values we obtain for the first 10 rolls:

We see that, despite eventually "coming down" at the end, the rolls spend a lot of time above the expected value. Similarly, other iterations of this experiment have given other wild behaviors: staying near 1, fluctuating between 2 and 4 wildly, staying at 3, and so forth. Let’s see what happens as we continue rolling 40 more times:

One notices that we are now "closer" to the expected value, and the rolls are "around" 2.5, within 0.1 on either side. Let’s add some more rolls until we get to 200:

Whoops. We’re still not quite at 2.5, though we are consistently within 0.1 of it. Maybe if we add some more rolls?

We’re getting closer. In the last graph, we were within 0.1 of the expected value. Now we are within 0.05 of the expected value. It looks like it took around 600 or so rolls to consistently start lying within this range and, in general, it looks like around 100 rolls are "good enough" to lie within 0.5 of the expected value (though, the figure above this one shows that it is still a bit out of range…). At last, look at this last figure:

We see that after a significant number of rolls (around 6500) we finally are consistently within 0.004 or so of our expected value. This is pretty close if you only need an error of 0.004, but 6500 is a pretty large number — if this was an experiment where we needed to hit a small particle with another small particle, for example, it may take significantly more than 6500 tries to get within a proper range since an error of 0.004 may be extremely large in this case.

Rolling two D4s does a similar thing, though it seems that the average converges to its expected value (of 5) more quickly. For example, here are the first 100 rolls:

Notice, though, before 20 rolls, the average is erratic and strange. Even after 20 rolls, the rolls are only within 0.1 of the expected value. For 1000 rolls,

we notice that we get somewhat closer to the expected value, getting within about 0.1 after around 600 rolls. If we take this to the extreme 10,000 rolls, we obtain a much nicer plot:

It takes us around 6000 rolls to get within 0.01 of the expected value, it seems. Of course, we expect a large number like this. We have "precise" ways of telling "how much error" is going to happen on average, as well. These estimates are based on something called z- or t-values and the *standard deviation*. In particular, a lot of emphasis is placed on standard deviation. How well does standard deviation measure the average deviation of our experiments?

The standard deviation is a measure which tells us, on average, how much our data deviates from the expected value. There are a number of ways to do standard deviation, but generally one computes two different kinds of standard deviations: the population and the sample standard deviation. Here’s how to compute these:

**Population Standard Deviation:** If we have data for a population, we will also have the expected value $\mu$ for the population. Because it is a population, this will be *exact:* it is not approximating anything. The standard deviation is given by $\sigma = \sqrt{E(X^2) - (E(X))^2}$, the square root of the variance.

**Sample Standard Deviation:** For this one we do not know what our expected value of the population is, and, hence, we have only an *approximation* of that average given by the sample average $\bar{x}$. We calculate the sample standard deviation of the data $x_1, \dots, x_n$ in the following way:

$s = \sqrt{\frac{1}{n - 1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

Note that we’ve used an $n - 1$ instead of $n$ because of Bessel’s Correction.

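Suppose we have a 6-sided die (henceforth, D6). The expected value $E(X) = 3.5$ is easily calculated; the value $E(X^2) = \frac{91}{6}$ can be calculated in a similar way. These two values give us that the population standard deviation is equal to $\sigma = \sqrt{\frac{91}{6} - (3.5)^2} = \sqrt{\frac{35}{12}} \approx 1.708$.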
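If you would rather not do the arithmetic by hand, here is a small sketch that computes the population mean and variance of a fair die exactly, using the identity variance $= E(X^2) - (E(X))^2$. The function name `population_stats` is just something I made up for this example.

```python
from fractions import Fraction
from math import sqrt

def population_stats(faces):
    """Exact mean and variance of one roll of a fair die with the given faces."""
    n = len(faces)
    mean = Fraction(sum(faces), n)                     # E(X)
    mean_sq = Fraction(sum(f * f for f in faces), n)   # E(X^2)
    return mean, mean_sq - mean**2                     # variance = E(X^2) - E(X)^2

mean, var = population_stats(range(1, 7))  # a D6
print(mean, var, sqrt(var))
```

Using `Fraction` keeps everything exact until the final square root, so there is no rounding to worry about when comparing against a hand calculation.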

In a real test, how long does it take until we get close to this standard deviation? We do some trial rolls and plot the sample standard deviation each time. For 20 rolls,

Notice that our sample standard deviation spends a lot of time meandering back and forth, and only gets to within about 0.2 or so of the population standard deviation. Let’s do the same for 100 rolls,

We see that this eventually gets to within about 0.1 of the population standard deviation after 80 or so rolls. After 250 rolls, below we see that the sample standard deviation gets quite close to the population standard deviation (within about 0.01):

Let’s do another example which isn’t so *dull*. This time, we will roll two D6’s. One can calculate the expected value and so forth, and from this we find that the standard deviation is $\sigma = \sqrt{\frac{35}{6}} \approx 2.415$. As usual, we expect the first 10 rolls to not give too nice of a graph:

What about after 100 rolls?

We see that the standard deviation is still quite far away from the population standard deviation even after 100 rolls, though it has stabilized within about 0.5 above it. What about 300 rolls?

We are much closer now: it is within 0.2 after some 250 or so rolls. We see that, eventually, the sample standard deviation will become closer and closer to the population standard deviation.
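Here is a sketch of how one might track the sample standard deviation as the rolls of two D6 accumulate. It leans on `statistics.stdev` from the Python standard library, which uses the $n - 1$ denominator; the helper name `running_sample_sd` and the seed are assumptions of mine, and the numbers it prints will not match my plots.

```python
import random
from statistics import stdev

def running_sample_sd(rolls):
    """Sample standard deviation (n - 1 denominator) of the first k rolls, k >= 2."""
    return [stdev(rolls[:k]) for k in range(2, len(rolls) + 1)]

rng = random.Random(2)  # hypothetical seed
rolls = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(300)]
sds = running_sample_sd(rolls)
population_sd = (35 / 6) ** 0.5  # two independent D6
for k in (10, 100, 300):
    # sds[0] belongs to k = 2, so the entry for k rolls sits at index k - 2
    print(k, round(sds[k - 2], 3), round(abs(sds[k - 2] - population_sd), 3))
```

Recomputing the deviation from scratch for each prefix is wasteful, but for a few hundred rolls it is simpler to read than an incremental update.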

One common rule that is used frequently is the *Empirical Rule*, which says that, in a normal distribution, 68% of data is within one standard deviation of the mean, 95% is within two standard deviations, and 99.7% is within three standard deviations. These are common "rules of thumb" which are approximately true. When we want to know *approximately where* most of our data will lie, we build something called a *confidence interval*. The "rule of thumb" equation for a 95% confidence interval when we have $n$ points of data is: $\bar{x} \pm 2\frac{s}{\sqrt{n}}$.

This "2" comes from the empirical rule (though it is not actually correct to use in the case where $n$ is small; one must use something called a t-value). The 95% confidence interval says that, if we were to do the experiment 100 times, then in about 95 of those times the expected value (the one we calculated above) would be in the confidence interval. The accuracy of the interval is dependent on *s* and $\bar{x}$ being close to the true standard deviation and true mean; as we’ve seen, this is often not the case for small samples. Moreover, the "2" in the formula, as we’ve noted, is not always a good estimate when $n$ is small. What is usually done is to use a correction term (called a t-value, which is a larger number than the corresponding z-value, which the "2" is approximating at 95% confidence); this gives us a bigger interval so that we can be more confident that our true mean is within these ranges. Unfortunately, this larger interval is basically useless when we attempt to "guess" the true mean from the data; that is, as before, the data simply isn’t a good enough estimate.
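As a small illustration, here is a sketch of the rule-of-thumb interval, assuming it is the usual $\bar{x} \pm 2s/\sqrt{n}$ with $s$ the sample standard deviation. The function name and the data are made up for the example; for small $n$ the "2" should really be replaced by a t-value, which this sketch deliberately does not do.

```python
from math import sqrt
from statistics import mean, stdev

def rule_of_thumb_interval(data):
    """Approximate 95% confidence interval: sample mean +/- 2 * s / sqrt(n)."""
    n = len(data)
    centre = mean(data)
    half_width = 2 * stdev(data) / sqrt(n)  # stdev uses the n - 1 denominator
    return centre - half_width, centre + half_width

low, high = rule_of_thumb_interval([2, 4, 4, 4, 5, 5, 7, 9])  # made-up data
print(low, high)
```

Notice how wide the interval is for only eight data points; quadrupling the sample size only halves the width, since the width shrinks like $1/\sqrt{n}$.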

The point is this: when you need to apply the law of large numbers make sure that you have a large enough number of trials. In particular, if you are using expected value and standard deviation make sure that the number of trials is large enough for these to be accurate representations of your samples.

It’s a bit of a strange feeling to go in reverse here — usually one uses a sample’s data and attempts to reconstruct the population, but here we are using the population’s data and noting that the sample may not always represent it accurately — but it is important to note. Especially because I’ve seen (e.g., in various game-making and game-analyzing forums) these concepts used time and time again without any reference to the law of large numbers and the necessity for a large number of trials.

In the next post, we will go over simple games (like die rolling and coin flipping) and discuss which of these games can benefit from using expected value and which games (which I call "Luck Based") are games which, regardless of strategy, are essentially based on the "luck" of the roller; that is, games where one particularly "good" roll will give a large advantage or disadvantage to a player from which it is difficult to recover. For example, if we had a game where the players take turns rolling a die and summing the numbers they have rolled, then if they use a D6 and race to 7, the first player rolling a 1 is at a severe disadvantage from which it is difficult to recover. On the other hand, if they raced to something like 1000, each roll would have less of an effect and, in general, the total would eventually approximate the number of turns times the expected value; this type of game is "less" based on luck. We will discuss some of these things in the next post.

]]>

**Initial Problem**. Given two natural (positive whole) numbers called $a, b$, can we find some other natural number that divides both of them?

This problem is a first step. It’s nice to be able to write numbers as a multiple of some other number; for example, if we have 16 and 18, we may write them as $2 \cdot 8$ and $2 \cdot 9$, thus giving us some insight as to the relationship between these two numbers. In that case it may be easy to see, but perhaps if you’re given the numbers 46629 and 47100, you may not realize right away that these numbers are $471 \cdot 99$ and $471 \cdot 100$ respectively. This kind of factorization will reveal "hidden" relationships between numbers.

So, given two numbers, how do we find if something divides both of them; in other words, how do we find the *common divisors* of two numbers? If we think back to when we first began working with numbers (in elementary school, perhaps) the first thing to do would be to note that 1 divides *every number*. But that doesn’t help us all that much, as it turns out, so we go to the next number: if both numbers are even, then they have 2 as a common factor. Then we "factor" both numbers by writing them as $2 \cdot (\text{something})$ and then attempt to keep dividing things out of the *something*. We then move onto 3, skip 4 (since anything divisible by 4 is already divisible by 2 twice), go onto 5, then 7, then…and continue for the primes. This gives a *prime* factorization, but we have to note that if, say, 2 and 5 divide some number, then so does 10. These latter divisors are the *composite* factors.

This seems excessive, but it is sometimes the only way one can do it.

**Anecdote!:** On my algebra qualifying exam, there was a question regarding a group of order 289 which required us to see if 289 was prime or not; if not, we were to factor it. We were not allowed calculators, so what could we do? Try everything. Note that we only need to try up to the square root of the number (which we could estimate in other ways), but it’s still a number of cases. If you check, none of the following numbers divide into 289: 2, 3, 5, 7, 11, 13. At this point, I was about to give up and call it a prime, but, for whatever reason, I decided to try 17. Of course, as the clever reader will have pointed out, $289 = 17^2$. It is not prime. There was, luckily, only one student who thought it was prime, but it points out how the algorithm above is not entirely trivial if one does not have access to a computer or calculator.
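For those who do have a computer handy, the "try everything up to the square root" idea can be sketched in a few lines. The function name `smallest_factor` is mine, and the sketch assumes its input is at least 2.

```python
def smallest_factor(n):
    """Smallest divisor of n greater than 1, found by trial division up to sqrt(n)."""
    d = 2
    while d * d <= n:      # only candidates up to the square root are needed
        if n % d == 0:
            return d
        d += 1
    return n               # no divisor up to sqrt(n), so n itself is prime

print(smallest_factor(289))
```

This tries every integer rather than only the primes, which wastes a little effort (a composite candidate can never be the first one to divide) but saves us from needing a list of primes.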

Once we have a common divisor, or a set of common divisors, a natural thing to want to do is to find the *biggest* (we already have the smallest, 1) since in this way we can write our numbers with the largest common factor multiplied by some other number. It will, in effect, make things prettier.

**Real Problem.** Find the *greatest* divisor which is common to two natural numbers, $a, b$.

If you were just learning about this kind of thing, you may spout out the following solution: find *all *of the common divisors, then pick the greatest. While this is not especially efficient, it *is* a solution. Unfortunately, even for small numbers, this gets out of hand quickly. For example, 60 and 420 have the following common divisors: 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60. This takes a while to compute by hand.

Even if we were to find prime factors, this would be $60 = 2^2 \cdot 3 \cdot 5$ and $420 = 2^2 \cdot 3 \cdot 5 \cdot 7$, which gives us that they share a number of prime factors. A bit of thinking gives us that we take all of the prime factors they "share" (with multiplicity) and multiply them together to get the greatest common divisor. This is another potential solution which is much faster than simply listing out all of the common divisors. Unfortunately, this falls prey to the same kind of trap that other prime-related problems do: it is, at times, especially difficult to factor large composite numbers. For example, the "reasonably small" number 49740376105597 is not at all efficient to factor if one does not have a computer or a specialized calculator with a factoring algorithm on it. As a mean joke, you may ask your friend to factor something like 1689259081189, which is actually the product of the 100,000th and 100,001st primes; that is, they would need to test 99,999 primes before getting to the one which divides the number. If they divided by one prime per second (which is quite fast!) this would take them 1 day, 3 hours, and 46 minutes. Not especially effective, but it *will* eventually get the job done.
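The "multiply the shared prime factors" approach can be sketched as follows. The function names are my own, and the factoring is plain trial division, so this is only practical for small inputs, which is exactly the weakness described above.

```python
from collections import Counter

def prime_factors(n):
    """Prime factorization of n as a Counter {prime: exponent}, by trial division."""
    factors, d = Counter(), 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1  # whatever is left over is prime
    return factors

def gcd_by_factoring(a, b):
    """Multiply together the prime powers that a and b share."""
    shared = prime_factors(a) & prime_factors(b)  # minimum exponent of each prime
    result = 1
    for p, e in shared.items():
        result *= p ** e
    return result

print(gcd_by_factoring(60, 420))
```

The `&` on two `Counter` objects keeps, for each prime, the smaller of the two exponents, which is precisely the "shared" part of the two factorizations.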

**Real Problem, With Efficiency:** Find the greatest divisor which is common to two natural numbers, $a, b$, but do so in an efficient manner (we’ve all got deadlines!).

We need to sit down and think about this now. We need an entirely new idea. We note, at least, that for the two numbers $a, b$, one of them must be larger than the other (or else the problem is trivial). One thing to try would be to see if the smaller one goes into the larger one (for example, above we had 60 going into 420, which gave us the easy solution that 60 must be the greatest common divisor). If not, maybe we can see how much is left over. That is, if $a$ is the larger number,

$a = q_1 b + r_1$

where here $q_1$ is the number of times $b$ goes into $a$ without exceeding it, and $r_1$ is the "remainder"; if it’s equal to 0, then $b$ evenly divides into $a$, and otherwise it is less than $b$ (or else we could divide an additional $b$ into $a$).

Using this, if $r_1 \neq 0$, we may write $r_1 = a - q_1 b$; this means that, in particular, anything which divides $a$ and $b$ also divides $r_1$, so it is a factor of $b$ and of $r_1$. But $r_1$ itself may not actually be a factor of $b$; so let’s see how many times it goes into $b$. Using the same process…

$b = q_2 r_1 + r_2$

and by rearranging, we have that $b - r_2$ is divisible by $r_1$. So, $b - r_2$ is divisible by $r_1$, but we aren’t sure if $b$ is divisible by $r_1$…if it were, we would be able to say that $r_1$ was a common divisor of $a$ and $b$ (why?). That’s *something* at least.

The cool thing about our algorithm here is that, because $0 \leq r_2 < r_1$, we have that either $r_2 = 0$ and we’re done with the algorithm, or $r_2 > 0$ and we may form a new equation $r_1 = q_3 r_2 + r_3$; this equation has, on the left-hand side, the number $r_1$ which is less than the previous equation’s left-hand side, which was $b$. Continuing this process, we will have $r_2, r_3, \dots$ on the left-hand side, each of which is less than the one which came before it. Because $r_i \geq 0$ for any of the remainders, *eventually* it will become 0 (why?) and this algorithm will terminate. That is, we will have found *some* $r_k$ which is a common divisor for both $a, b$; specifically, it will be the $r_k$ such that $r_{k+1} = 0$ (or, it may simply be $b$ if $b$ divides $a$).

This algorithm, called the *Euclidean Algorithm,* actually does more "automatically": it not only finds a common divisor, but actually finds the *greatest common divisor* of $a, b$, which, from now on, we will denote $\gcd(a, b)$. The "proof" of this is simply noting that $\gcd(a, b) = \gcd(b, r_1) = \gcd(r_1, r_2) = \cdots$ (we noted this above without making reference to the gcd, but the reader should attempt to go through all the same steps using the idea of the gcd).

So. If you have two natural numbers, $a, b$, you divide them, find the remainder, write the equation, then continue as above until you get a 0 remainder. Then you pick the remainder directly before you got 0 as your gcd (or, you pick the smaller number if one number divides the other). Pretty simple algorithm, but is it efficient?

Without going into formal "efficiency" definitions, "yes", it is quite efficient. To illustrate, let’s take an "average" example using the "large" numbers 1337944608 and 4216212. We note (by pen and paper, or by using a standard calculator) that

1337944608 = 317(4216212) + 1405404.

Next, we note that

4216212 = 3(1405404) + 0

which instantly gives us the solution $\gcd(1337944608, 4216212) = 1405404$. That’s pretty awesome. Note that this was an especially quick trial, but even the "worst" ones are relatively quick.
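The whole procedure (divide, take the remainder, repeat until the remainder is 0) fits in a short loop. Here is a sketch that also records each division equation so you can compare against the hand computation; the function name `euclid_steps` is just for illustration.

```python
def euclid_steps(a, b):
    """Run the Euclidean algorithm, returning (gcd, list of division equations)."""
    if a < b:
        a, b = b, a  # make a the larger number
    steps = []
    while b != 0:
        q, r = divmod(a, b)        # a = q*b + r with 0 <= r < b
        steps.append((a, q, b, r))
        a, b = b, r                # the divisor and remainder become the next pair
    return a, steps

g, steps = euclid_steps(1337944608, 4216212)
for a, q, b, r in steps:
    print(f"{a} = {q}({b}) + {r}")
print("gcd:", g)
```

Python's standard library already provides `math.gcd`, so in practice you would call that; the point of writing the loop out is only to see the equations go by.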

**Unexpected Corollary!:** For natural numbers $a, b$, if $\gcd(a, b) = d$ then there exist integers $x, y$ such that $ax + by = d$.

This is more useful than you might think at first glance, and we’ll get into why in a later post, but what’s nice about this corollary is that it comes "for free" from the Euclidean algorithm. Note that, since $d$ divides both $a$ and $b$, it suffices to prove this corollary for $a' = a/d$ and $b' = b/d$, where $a', b'$ have $\gcd(a', b') = 1$ (then multiply the resulting equation through by $d$). The proof uses induction on the number of steps of the Euclidean algorithm for those numbers, but for those of you who are more experienced and know modular arithmetic, you may enjoy the following simple proof:

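*"Clever" Proof of the Corollary:* Let $a > b > 1$ with $\gcd(a, b) = 1$ (for equality, or for $b = 1$, the proof is easy). We will only care about remainders in this proof, so we will look at some numbers modulo $b$. Consider

$a, 2a, 3a, \dots, (b - 1)a \pmod{b}.$

Note there are exactly $b - 1$ remainders here and that the remainder $0$ never occurs (since $a, b$ are relatively prime). Suppose that $ka \not\equiv 1 \pmod{b}$ for each of the $k = 1, \dots, b - 1$; that is, the remainder 1 does not ever show up in this list. By the pigeon-hole principle (as there are $b - 1$ remainders but only $b - 2$ possible values for the remainders) we must have that $ia \equiv ja \pmod{b}$ for some $1 \leq i < j \leq b - 1$. That is, we have

$ja - ia \equiv 0 \pmod{b}$

which implies

$(j - i)a \equiv 0 \pmod{b}$

but this is impossible: since we have assumed $a, b$ are relatively prime, it would force $j - i$ to be some integer multiple of $b$, but $0 < j - i < b$. Hence, the remainder $1$ must occur. That is, $ka \equiv 1 \pmod{b}$ for some $1 \leq k \leq b - 1$ and

$ka - 1 \equiv 0 \pmod{b}.$

But what does this mean? It means that there is some integer $m$ such that $ka - 1 = mb$. To make this prettier, let $x = k$ and $y = -m$, and we find that there exist integers $x, y$ such that $ax + by = 1$, as required. $\Box$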

Pretty slick, no?
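The proof above only shows that the integers exist. The induction-on-the-Euclidean-algorithm route actually produces them, and is usually called the extended Euclidean algorithm; here is a sketch of it (this is the standard technique, not the pigeon-hole argument, and the function name `bezout` is my own).

```python
def bezout(a, b):
    """Return (d, x, y) with d = gcd(a, b) and a*x + b*y == d."""
    # Invariants: old_r == a*old_x + b*old_y and r == a*x + b*y.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r   # the ordinary Euclidean step
        old_x, x = x, old_x - q * x   # carry the same step through the coefficients
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

d, x, y = bezout(240, 46)
print(d, x, y, 240 * x + 46 * y)
```

The coefficients are not unique (adding a multiple of $b/d$ to $x$ and subtracting the matching multiple of $a/d$ from $y$ gives another pair), so a different implementation may print different, equally valid values.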

]]>

The "natural" extension of the notion of a surjective map (in, say, the category of sets) is

**Definition**. A map $f: X \to Y$ is an *epimorphism* if, for each object $Z$ and maps $g, h: Y \to Z$, we have that if $g \circ f = h \circ f$ then $g = h$.

You should prove for yourself that this is, in fact, what a surjective map "does" in the category of sets. Pretty neat. Similarly, for injective maps (in, say, the category of sets) we have the more general notion:

**Definition**. A map $f: X \to Y$ is a *monomorphism* if, for each object $Z$ and maps $g, h: Z \to X$, we have that if $f \circ g = f \circ h$ then $g = h$.

Again, you should prove for yourself that this is the property that injective mappings have in the category of sets. Double neat. There is also a relatively nice way to define an isomorphism categorically — which is somewhat obvious if you’ve seen some algebraic topology before.

**Definition**. A map $f: X \to Y$ is an *isomorphism* if there is some mapping $g: Y \to X$ such that $g \circ f = 1_X$ and $f \circ g = 1_Y$, where $1_X, 1_Y$ denote the identity morphism from the subscripted object to itself.

Now, naively, one might think, "Okay, if I have some certain kind of morphism in my category (set-maps, homomorphisms, homeomorphisms, poset relations, …) then if it is an epimorphism and a monomorphism, it should automatically be an isomorphism." Unfortunately, **this is not the case.** Here are two simple examples.

**Example (Mono, Epi, but not Iso)**. The most simple category for which this works is the category **2**, which I’ve drawn below:

There are two objects, $A$ and $B$, and three morphisms: the identities $1_A, 1_B$ and the morphism $f: A \to B$. First, prove to yourself that this is actually a category. Second, we note that $f$ is an epimorphism: the only map from $B \to B$ is the identity, and there is no mapping from $B \to A$, so the property trivially holds. Third, we note that $f$ is a monomorphism for the exact same reason as before. Last, we note that $f$ is *not* an isomorphism: we would need some $g: B \to A$ which satisfied the properties in the definition above…but, there *is no map* from $B \to A$. Upsetting! From this, we must conclude that $f$ cannot be an isomorphism despite being a mono- and epimorphism.
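Because this category is so tiny, we can check all three definitions by brute force. The sketch below hard-codes the category with hypothetical labels `A`, `B` for the objects and `id_A`, `id_B`, `f` for the morphisms; it is only meant for this one example, since the `compose` shortcut relies on every composable pair containing an identity.

```python
# Morphisms of the category 2, as name -> (source, target).
MORPHISMS = {"id_A": ("A", "A"), "id_B": ("B", "B"), "f": ("A", "B")}

def compose(g, f):
    """Return g after f; here one of the two factors is always an identity."""
    assert MORPHISMS[f][1] == MORPHISMS[g][0], "not composable"
    return g if f.startswith("id_") else f

def is_epi(f):
    """g.f == h.f implies g == h, for all g, h leaving the target of f."""
    out = [m for m, (src, _) in MORPHISMS.items() if src == MORPHISMS[f][1]]
    return all(g == h for g in out for h in out if compose(g, f) == compose(h, f))

def is_mono(f):
    """f.g == f.h implies g == h, for all g, h arriving at the source of f."""
    into = [m for m, (_, tgt) in MORPHISMS.items() if tgt == MORPHISMS[f][0]]
    return all(g == h for g in into for h in into if compose(f, g) == compose(f, h))

def is_iso(f):
    """Some g goes back the other way with g.f and f.g both identities."""
    src, tgt = MORPHISMS[f]
    back = [m for m, st in MORPHISMS.items() if st == (tgt, src)]
    return any(compose(g, f) == "id_" + src and compose(f, g) == "id_" + tgt for g in back)

print(is_epi("f"), is_mono("f"), is_iso("f"))
```

The epi and mono checks pass "trivially" for exactly the reason given above: there is only one candidate for each of `g` and `h`, so they cannot differ.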

**Similar Example (Mono, Epi, but not Iso).** Take the category $(\mathbb{N}, \leq)$, the natural numbers with morphisms as the relation $\leq$. Which morphisms are the monomorphisms? Which morphisms are the epimorphisms? Prove that the *only* isomorphisms are the identity morphisms. Conclude that there are a whole bunch of morphisms which are mono- and epimorphisms but which are not isomorphisms.

**Theorem (Burnside)**. If $G$ is a finite group of order $p^a q^b$ where $a, b$ are non-negative integers and where $p, q$ are primes, then $G$ is solvable.

The second one is also a group theoretical result, but a bit more combinatorial-feeling. In some books (and, apparently, Wikipedia) this second result is called Burnside’s Lemma. As noted in the Wikipedia article, this theorem was not even due to Burnside, who quoted the result from Frobenius, who probably got it from Cauchy.

Let’s get some definitions down. As usual, we’ll denote the *order* of the group by $|G|$, and our groups will *all be finite in this post*. If we have a group $G$ which acts on a set $X$, then given some fixed $g \in G$ we define the set $\mathrm{Fix}_X(g) = \{x \in X : g \cdot x = x\}$; this is, of course, the fixed points in $X$ when acted on by the element $g$; for the remainder of the post, we will simply write $\mathrm{Fix}(g)$ with the set $X$ implied. Remember, when a group $G$ acts on $X$, the "product" $g \cdot x$ will sit inside of $X$, and we write the action as $g \cdot x$ for an element $g \in G$ acting on an element $x \in X$. The *orbit* of a point $x \in X$ when acted on by $G$ is given by $\{g \cdot x : g \in G\}$; we’ll denote this $O_x$, though this is not standard notation. The orbit, essentially, is all of the possible values that you can get to by acting on $x$ with elements in your group.

One thing to note, also, is that distinct orbits are *pairwise disjoint*. You should prove this to yourself if you haven’t already, but the idea is like this: if $O_x, O_y$ are orbits of elements in $X$ then suppose $z \in O_x \cap O_y$; then there are some $g, h \in G$ such that $g \cdot x = z = h \cdot y$, but this implies $x = (g^{-1}h) \cdot y$, which implies the orbits are identical (why?). Hence, each element of $X$ is in *exactly one orbit*.

We need one more result before we can sink our teeth into Burnside. Remember the fixed point set above? This was all of the elements $x \in X$ such that $g \cdot x = x$ for some fixed $g \in G$. There’s a similar notion called a *Stabilizer*, denoted $G_x = \{g \in G : g \cdot x = x\}$; this is saying that we first fix $x \in X$, and then look at all the elements $g \in G$ which stabilize it. These definitions are pretty similar feeling (almost like family!) and, in fact, there is a nice relation between the two:

**Notation.** Let $X/G$ denote the set of orbits of $X$ when acted on by $G$; when $X$ is a group and $G$ is a subgroup this is the same as a quotient.

**Theorem (Orbit-Stabilizer Theorem).** There is a bijection between $G/G_x$ and $O_x$.

That is, if we take the cosets of $G$ by the subgroup of elements of $G$ which fix $x$, then we will have the same number of elements as the orbit of $x$. This might seem a little confusing at first, but if you work through it, it’s not so weird.

*Sketch of the Proof.* (Skip this if you’re not comfortable with all this notation above; just go down to the next theorem.) Here, we want to show a bijection. Notice that $G/G_{x}$ is the set of (left) cosets of $G_x$ in $G$. We claim that the mapping $\phi: G/G_x \to O_x$ which sends $gG_x \mapsto g \cdot x$ is well-defined, injective and surjective (but not a homomorphism). First, well defined: if $hG_x = kG_x$ then $k^{-1}h \in G_{x}$ which means that $(k^{-1}h) \cdot x = x$. This implies, after some manipulation, that $h \cdot x = k \cdot x$, which means these elements are identical in $O_x$. Second, surjectivity is clear. Last, if $h \cdot x = k \cdot x$ in the orbit, then $(k^{-1}h) \cdot x = x$ which implies $k^{-1}h \in G_x$ which gives $hG_x = kG_x$; hence this map is injective. This gives us that our map is bijective.

One immediate corollary is that $|O_x| = \frac{|G|}{|G_x|}$; that is, the number of elements in the orbit of $x$ is the same as the number of elements in $G$ divided by the number of elements in $G$ which fix $x$. Think about this for a minute.

Okay. Now, let’s think about something for a second. What is the sum

$\sum_{g \in G} |\mathrm{Fix}(g)|$

telling us? This is the number of elements in $X$ which are fixed by some $g \in G$; but there might be some overlap, since if $g \cdot x = x$ and $h \cdot x = x$, then $x$ will be counted twice: once as an element of $\mathrm{Fix}(g)$ and once as an element of $\mathrm{Fix}(h)$. But how much overlap is there? This is an innocent seeming question, and you might think something like, "Well, depends on how much stuff stabilizes each $x$.", and this is pretty close to the point.

First, note that

$\sum_{g \in G} |\mathrm{Fix}(g)| = |\{(g, x) \in G \times X : g \cdot x = x\}|$

which is just the long way to write out this sum; but, the nice part about that is, we can now think about this as all of the elements of $G$ which stabilize some $x \in X$, counted once for each such $x$ (why?). Then,

$|\{(g, x) \in G \times X : g \cdot x = x\}| = \sum_{x \in X} |G_x|$

If you don’t see this, you should prove to yourself why they’re the same sum (why is each element counted in the left-hand side also counted in the right-hand side?). Now, by the Orbit-Stabilizer theorem above, this right-hand sum becomes pretty nice. Specifically,

$\sum_{x \in X} |G_x| = \sum_{x \in X} \frac{|G|}{|O_x|} = |G| \sum_{x \in X} \frac{1}{|O_x|}$

where we noted in the last equality that $|G|$ is a constant, so we may pull it out of the sum.

Recalling that $|X/G|$ denotes the number of orbits, we have that if we take a single orbit (call it $A$) we will be adding up $\frac{1}{|A|}$ exactly $|A|$ times (since the sum is taken over each $x \in X$ so, in particular, over each $x\in A$); hence, we will add $1$ for each orbit we have in $X/G$. That is;

$\sum_{x \in X} \frac{1}{|O_x|} = \sum_{A \in X/G} 1 = |X/G|$

Putting this all together, we have

$\sum_{g \in G} |\mathrm{Fix}(g)| = |G| \cdot |X/G|$

We clean it up a bit, and state the following:

**Theorem (Burnside’s).** For a finite group $G$ and a set $X$, with notation as above, we have $|X/G| = \frac{1}{|G|} \sum_{g \in G} |\mathrm{Fix}(g)|$.

That is, the number of orbits is equal to the sum, over $g \in G$, of the number of elements of $X$ fixed under $g$, averaged by the number of elements in $G$. Kind of neat.
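If you like to see a theorem "work" before believing it, here is a small brute-force check. The example is my own choice, not one from this post: the cyclic group of order 4 acting on the 16 binary strings of length 4 by rotation. The sketch counts orbits directly, counts them again with the fixed-point average, and checks the orbit-stabilizer corollary along the way.

```python
from itertools import product

n = 4
X = list(product((0, 1), repeat=n))   # the set being acted on
G = list(range(n))                    # group element k means "rotate by k positions"

def act(k, x):
    """Rotate the tuple x left by k positions."""
    return x[k:] + x[:k]

# Count orbits directly by collecting each point's orbit as a frozenset.
orbits = {frozenset(act(k, x) for k in G) for x in X}

# Count orbits with the fixed-point formula: sum of |Fix(g)| over g, divided by |G|.
fixed_total = sum(sum(1 for x in X if act(k, x) == x) for k in G)

print(len(orbits), fixed_total // len(G))

# Orbit-stabilizer check: |orbit of x| * |stabilizer of x| == |G| for every x.
for x in X:
    orbit = {act(k, x) for k in G}
    stabilizer = [k for k in G if act(k, x) == x]
    assert len(orbit) * len(stabilizer) == len(G)
```

Changing `n`, or the alphabet `(0, 1)`, gives other small cases to try; the direct count gets slow quickly, which is one reason the formula is worth having.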

Next time, we’ll talk about applications!

]]>In order to teach kids about the concept of area, math teachers sometimes showed pictures which looked like

and we would be asked to "guess the area." If this was being taught to younger students, the way to do it would be to draw little boxes inside and then try to see how many "half-boxes" were left, how many "quarter-boxes" were left, and so forth. If this was a more sophisticated lesson, our teacher would instruct us to cut up the picture into rectangles and triangles — the one above, for example, has a right triangle on the left-hand side and a 4×2 rectangle on the right-hand side. From this, we could easily use the formula for area for each, and we could deduce the total area from these.

So far nothing is too difficult to solve. But, if we adjust the picture slightly (still using only straight lines to construct our figure) we may get something like

Here, it is difficult to tell what to do. We could try to find "easier" triangles to break this into, but none are immediately obvious. It’s not easy to tell what the angles are (except by using some algebraic methods, finding slopes, etc.). The best one could do would be to "guess" what the lengths of the sides were (or the lengths of the base and height of the triangle).

Before we begin, let me define some of the things we’ll be working with. A *polygon* is a familiar concept, but we ought to say some things about it:

A *lattice polygon* is a figure on the two-dimensional lattice (as above) such that all of its edges begin and end on a lattice point, no edges overlap except at a lattice point, and its boundary is a closed path (informally, this means that if we started at a lattice point and "traveled" along an edge and kept going in the same direction, we’d eventually come back to where we started). Moreover, the polygon has to have a distinct interior and exterior (as in, it cannot look like a donut with a hole in the middle). [Topologists, note: this means that the boundary of our figure is homeomorphic to a circle and the interior is simply connected.]

The "easiest" polygons to work with in terms of area are rectangles: once we find the side lengths, we’re done. Let’s look at the rectangle below, ignoring the stuff in the lower-left for now.

We can easily find the area here: it is a 7×3 rectangle, giving us an area of 21. Notice something kind of neat, though; if we look at each interior point of the rectangle, we can associate a unit square with it (In the picture above, I’ve identified the point "a" with the first unit square "a", and the lattice point "b" with the square "b"). Using just the interior points, we can fill up most of this rectangle with unit squares, as below:

We see that we get "almost" all of the area accounted for if we only identify interior points like this; we miss the "¬" shape on the top. But besides this weird shape, we learned that if we have a rectangle with $I$ interior points, we will be able to fill up $I$ units squared of area in our rectangle.

If we do the same process (which I mark with a different color) for the boundary points on the top of the rectangle by identifying them with a unit square to their lower-left, we notice that we have to "skip" one on the far-left.

[For clarification, the second lattice point on the top of the rectangle corresponds to the first box, the third point corresponds with the second box, and so forth. We are forced to skip one lattice point (the first one) on the top.]

Notice that if we have $I$ interior points and $T$ points on the top boundary of the rectangle, then there must be $T - 2$ points in each row of interior points; hence, there must be $\frac{I}{T-2}$ points in each column of interior points.

Notice, last, that in our picture we need only fill in the remaining few parts on the right side. We notice that the number of squares needed to fill in this side is exactly the same number of points in each column of interior points (check this out on a few different rectangles if you don’t believe it).

Whew. Okay. So. We have $I$ interior points, each corresponding to a unit square. All but one of the top boundary points corresponds to a square; this is $T - 1$ unit squares. Last, we have the number of points in each column of the interior points corresponding to the remaining unit squares; this is $\frac{I}{T-2}$ unit squares.

At this point we want to write this nicely, so we’ll denote the TOTAL number of boundary points $B$, and note that $B = 2T + 2\left(\frac{I}{T-2}\right)$.

**The total area we found was** $A = I + (T - 1) + \frac{I}{T-2}$. If we rearrange this a bit, we notice that

$$A = I + \frac{1}{2}\left(2T + \frac{2I}{T-2}\right) - 1 = I + \frac{B}{2} - 1.$$

This is exactly the statement of Pick’s theorem, and it is true more generally, as we’ll see.
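If you would like to check the counting argument on more rectangles than you care to draw, here is a minimal Python sketch. It assumes a $w \times h$ rectangle with corners on lattice points and $w \geq 2$ (so that there is at least one interior column); the function names and the letters `I`, `T`, `B` (interior points, top boundary points, all boundary points) are just labels chosen for this illustration.

```python
def rectangle_counts(w, h):
    """Return (I, T, B) for a w-by-h lattice rectangle, assuming w >= 2 and h >= 1."""
    I = (w - 1) * (h - 1)   # interior lattice points
    T = w + 1               # lattice points on the top edge
    B = 2 * (w + h)         # all lattice points on the boundary
    return I, T, B


def area_by_counting(w, h):
    """Area obtained by the square-filling argument: I + (T - 1) + I / (T - 2)."""
    I, T, _ = rectangle_counts(w, h)
    per_column = I // (T - 2)   # interior points in each column
    return I + (T - 1) + per_column


# Compare the counting argument and Pick's formula with width * height.
for w in range(2, 9):
    for h in range(1, 9):
        I, T, B = rectangle_counts(w, h)
        assert area_by_counting(w, h) == w * h
        assert 2 * I + B - 2 == 2 * w * h   # Pick's formula, doubled to stay in integers
```

The loop only confirms the formula for small rectangles; it is a sanity check, not a proof.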

We’ll briefly cover one more example. Triangles are a natural figure to think about, so let’s see if Pick’s formula works here. First, right triangles are easy to work with, so let’s look at this one (ignore the blue part for now):

Notice that this triangle is exactly half of a rectangle (this is completed by the blue part), and it *just so happens to have no lattice points on the diagonal*. This last part is important so look again: none of the dots touch the diagonal of the red triangle (here, we are excluding the vertices of the triangle, which we don’t count as being on the diagonal). Some come close, but none lie on it. Of course, *this is not true in general, *but for now let’s just look at this triangle.

If we use the formula above for the rectangle, we get that the area is $A = I + \frac{B}{2} - 1$ for the rectangle (in this specific case, $I$ and $B$ can be counted directly from the picture), and half of this will be $\frac{I}{2} + \frac{B}{4} - \frac{1}{2}$.

On the other hand, if we look at the interior points of the triangle, if none of them lie on the diagonal (like above) then we have *exactly half of what the rectangle had*, so our triangle has $\frac{I}{2}$ interior points; the number of boundary points will be *half of the number of boundary points of the rectangle, plus 1.* This can be seen as follows: if we consider all the points on the bottom boundary and all the points on the left boundary except for the top-most point, then this is exactly half of the boundary points of the rectangle. Hence, the number of boundary points we have for our triangle is $\frac{B}{2} + 1$.

Plugging this information into Pick’s formula (which, at this point, we only know is valid for the rectangle!) we obtain: $\frac{I}{2} + \frac{1}{2}\left(\frac{B}{2} + 1\right) - 1 = \frac{I}{2} + \frac{B}{4} - \frac{1}{2}$. This is exactly the area we calculated before, giving us a verification that Pick’s formula works for right triangles with no lattice points on the diagonal.

How do we get around the condition that no lattice points should be on the diagonal? There is a relatively easy way to break up right triangles into other right triangles, none of which will have points on their diagonals. I’ll demonstrate with this picture below:

The idea is to just take the big triangle, draw some vertical and horizontal lines from the lattice points which lie on the diagonal, until you get smaller triangles (which will have no lattice points on the diagonal) and a bunch of rectangles. In this case, I first got two triangles (a small one on top, and a small one on the bottom right) and one little 4×3 rectangle in the lower-left. You then split the rectangle in half, which gives you some more triangles; if these triangles had lattice points on the diagonal, I would have repeated the process, getting even smaller triangles and rectangles. Because everything is nice and finite here, we don’t get infinite numbers of rectangles and triangles and such: this process will eventually stop. We apply the above to each and note that **this verifies Pick’s formula for any right triangle.**

But even right triangles are a bit restrictive. It would be nice if it were true for *any *triangle. Indeed, there is a similar process which decomposes triangles into right triangles. In fact, there are a number of such processes: for example, prove to yourself that every triangle has at least one altitude which is entirely contained in the triangle, and note that this splits the triangle into two right triangles. However you show it, **this verifies Pick’s formula for any triangle**.

**Theorem (Pick). **Given a lattice polygon with $I$ interior points and $B$ boundary points, the total area $A$ enclosed by the figure is given by $A = I + \frac{B}{2} - 1$.
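Before proving this, it can be reassuring to test the formula on a computer. The following is a minimal Python sketch, not part of the proof: it counts interior points by brute force over the bounding box, counts boundary points edge by edge (an edge from $(x_0, y_0)$ to $(x_1, y_1)$ contributes $\gcd(|x_1 - x_0|, |y_1 - y_0|)$ lattice points, not counting its starting vertex), and compares Pick’s formula against the shoelace formula for area. All function names and the three sample polygons are my own choices for illustration, and the vertices are assumed to be listed in order around a simple polygon.

```python
from fractions import Fraction
from math import gcd


def edges(vertices):
    """Yield consecutive vertex pairs, closing the polygon at the end."""
    n = len(vertices)
    for k in range(n):
        yield vertices[k], vertices[(k + 1) % n]


def shoelace_area(vertices):
    """Exact area of a simple polygon via the shoelace formula."""
    twice = sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in edges(vertices))
    return Fraction(abs(twice), 2)


def count_boundary(vertices):
    """Number of lattice points on the boundary."""
    return sum(gcd(abs(x1 - x0), abs(y1 - y0)) for (x0, y0), (x1, y1) in edges(vertices))


def on_boundary(px, py, vertices):
    """True if the lattice point (px, py) lies on some edge."""
    for (x0, y0), (x1, y1) in edges(vertices):
        cross = (x1 - x0) * (py - y0) - (y1 - y0) * (px - x0)
        if (cross == 0 and min(x0, x1) <= px <= max(x0, x1)
                and min(y0, y1) <= py <= max(y0, y1)):
            return True
    return False


def strictly_inside(px, py, vertices):
    """Ray casting with exact arithmetic; boundary points are excluded."""
    if on_boundary(px, py, vertices):
        return False
    inside = False
    for (x0, y0), (x1, y1) in edges(vertices):
        if (y0 > py) != (y1 > py):
            x_cross = x0 + Fraction((py - y0) * (x1 - x0), y1 - y0)
            if px < x_cross:
                inside = not inside
    return inside


def count_interior(vertices):
    """Brute-force count of interior lattice points over the bounding box."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return sum(strictly_inside(px, py, vertices)
               for px in range(min(xs), max(xs) + 1)
               for py in range(min(ys), max(ys) + 1))


def pick_area(vertices):
    """Area predicted by Pick's formula: I + B/2 - 1."""
    return count_interior(vertices) + Fraction(count_boundary(vertices), 2) - 1


rectangle = [(0, 0), (7, 0), (7, 3), (0, 3)]
triangle = [(0, 0), (4, 0), (0, 3)]
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]

for polygon in (rectangle, triangle, l_shape):
    assert pick_area(polygon) == shoelace_area(polygon)
```

The brute-force interior count is slow for large polygons, but it keeps the check independent of the formula being tested.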

The proof of this is done by induction. This is a more unusual type of induction since it requires us to induct on the number of triangles a figure is made up of. The difficult part has already been completed: we have shown that the formula holds for any triangle. We need a fact which I will not prove here:

**Fact: **Every polygon can be decomposed into a collection of triangles which only intersect at their boundaries. Moreover, lattice polygons can be decomposed into a collection of lattice triangles (triangles whose vertices are lattice points) which only intersect at their boundaries.

This process is called polygon triangulation. In layman’s terms, it means that you can take a polygon and cut it up into a bunch of triangles. Try it yourself for a few polygons!

Given all of this, let’s jump into the proof.

**Proof.** By induction. We have already proved the formula holds for triangles, so suppose the formula holds for all polygons which are able to be decomposed into $n$ or fewer triangles. Take such a polygon $P$ which is able to be decomposed into exactly $n$ triangles and "attach" a triangle $T$ to the boundary of $P$ such that the resulting figure is still a lattice polygon; call this new polygon $P'$.

For the triangle $T$, denote its boundary points by $B_T$ and its interior points by $I_T$; similarly, for $P$ denote its boundary points by $B_P$ and its interior points by $I_P$. Denote the number of common points that $T$ and $P$ share by $c$.

For interior points, note that we have added the interior points of $P$ and $T$ together, but we also obtain those points which they share on their boundary, except for the two points on the vertex of the triangle which are shared; that is, $I_{P'} = I_P + I_T + (c - 2)$.

For boundary points, we have to subtract $c - 2$ points from $P$’s boundary and $c - 2$ points from $T$’s boundary (for the same reason as in the previous paragraph: these shared points are now interior points). If we add together the boundary points of $P$ minus these points and the boundary points of $T$ minus these points, we will be counting those two points on the vertex of the triangle which are shared *two times* (why?), so we need to subtract 2 so that we only count these points once. Hence, $B_{P'} = B_P + B_T - 2(c - 2) - 2$.

At this point, let $A_P$ be the area of $P$ and $A_T$ be the area of $T$; we have that:

$$A_{P'} = A_P + A_T = \left(I_P + \frac{B_P}{2} - 1\right) + \left(I_T + \frac{B_T}{2} - 1\right) = (I_P + I_T) + \frac{B_P + B_T}{2} - 2.$$

We note now that $I_P + I_T = I_{P'} - (c - 2)$ and $B_P + B_T = B_{P'} + 2(c - 2) + 2$ from above, which gives us

$$A_{P'} = I_{P'} - (c - 2) + \frac{B_{P'} + 2(c - 2) + 2}{2} - 2 = I_{P'} + \frac{B_{P'}}{2} - 1.$$

This verifies Pick’s formula for our lattice polygon $P'$, and since any lattice polygon can be constructed this way (from finitely many triangles) this shows that Pick’s formula holds for *any* lattice polygon.

**Problem 1. **Let $S_p(n) = \sum_{k=1}^{n} k^p$. Show that $S_p(n)$ is a polynomial in $n$ for each $p \in \mathbb{N}$ and that the degree of the polynomial is $p + 1$.

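Indeed, for example, we have that $\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$, as we learned in Calculus, and this is a polynomial of degree 2. Similarly, $\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}$, which is a polynomial of degree 3. In the same respect, $\sum_{k=1}^{n} k^3 = \frac{n^2(n+1)^2}{4}$, which is a polynomial of degree 4.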

The associated polynomials in this case are given by Faulhaber’s formula:

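**Theorem (Faulhaber).** For $p \in \mathbb{N}$ we have $\sum_{k=1}^{n} k^p = \frac{1}{p+1} \sum_{j=0}^{p} (-1)^j \binom{p+1}{j} B_j \, n^{p+1-j}$.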

This formula looks terrifying, but it is not hard to apply in practice. You may be wondering, though, what the $B_j$’s in this formula stand for. These are the strange and wonderful Bernoulli numbers, of course! I always enjoy seeing these creatures, because they unexpectedly pop up in the strangest problems. There are a number of ways to define these numbers, one of which is to just write them out sequentially, starting with $B_0$:

$$B_0 = 1, \quad B_1 = -\frac{1}{2}, \quad B_2 = \frac{1}{6}, \quad B_3 = 0, \quad B_4 = -\frac{1}{30}, \quad B_5 = 0, \quad B_6 = \frac{1}{42}, \quad \dots$$

But in this case it is not so easy to guess the next value. The clever reader will notice that all of the odd numbered Bernoulli numbers (except the first) are zero, but other than that there does not seem to be a clear pattern. Fortunately, we can construct a *function *which *generates* the values as coefficients; we’ll call this function (surprise!) a *generating function.*

**Definition.** We define the sequence $\{B_j\}_{j \geq 0}$ by

$$\frac{t}{e^t - 1} = \sum_{j=0}^{\infty} B_j \frac{t^j}{j!}.$$

Notice that this will, in fact, generate the $B_j$ as coefficients times $\frac{1}{j!}$. Neat. In practice, you can use a program like Mathematica to compute $B_j$ for pretty large values of $j$; but, of course, there are lists available. We can now use Faulhaber’s formula above, which gives us (assuming we have proven that the formula holds!) that the sums of $p$-th powers of natural numbers form polynomials of degree $p + 1$.
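If you do not have Mathematica handy, here is a minimal Python sketch that does the same job with exact fractions. It is an illustration under a couple of assumptions: the Bernoulli numbers are computed from the standard recurrence $\sum_{j=0}^{k} \binom{k+1}{j} B_j = 0$ for $k \geq 1$ (which matches the convention $B_1 = -\frac{1}{2}$ used by the generating function $\frac{t}{e^t - 1}$), and the function names are my own.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(m):
    """Return [B_0, ..., B_m] with the convention B_1 = -1/2."""
    B = [Fraction(1)]
    for k in range(1, m + 1):
        # Solve sum_{j=0}^{k} C(k+1, j) * B_j = 0 for B_k.
        partial = sum(comb(k + 1, j) * B[j] for j in range(k))
        B.append(-partial / (k + 1))
    return B


def faulhaber_coefficients(p):
    """coeffs[d] is the coefficient of n^d in 1^p + 2^p + ... + n^p."""
    B = bernoulli_numbers(p)
    coeffs = [Fraction(0)] * (p + 2)
    for j in range(p + 1):
        coeffs[p + 1 - j] += Fraction((-1) ** j * comb(p + 1, j), p + 1) * B[j]
    return coeffs


def power_sum_formula(p, n):
    """Evaluate the Faulhaber polynomial for exponent p at n."""
    return sum(c * n ** d for d, c in enumerate(faulhaber_coefficients(p)))


# Compare the formula with the brute-force sum for small p and n.
for p in range(1, 7):
    for n in range(0, 11):
        assert power_sum_formula(p, n) == sum(k ** p for k in range(1, n + 1))
```

Calling `faulhaber_coefficients(p)` for a few values of `p` is a convenient way to look at the polynomials themselves.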

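But something else happens that’s pretty interesting. Let’s look at some of the sums $S_p(n) = \sum_{k=1}^{n} k^p$ written out as polynomials.

$$S_1(n) = \tfrac{1}{2}n^2 + \tfrac{1}{2}n$$

$$S_2(n) = \tfrac{1}{3}n^3 + \tfrac{1}{2}n^2 + \tfrac{1}{6}n$$

$$S_3(n) = \tfrac{1}{4}n^4 + \tfrac{1}{2}n^3 + \tfrac{1}{4}n^2$$

$$S_4(n) = \tfrac{1}{5}n^5 + \tfrac{1}{2}n^4 + \tfrac{1}{3}n^3 - \tfrac{1}{30}n$$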

Look at the coefficients in each of these polynomials. Anything strange about them? Consider them for a bit.

**Problem. **Look at the coefficients. What do you find interesting about them? Note that, in particular, for a fixed $p$, the coefficients of the associated polynomial sum to 1. Convince yourself that this is probably true (do some examples!) and then prove that it is true. Do this before reading the statements below.

**Anecdote. **I spent quite a while trying to write down the "general form" of a polynomial with elementary symmetric polynomials and roots to try to see if I could prove this fact using some complex analysis and a lot of terms canceling out. This morning, I went into the office of the professor to ask him about what it *means* that these coefficients sum up to 1. He then gave me a one-line (maybe a half-line) proof of why this is the case.

*Hint. What value would we plug in to a polynomial to find the sum of the coefficients? What does plugging in this value mean in terms of the sum?*

**Question: **Suppose that $\{f_n\}$ is a sequence of polynomials with each $f_n$ of degree at most $N$ and suppose $f_n \to f$ pointwise. Show that $f$ is also a polynomial of degree no more than $N$.

Some interesting points come up here. First is that we only have pointwise convergence — it wasn’t even immediately obvious to me how to prove the resulting limit was *continuous*, let alone a polynomial of some degree. Second, we know very little about the polynomials except for what degree they are. This should be an indication that we need to characterize them with respect to something degree-related.

Indeed, polynomials can be represented in a few nice ways. Among these are:

- In the form $p(x) = a_N x^N + a_{N-1} x^{N-1} + \dots + a_1 x + a_0$ where it is usually stated that $a_N \neq 0$.
- In terms of their coefficients. That is, if we have a list of polynomials of degree 3 to store on a computer, we could create an array where the first column is the constant, the second is the linear term, and so forth. This is sort of like decimal expansion.
- Where they send each point. That is, if we know what $p(x)$ is equal to for each $x$, we could recover it.
- If a polynomial $p$ is of degree $N$ then, somewhat surprisingly, we can improve upon the previous statement: if we know the value of $p$ for $N + 1$ distinct points, then we can find $p$, and $p$ is the unique polynomial of degree at most $N$ which has those values. (Note that if we were to have only $N$ points and a polynomial of degree $N$, then *many* polynomials of this degree could fit the points. Consider, for example, $N = 1$ and a single point $(x_0, y_0)$. Then we have one point and we want to fit a line through it. Clearly this can be done in infinitely many ways.)

This last one is going to be useful for us. So much so that it might be a good idea to prove it.

**Lemma. **Let $x_0, x_1, \dots, x_N$ be distinct points in $\mathbb{R}$, and let $y_0, y_1, \dots, y_N$ be (not necessarily distinct) points in $\mathbb{R}$. Then there is a *unique* polynomial $p$ of degree at most $N$ such that $p(x_i) = y_i$ for each $i$ considered here.

*Proof. *This is an exercise in linear algebra. We need to solve the system of linear equations

$$a_0 + a_1 x_i + a_2 x_i^2 + \dots + a_N x_i^N = y_i$$

where $i$ spans $0, 1, \dots, N$, for the constants $a_0, a_1, \dots, a_N$. Notice that this is simply plugging $x_i$ into a general polynomial of degree $N$. Notice that the matrix that this forms will be a Vandermonde matrix. Since each $x_i$ is distinct, the determinant of this matrix is nonzero, which implies that there is a unique solution. This gives us our coefficients, and note that this is a polynomial not necessarily of degree exactly $N$, since some coefficients may be 0, but it is at most $N$.

[Note: For those of you who forgot your linear algebra, the end of this goes like this: if we let our coefficients be denoted by the column matrix $a$ and our Vandermonde matrix is denoted by $V$, then we want to solve $Va = y$ where $y$ is the column vector with entries $y_i$. If $V$ has non-zero determinant, then it is invertible, and so we have that $a = V^{-1}y$ gives us our coefficients.]
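The linear algebra in this proof can be sketched in a few lines of Python. This is only an illustration under my own assumptions: it builds the Vandermonde system and solves it by Gauss-Jordan elimination over exact fractions rather than by computing an inverse, and the function name `fit_polynomial` is made up for this example. If two of the input points coincide, the matrix is singular and the pivot search fails.

```python
from fractions import Fraction


def fit_polynomial(xs, ys):
    """Return [a_0, ..., a_N] with a_0 + a_1*x_i + ... + a_N*x_i^N = y_i for every i.

    Assumes the entries of xs are distinct, so the Vandermonde matrix is invertible.
    """
    n = len(xs)
    # Augmented matrix [V | y], where V[i][j] = xs[i] ** j.
    M = [[Fraction(x) ** j for j in range(n)] + [Fraction(y)] for x, y in zip(xs, ys)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot entry is 1.
        M[col] = [v / M[col][col] for v in M[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


# Three points taken from 1 + x + x^2.
coefficients = fit_polynomial([0, 1, 2], [1, 3, 7])
assert coefficients == [1, 1, 1]
```

Note that when the values happen to come from a lower-degree polynomial, the leading coefficients simply come out as 0, which is the "degree at most" part of the lemma.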

Neato. But now we need to specialize this somewhat for our proof.

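**Corollary. **Let the notation and assumptions be as in the last lemma. For $i = 0, 1, \dots, N$, let $q_i$ be the unique polynomial of degree at most $N$ with $q_i(x_j) = \delta_{i,j}$ (where $\delta_{i,j} = 0$ if $i \neq j$ and $\delta_{i,i} = 1$). Then every polynomial $p$ of degree at most $N$ is of the form $p(x) = \sum_{i=0}^{N} p(x_i)\, q_i(x)$ for each $x \in \mathbb{R}$.

This might be a bit more cryptic, so let’s do an example. Let’s let $N = 1$ so that we have two points. Let’s call them $x_0$ and $x_1$. Then we have $q_0$ is the unique polynomial of degree at most 1 such that $q_0(x_0) = 1$ and $q_0(x_1) = 0$. Of course, this function will be $q_0(x) = \frac{x - x_1}{x_0 - x_1}$. Now $q_1(x_0) = 0$ and $q_1(x_1) = 1$; this gives us that $q_1(x) = \frac{x - x_0}{x_1 - x_0}$. The corollary now states that any polynomial $p$ of degree at most 1 can be written in the form

$$p(x) = p(x_0)\,\frac{x - x_1}{x_0 - x_1} + p(x_1)\,\frac{x - x_0}{x_1 - x_0}.$$

For example, let $p(x) = ax + b$. Then the corollary says $p(x) = (a x_0 + b)\frac{x - x_1}{x_0 - x_1} + (a x_1 + b)\frac{x - x_0}{x_1 - x_0} = ax + b$, as we’d expect. The power of this corollary will become clear when we use this in the solution. The proof of this corollary is just a specialization of the previous lemma, so we exclude it.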

**Solution.** Recall, just for notation, that our sequence $f_n \to f$ pointwise. Let’s let $x_0, x_1, \dots, x_N$ be our distinct points, as usual. In addition, let’s let $q_i$ be defined as in the corollary above. Represent each $f_n$ as follows:

$$f_n(x) = \sum_{i=0}^{N} f_n(x_i)\, q_i(x)$$

for each $x \in \mathbb{R}$. Here comes the magic: let $n \to \infty$ and note that $f_n \to f$ at every point, so, in particular, on each $x_i$ and on $x$. We obtain

$$f(x) = \sum_{i=0}^{N} f(x_i)\, q_i(x)$$

for each $x \in \mathbb{R}$. But this is the sum of polynomials of degrees at most $N$, which gives us that $f$ is itself a polynomial of degree at most $N$.

I’ll admit, I did a bit of digging around after finding the lemma above; in particular, this corollary representation of polynomials seems to be a nice way to represent a polynomial if we do not know the coefficients but do know the values at a certain number of points and have that its degree is bounded below that number of points.

**Exercise: **Try to write out this representation for a few small values of $N$. If you’re a programmer, why not make a program that allows you to input some points and some values and spits out the polynomial from the corollary above?
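For the programmers, here is one possible starting point in Python; treat it as a sketch rather than the definitive answer to the exercise. It assumes the basis polynomials are written in the product form $q_i(x) = \prod_{j \neq i} \frac{x - x_j}{x_i - x_j}$ (which has degree at most $N$ and takes the right values at the $x_j$, so by uniqueness it is the polynomial from the corollary), and it evaluates the interpolating polynomial at a point instead of expanding its coefficients. The function names are my own.

```python
from fractions import Fraction


def basis_value(xs, i, x):
    """Value at x of the polynomial that is 1 at xs[i] and 0 at every other xs[j]."""
    value = Fraction(1)
    for j, xj in enumerate(xs):
        if j != i:
            value *= Fraction(x - xj) / (xs[i] - xj)
    return value


def interpolate(xs, values, x):
    """Evaluate sum_i values[i] * q_i(x), the representation from the corollary."""
    return sum(v * basis_value(xs, i, x) for i, v in enumerate(values))


def cubic(x):
    """A sample polynomial of degree 3 used to test the representation."""
    return x ** 3 - 2 * x + 5


# Four points determine a polynomial of degree at most 3.
points = [0, 1, 2, 3]
samples = [cubic(x) for x in points]

# The representation reproduces the cubic away from the sample points too.
for x in range(-5, 11):
    assert interpolate(points, samples, x) == cubic(x)
```

Because everything is done with `Fraction`, the comparison is exact; with floating point inputs you would want to compare up to a small tolerance instead.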

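**One Solution. **It’s easy to prove that $E$, the set of even functions, is a subspace. Then, there is a representation of any function in this space by adding odd and even functions together; more precisely, given $f$ we have that $f_e(x) = \frac{f(x) + f(-x)}{2}$ is even and $f_o(x) = \frac{f(x) - f(-x)}{2}$ is odd and $f = f_e + f_o$. For uniqueness, note that if $f$ is both even and odd, then $f(x) = f(-x) = -f(x)$ for each $x$, giving us that $f = 0$. Hence, the orthogonal complement of $E$ is the set of odd functions.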

Here’s another solution that "gets your hands dirty" by manipulating the integral.

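**Another Solution. **We want to find all $f$ such that $\langle f, g \rangle = 0$ for every even function $g$. This is equivalent to wanting to find all such $f$ with $\int_{-a}^{a} f(x) g(x)\,dx = 0$, where $[-a, a]$ is the symmetric interval our functions live on. Assume $f$ is in the orthogonal complement. That is,

$$0 = \int_{-a}^{a} f(x) g(x)\,dx = \int_{-a}^{0} f(x) g(x)\,dx + \int_{0}^{a} f(x) g(x)\,dx = \int_{0}^{a} f(-x) g(-x)\,dx + \int_{0}^{a} f(x) g(x)\,dx.$$

The last equality here re-parameterizes the first integral by letting $x \mapsto -x$, but note that our new $dx$ gives us the negative sign, which we use to flip the limits of integration. Since $g$ is even, $g(-x) = g(x)$, and so

$$0 = \int_{0}^{a} \big(f(-x) + f(x)\big)\, g(x)\,dx.$$

We may choose $g(x) = f(x) + f(-x)$ since this is an even function, and we note that this gives us

$$\int_{0}^{a} \big(f(x) + f(-x)\big)^2\,dx = 0.$$

Since $\big(f(x) + f(-x)\big)^2 \geq 0$, it must be the case that $\big(f(x) + f(-x)\big)^2 = 0$. [Note: The fact that this is only true "almost everywhere" is implicit in the definition of $L^2$.] Hence, $f(x) + f(-x) = 0$, giving us that $f(-x) = -f(x)$.

We now have one direction: that *if* $f$ is in the orthogonal complement, *then* it will be odd. Now we need to show that if $f$ is any odd function, it is in the orthogonal complement. To this end, suppose $f$ is an odd function and $g$ is any even function. Then by the above, we have

$$\langle f, g \rangle = \int_{0}^{a} \big(f(-x) + f(x)\big)\, g(x)\,dx = 0$$

where the last equality comes from the fact that $f$ is odd.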


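**Theorem (Gauss’ Mean Value Theorem). **Let $f$ be analytic on some closed disk $D$ which has center $a$ and radius $r$. Let $C$ denote the boundary of the disk (that is, $C$ is the circle bounding $D$). Then we have that $f(a) = \frac{1}{2\pi}\int_0^{2\pi} f(a + re^{i\theta})\,d\theta$.

The proof of this theorem is pretty straightforward and uses the Cauchy integral formula and some easy substitution.

*Proof. *Note that we have $f(a) = \frac{1}{2\pi i}\int_C \frac{f(z)}{z - a}\,dz$ by the Cauchy integral formula. The equation of a circle with radius $r$ and center $a$ is given by $z = a + re^{i\theta}$ where $\theta$ runs from 0 to $2\pi$ (if you don’t believe me, plot some points!). Substituting this value into the integral and noting that $dz = ire^{i\theta}\,d\theta$ we have that

$$f(a) = \frac{1}{2\pi i}\int_0^{2\pi} \frac{f(a + re^{i\theta})}{re^{i\theta}}\, ire^{i\theta}\,d\theta = \frac{1}{2\pi}\int_0^{2\pi} f(a + re^{i\theta})\,d\theta$$

as required.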
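If you would rather see the mean value property than take it on faith, here is a minimal numerical sketch in Python. It approximates the integral by averaging the function at equally spaced points on the circle; the sample function $e^z + z^3$, the center, the radius and the number of sample points are arbitrary choices of mine, and the comparison is only up to floating point error.

```python
import cmath
import math


def circle_average(f, center, radius, samples=2000):
    """Average of f over equally spaced points on the circle |z - center| = radius."""
    total = 0
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        total += f(center + radius * cmath.exp(1j * theta))
    return total / samples


def sample_function(z):
    """An entire function, so it is analytic on every closed disk."""
    return cmath.exp(z) + z ** 3


center = 0.5 + 0.25j
average = circle_average(sample_function, center, radius=2.0)

# The average over the circle should agree with the value at the center.
assert abs(average - sample_function(center)) < 1e-9
```

Trying a function that is not analytic, such as the complex conjugate composed with something nonlinear, is a nice way to watch the property fail.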

Why bring up this neat little theorem? Well, by itself it doesn’t seem to be all that useful — when would we be able to calculate and sum up a whole ton of values of an analytic function surrounding a point, but not be able to find the point itself? But this little theorem packs some punch as a way of bounding certain values. In particular, it gives a neat proof of the Maximum Modulus Theorem. You might have guessed this from the title of this post.

First, let’s note something quickly.

**Lemma. **Given the assumptions in Gauss’ MVT, we have $|f(a)| \leq \frac{1}{2\pi}\int_0^{2\pi} |f(a + re^{i\theta})|\,d\theta$.

Be careful here in thinking that this should be an *equality*; we are now looking at the *modulus* of our value, and the *modulus* of each point on the circle. But this lemma comes almost for free:

*Proof. *We have $|f(a)| = \left|\frac{1}{2\pi}\int_0^{2\pi} f(a + re^{i\theta})\,d\theta\right|$ by using Gauss’ MVT and simply taking the norm of both sides. Note that

$$\left|\int_0^{2\pi} f(a + re^{i\theta})\,d\theta\right| \leq \int_0^{2\pi} \left|f(a + re^{i\theta})\right|\,d\theta$$

whence the inequality above.

This lemma tells us that the modulus of the value at the center of any circle is bounded by the average of the moduli of the values at the points of that circle. We’ll see why this is the crucial bound we’ll need in the MMT’s proof below.

**Theorem (Maximum Modulus Theorem). **Given $f$ analytic on some domain $D$, if $f$ is non-constant on $D$ then the maximum value of $|f(z)|$ for $z \in \overline{D}$ will occur on the boundary of $D$. (Alternatively, if $|f|$ is maximized by some value not on the boundary of $D$, then $f$ is constant on $D$.)

*Proof. *We’ll split this into two steps. The first step is for the specific case that $D$ is a closed disk and our maximum modulus occurs at the center of this disk. The second step will be to get some arbitrary space $D$ and construct some closed disks in the interior of $D$ and "piece these together" to show that $f$ is constant on all of $D$.

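**Step 1:** Let’s suppose that our maximum modulus is at the center point of $D$, which we will call $a$; that is, we are supposing that $|f(z)| \leq |f(a)|$ for every $z \in D$. Since $a$ is an interior point, we have that there is some $\epsilon$-ball about $a$ (that is, a ball of radius $\epsilon$) which is completely contained in $D$. Let $C_r$ denote the circle of radius $r \leq \epsilon$ centered at the point $a$. By our lemma above we have that

$$|f(a)| \leq \frac{1}{2\pi}\int_0^{2\pi} |f(a + re^{i\theta})|\,d\theta.$$

BUT, using that $|f(z)| \leq |f(a)|$ for every $z \in D$ we have that

$$\frac{1}{2\pi}\int_0^{2\pi} |f(a + re^{i\theta})|\,d\theta \leq \frac{1}{2\pi}\int_0^{2\pi} |f(a)|\,d\theta = |f(a)|.$$

Stringing these inequalities together and suggestively re-writing $|f(a)| = \frac{1}{2\pi}\int_0^{2\pi} |f(a)|\,d\theta$, we have that

$$\frac{1}{2\pi}\int_0^{2\pi} |f(a)|\,d\theta = \frac{1}{2\pi}\int_0^{2\pi} |f(a + re^{i\theta})|\,d\theta$$

and by subtracting,

$$\int_0^{2\pi} \Big(|f(a)| - |f(a + re^{i\theta})|\Big)\,d\theta = 0$$

but since the integrand is continuous and always positive or zero (why?) it must be the case that

$$|f(a)| - |f(a + re^{i\theta})| = 0$$

or, in other words, $|f(a + re^{i\theta})| = |f(a)|$. Since $r$ was arbitrary, we conclude that $|f(z)| = |f(a)|$ for every $z$ in the $\epsilon$-ball. An analytic function with constant modulus on a disk is itself constant there (this follows from the Cauchy-Riemann equations), so in fact $f(z) = f(a)$ for every $z$ in the ball.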

**Step 2: **Now suppose we have some arbitrary domain $D$ and $f$ is analytic on all of $D$. I will hand-wave a bit here, but you can fill in the details. Note that a domain (in this context) necessarily means *open and* *path-connected* (and, in fact, it usually denotes a connected *open* subset of $\mathbb{C}$; for open sets these two notions agree). Suppose that our maximum modulus occurs at some point on the interior of $D$ which we will call $z_0$. Now, given *any other point* $w \in D$ we have some path from $z_0$ to $w$ which is completely contained in $D$. In fact, we can make this path a finite *polygonal* path; that is, a path made out of a finite number of straight lines piecewise-connected together; we will denote this $L_1 \cup L_2 \cup \dots \cup L_m$, where $L_j$ is the line with endpoints $z_{j-1}$ and $z_j$ (so that $z_m = w$). I will let you work the details out here, but it can be done.

Now, the polygonal line might be right next to a boundary, and we don’t want to accidentally hit it when we start making balls around points, so let $\delta$ denote whichever is smaller: the distance from the polygonal line to the boundary, or 1. So, if your polygonal line is right next to the boundary, we might need to make $\delta$ pretty small; but if not, we can just let it be whatever we want, so we might as well make it 1. Note that since $D$ is open, no point on the polygonal path should be on the boundary, so $\delta > 0$. Now, let’s break up our polygonal path into another polygonal path where each segment has length less than $\delta$. It is clear we can do this just by partitioning each straight line in our original path so that their lengths are appropriately small; note, we still only have a finite number of endpoints, which we relabel $z_0, z_1, \dots, z_n = w$. That’s important.

(In the picture above, I’ve made the original endpoints blue and then partitioned our polygonal path with the new red endpoints to make each line segment less than $\delta$.)

Now everything is going to fall pretty quickly, so keep on your toes. First, make a disk of radius $\delta$ (as defined above) around each $z_j$ and call it $D_j$. Now note that, by our previous step, since our maximum modulus occurs at $z_0$, we have $f(z) = f(z_0)$ for every point $z \in D_0$. But $z_1$ is in $D_0$!

(This picture is not drawn to scale because I am not a good artist; this is illustrating $z_1$ being inside the circle $D_0$.)

So now $z_1$ is also of maximum modulus (since $z_0$ was) and so $f(z) = f(z_1) = f(z_0)$ for every point $z$ in $D_1$. Continue this and we will obtain $f(w) = f(z_n) = f(z_0)$. Since $w$ was an arbitrary point, it follows that $f(z) = f(z_0)$ for *every* $z \in D$. Hence, if $|f|$ attains a maximum on the interior of some domain $D$, then $f$ is constant. This implies directly that any non-constant analytic function achieves its maximum modulus on the boundary.
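To watch the theorem in action, here is a small Python sketch that samples $|f|$ on circles of increasing radius inside the closed unit disk and compares the largest value found strictly inside with the largest value on the boundary. The function $f(z) = z^2 + 1$, the unit disk, and the grid sizes are my own choices for illustration; a finite grid can only suggest the result, not prove it.

```python
import cmath
import math


def max_modulus_by_radius(f, radial_steps=50, angular_steps=360):
    """Return a list of (r, largest sampled |f| on the circle |z| = r) for r in [0, 1]."""
    profile = []
    for k in range(radial_steps + 1):
        r = k / radial_steps
        best = max(abs(f(r * cmath.exp(2j * math.pi * m / angular_steps)))
                   for m in range(angular_steps))
        profile.append((r, best))
    return profile


profile = max_modulus_by_radius(lambda z: z * z + 1)

# Largest sampled modulus strictly inside the disk versus on the unit circle.
interior_max = max(value for r, value in profile if r < 1)
boundary_max = profile[-1][1]

assert boundary_max > interior_max
```

Swapping in a constant function makes the two maxima equal, which is exactly the exceptional case in the theorem.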