Theorem of the Day

 Home     Resources Centre     Gallery Shop      Visitors' Book

Notes
Supplementary notes for some of the listed theorems are provided below. Any suggestions for additions or corrections can be emailed to me and will be most welcome.

Theorem no. 1: The Four Colour Theorem
1. The original announcement (September 1976) by Appel and Haken of their proof is available on free access here courtesy of Project Euclid. The full publication followed a year later: Part I and Part II (there are also microfiche supplements).
2. A 1998 update on proving the 4CT (still computer-assisted) is given by Robin Thomas here.
3. The obvious progression from the sophisticated computer-assisted proofs of 4CT to formalised, computer-generated proofs, is discussed here.
4. Brendan McKay "A note on the history of the four-colour conjecture", Journal of Graph Theory, Vol. 72, No. 3, 2013, 361–363, has given the earliest publication date for the Four Colour Conjecture as 1854 (it was discussed in correspondence in 1852). A preprint is here.
5. There is a reference by Isabel Maddison to a "slightly different form" of the map-colouring question due to Möbius and his amateur mathematician friend Adoplh Weiske, publicised by Möbius in 1840 ("Note on the history of the map-coloring problem", Bull. Amer. Math. Soc., Volume 3, Number 7, 1897, page 257; online here). On p. 146 of Alexander Soifer, The Mathematical Coloring Book: Mathematics of Coloring and the Colorful Life of its Creators, Springer, 2009, you can find the details: a country is to be divided into 5 regions each bordering every other. Essentially, prove that $$K_5$$ is nonplanar (cannot be the graph of a map drawn in the plane), so perhaps more a precursor to the deeper Hadwiger's Conjecture than the four-colour conjecture.

Theorem no. 2: The Fundamental Theorem of the Calculus
1. There is a very nice overview of the history of calculus by Thony Christie here.
2. Making the accumulation function $$\int_0^x f(t)dt$$ of Part II of the theorem the starting point for explaining the whole Fundamental Theorem makes good pedagogical and aesthetic sense, as argued by McQuillan, D. and Olsen, D. M., "A Truly Beautiful Theorem: Demonstrating the Magnificence of the Fundamental Theorem of Calculus," Journal of Humanistic Mathematics, Volume 6 Issue 2, pages 148-160. Online here.
3. Part 1 and 2 of this theorem are not converse. Indeed a counterexample disproving the converse of Part 2 is provided by the always-insightful mathcounterexamples.net.
4. This theorem is the subject of Episode 1 of Kevin Knudson and Evelyn Lamb's My Favorite Theorem blog.

Theorem no. 3: The Bruck–Ryser–Chowla Theorem on Finite Projective Planes
1. How to draw finite projective planes in the Euclidean plane is somewhat a matter of taste or convenience. I have chosen in the theorem description to illustrate the order 2 plane with a 'broken' middle circle; it is often drawn with this circle completed. Neither is correct if you require that intersections of lines of the projective plane correspond to intersections of lines drawn in the Euclidean plane. The issue (thanks to Dr. Pravas K for drawing my attention to it) is discussed here.

Theorem no. 4: Euclid's Infinity of Primes
1. Our description of Euclid's theorem follows conventional practice in casting the proof as 'by contradiction'. One may take issue with this: see Michael Hardy and Catherine Woodgold, "Prime Simplicity", The Mathematical Intelligencer, December 2009, Volume 31, Issue 4, pp 44–52. A good overview of the issue is given here.
2. More poofs of this theorem can be found, encapsulated in an elegant contextual discussion, here.
3. Among the proofs to be found via (2), Fürstenberg’s 1955 topological proof has been given a more gentle exposition by Tai-Danae Bradley.
4. Among the proofs not found via (2) (it would have appeared too late, I think) is this lovely one-line proof by contradiction by Sam Northshield, with products taken over all primes $$p$$, supposedly finite in number and with $$P$$ denoting the product of all these primes: $$0<\prod_p \sin(\tau/2p)=\prod_p \sin(\tau/2p+\tau P/p)=0.$$ (As usual, $$\tau$$ denotes circumference of unit circle. More details are given by John D Cook here and Cut-The-Knot here, and by Northshield himself here.
5. The arxiv preprint by Chris Caldwell and Yeng Xiong cited on this theorem page was published as "What is the Smallest Prime?" Journal of Integer Sequences, Vol. 15 (2012), Article 12.9.7; online. (I've kept to the arxiv citation on the theorem page because it's a much shorter URL. It also has facsimiles of historical documents which are typeset in the published article: although the resulting pdf is much smaller this seems a pity). Another discussion of the role of 1 as a non-prime is here by Evelyn Lamb.
6. This theorem is the subject of Episode 22 of Kevin Knudson and Evelyn Lamb's My Favorite Theorem blog.

Theorem no. 5: The Chinese Remainder Theorem
1. John D. Cook gives a neat description of how our congruence system $$x\equiv y_i\pmod{n_i}$$ may be solved as $$y_1(N/n_1)^{\varphi(n_1)}+y_2(N/n_2)^{\varphi(n_2)}+\ldots + y_r(N/n_r)^{\varphi(n_r)},$$ where $$N=n_1n_2\cdots n_r$$ and $$\varphi$$ is Euler's totiant function. Thus our system with $$y_1=3, y_2=4,n_1=4, n_2=5$$ is solved by $$3\times 5^2+4\times 4^4=1099\pmod{20}=19.$$
2. A good online CRT solver is this by MathCelebrity.com which gives all the working and accepts negative number inputs.

Theorem no. 6: The Fundamental Theorem of Algebra
1. The current page describing this theorem replaces an older version using a quadratic polynomial, easier to assimilate perhaps but it seemed more revealing to have a cubic with both real and imaginary roots. The old version is archived here.
2. Daniel Litt gives a nice 'minimal' proof of the theorem.
3. Daniel J. Velleman offers "The Fundamental Theorem of Algebra: A Visual Approach", The Mathematical Intelligencer, December 2015, Volume 37, Issue 4, pp 12–21, preprint online here.
4. Paul Taylor has posted this English translation of "Gauss's second proof of the fundamental theorem of algebra" (thanks to John D. Cook for drawing my attention to this).
5. Featured in Math Scholar's thread Simple proofs of great theorems.

Theorem no. 7: The Fundamental Theorem of Arithmetic
1. This theorem description replaces an older version which had a more explicit illustration of walks defined by prime factorisations. The new version has a more sophisticated plot and the accompanying text sketches the proof of the theorem and has more on Goldbach. For those who prefer something more simple-minded (less crowded!) I have left the old version here.
2. Symbolically, this theorem asserts a unique (up to order) representation of a positive integer $$n$$ as a product of powers of primes: $$n=p_1^{a_1}p_2^{a_2}\cdots p_r^{a_r},$$ (with 1 being by convention the value of the empty product). This representation is implicit or explicit in many proofs in elementary number theory. It is also explicit in various calculations, e.g. in counting certain magic squares (Theorem 129) or in calculating periods of modular Fibonacci sequences (Theorem 235, notes(4)).
3. Although Goldbach does not imply that every point $$(2k,2)$$ will eventually appear on the walks plot illustrating this theorem, Andrzej Schinzel has shown ("Sur une conséquence de l'hypothèse de Goldbach", Bulgar. Akad. Nauk. Izv. Mat. lnst., 4 (1959) 35–38) that Goldbach does imply that every odd integer greater than 17 is a sum of three different primes, which would mean every point $$(2k+1,3),\,k>8$$, is plotted. Sierpinski on p. 124 of Elementary Theory of Numbers, Elsevier, 1988 adds "It follows from the results of Vinogradov that each sufficiently large odd number is such a sum". Of course H.A. Helfgott's proof of the Ternary Goldbach Conjecture confirms that every odd integer greater than 5 is a sum of three primes. It is not mentioned explicitly in Helfgott's preprint but I asked him and a result for three distinct primes is indeed implied.
4. Sierpinski's book, which is the recommended further reading for this page, is online at this blog (I'm not sure how legitimately but the book appears to be out of print).
5. A lovely implementation of FTA for positive integers up to 99 has been knitted by Sondra Eklund!

Theorem no. 8: The Central Limit Theorem
1. The recommended weblink from this theorem was previously this by Thayer Watkins, which is still good, but the applets are a bit problematical now.
2. Good modern animations of CLT can be found however: this by Michael Freeman, and this by Victor Powell.
3. Another version of CLT is explained very clearly at the Math Citadel and its hypotheses justified with a convincing counterexample.
4. John D. Cook gives an elegant summary of the historical origins of the normal distribution here.
5. There is a formal proof of the theorem, announced here by Jeremy Avigad, Johannes Hölzl, Luke Serafin.

Theorem no. 9: Fermat's Last Theorem
1. Wiles' proof of FLT occupied a complete issue of Annals of Mathematics: Wiles' "Modular elliptic curves and Fermat's Last Theorem", vol. 141, no. 3 (1995), pp. 443–551, accompanied by Richard Taylor and Andrew Wiles, "Ring-theoretic properties of certain Hecke algebras", pp. 553– 572. The papers don't seem to be readily available online, unfortunately.
2. The current page on FLT replaces an earlier one which had a more primitive graphic (without the benefit of Benoit Leturcq's fun Fermat sketch) and omitted the Fermat near-miss example). I have kept the old version here in case anyone prefers a less 'busy' page.
3. Fermat near-misses are based on lots of clever algebraic number theory: see this by Noam Elkies.
4. The reference in the illustration to probability theory is a nod to the fact that Fermat co-invented it. See this, for example, by Peter Lee.
5. A reference to "Molina's Urns" can be found, for example, in Frederick Mosteller, Fifty Challenging Problems in Probability with Solutions, Dover reprint, 2000 (problem 56 on p. 88).
6. Another elegant account (7.1MB pdf) of the resolution of FLT, which gives a little more on subsequent developments, can be found in this list of lectures by Karl Rubin.
7. The role of modular forms in the proof of FLT is made explicit in this presentation (7MB pdf) by Ken Ribet.

Theorem no. 10: Bayes' Theorem
1. You can view a Bayes' theorem prior as allowing the inclusion of numerical odds for subjective assumptions. I think a Bayesian would argue that not including these odds is to make an equally subjective assumption that prior knowledge is irrelevant. This is very well argued by Mike Lee and Benedict King in this Conversation article.
2. A nice retrospective on the history of use and abuse of Bayes' Theorem is provided by Bradley Efron in "Bayes’ Theorem in the Twenty-First Century", Science 340, June 7, 2013. A preprint is available here. I also like very much this Scientific American blog entry.
3. An investigation by Stephen M. Stigler, "Who Discovered Bayes's Theorem?", The American Statistician, 37 (4), 1983, 290–296 finds credible evidence that Bayes' Theorem was first discovered by Nicholas Saunderson. An online version is here and the article is reproduced in Stigler's book Statistics on the Table: The History of Statistical Concepts and Methods, Harvard University Press, paperback edition 2002.
4. A beautiful animated illustration of conditional probability by Victor Powell is here, while Will Kurt's Bayesian blog Count Bayesie has a nice alternative to our cow counting illustration here.

Theorem no. 12: The Matrix Tree Theorem
1. The reliability calculation here, in general, is asking what is the probability that deleting $$e-n+1$$ edges uniformly at random will result in a spanning tree. For a plane graph with $$e$$ edges, $$n$$ vertices and $$f$$ faces, and having $$t$$ spanning trees, the calculation becomes $$t(f-1)!(n-1)!/e!$$, which neatly shows that the probability is identical for the dual graph.
2. The weblink for this page proves MTT using the Binet–Cauchy theorem from matrix theory. A standard combinatorial approach uses induction based on deletion-contraction as in these notes by David P. Williamson. A direct combinatorial proof by Doron Zeilberger is given in section 4 of "A combinatorial approach to matrix algebra", Discrete Mathematics, Vol. 56, Issue 1, 1985, pages 61–72; online. The proof of Seth Chaiken and Daniel J. Kleitman given in "Matrix Tree Theorems", Journal of Combinatorial Theory, Series A, Vol. 24, Issue 3, May 1978, pp 377–381 is also of interest; online. A nice random walk proof is given by Michael J. Kozdron here, invoking the algorithm of David Wilson for selecting a spanning tree uniformly at random. Gil Kalai has a nice overview here.
3. See Garrys Tee's article in vol. 30 (3.9MB pdf) of Image for more on the history and applications of determinants.

Theorem no. 13: Fermat's Little Theorem
1. A good source on Guiga's conjecture is D. Borwein, J.M. Borwein, P.B. Borwein, R. Girgensohn, "Guiga's conjecture on primality", American Mathematical Monthly, vol. 103, 1996, pp 40–50; online. The lower bounds for counterexamples (>4771 prime factors, > 19908 digits) are from this presentation (4.4MB pdf file).
2. The proof of Fermat's Little Theorem given in the description here is due to James Ivory, "Demonstration of a theorem respecting prime numbers", New series of The Mathematical Depository, 1 (II),1806, pp 6–8. You appear to be able to access all of this volume free online from google books. The proof is given in slightly more detail by cut-the-knot, which is where I took my version from.
3. Euler's important generalisation of Fermat's theorem should be recorded here. Euler's totient function $$\phi(n)$$, for $$n$$ a positive integer, is the number of positive integers less than $$n$$ and coprime to $$n$$. Now for $$m$$ a positive integer and $$a$$ any integer coprime to $$m$$ we have $$a^{\phi(m)}=1 \mbox{ mod } m$$. Art of Problem Solving give a proof. For example, $$10^{\phi(9)}=10^6$$ which of course has remainder $$1$$ on division by $$9$$. For prime $$p$$ we have $$\phi(p)=p-1$$ so that Fermat's theorem is an immediate corollary. Euler's first proof of Fermat's theorem was published in 1736. He published several other proofs, culminating in 1763 with this generalisation.
4. This theorem is the subject of Episode 4 of Kevin Knudson and Evelyn Lamb's My Favorite Theorem blog.

Theorem no. 14: Cook's Theorem
1. An English translation of Leonid Levin's paper, together with a thorough analysis, may be found here. Cook's paper has been TEXed into pdf by Tim Rohlfs, available here.
2. I provide a little more (amateur) analysis of this theorem as an example of a 'simultaneity' in mathematics here.
3. Regarding P=NP? Gerhard Woeginger provides a valuable 'clearing house' for proof/disproof attempts.
4. Although it is widely believed that P≠NP there are a few prominent sceptics, for example Don Knuth (see Q. 17 here) and R.J. Lipton (see this). Bill Gasarch has conducted three surveys over nearly 20 years regarding peoples beliefs about P=NP. The latest (and links to previous) is described here.
5. The name 'NP-complete' was invented in response to a poll run by Knuth in 1973, as described in this entry in Lance Fortnow's blog

Theorem no. 15: The Orbit Counting Lemma
1. The current version of this page replaces an earlier version which concentrated on depicting permutations and the idea of fixing a point. This new version offers a basic illustration of how the lemma applies in counting, an illustration which is continued in the page for the Pólya-Redfield Enumeration Theorem. The old version of this page is retained here.
2. A probably definitive account of the history of the Orbit Counting Lemma is given by Peter M. Neumann in "A lemma that is not Burnside's", The Mathematical Scientist, Vol. 4, Issue 2, July 1979, pp. 133–141; online.
3. This theorem is the subject of Episode 10 of Kevin Knudson and Evelyn Lamb's My Favorite Theorem blog.

Theorem no. 16: Sperner's Lemma
1. My original weblink from this theorem page was to the Sperner entry at cut-the-knot. I prefer, because of java applet browser issues, to relocate this link to my notes page. I hope that at some future date I can re-instate it because some clever and altruistic people have honoured Alexander Bogomolny by giving cut-the-knot a new lease of life!
2. Sperner's Lemma provides an elementary addition to Maryam Mirzakhani's legacy.
3. This theorem is the subject of Episode 27 of Kevin Knudson and Evelyn Lamb's My Favorite Theorem blog.

Theorem no. 17: The Well-Ordering Theorem
1. As the remark on the Banach–Tarski Paradox suggests, it is infinity, rather than well-ordering or the Axiom of Choice, that can defy intuition. This 'anti-anti-Banarch–Tarski' argument is very well made by Asaf Karagila (follow the link back from the wonderful cartoon; I found this originally at Boole's Rings).

Theorem no. 18: Brouwer's Fixed Point Theorem
1. A very nice discussion by Phil Wilson of Brouwer and constructivist mathematics is given here by plus magazine.
2. This theorem is the subject of Episode 20 and Episode 25 of Kevin Knudson and Evelyn Lamb's My Favorite Theorem blog.
3. The contention that there is no constructive proof of 'BFPT' goes very deep. See this at mathoverflow, for example.

Theorem no. 19: Dilworth's Theorem
1. The current illustraion of this theorem replaces one based on snooker balls which was less informative but to which I may as well retain a link.

Theorem no. 21: Brun's Theorem
1. Brun's $$cx/(\ln x)^2$$ bound on the twin prime count is very well motivated by James Maynard in his PROMYS Europe 2015 lecture "Patterns in the Primes" which can be found here (last time I checked, Firefox complained about the security of the pdf download, so proceed with caution).
2. Brun's sieve received less attention in the early 1900s than it perhaps deserved. In the introduction to George Greaves, Sieves in Number Theory, we read that "The mathematical community did not immediately give Brun's results the recognition they later received. E. Landau left Brun's 1920 paper unread for a decade, apparently because he was not predisposed to believe that elementary methods as used by Brun could penetrate problems such as Goldbach's to the asserted extent". And in the introduction to Heini Halberstam and Hans-Egon Richert's classic, Sieve Methods, we read "its complicated structure and Brun's own early accounts tended to discourage closer study", and that Landau, writing an account of the method, commented "Myself I have never bothered to thoroughly work through Mr. Brun's original work" (google's translation).
3. Euler proved the divergence of the prime reciprocals in 1737, the climax of the paper which introduced the product formula for the zeta function (see theorem 246).

Theorem no. 22: Cantor's Uncountability Theorem
1. There is an interesting discussion about the earliest apparence of the diagonal argument in Cantor's work here at mathoverflow.net.
2. This theorem is the subject of Episode 34 of Kevin Knudson and Evelyn Lamb's My Favorite Theorem blog.

Theorem no. 23: Cantor's Theorem
1. The popular story has it that thinking about infinity, and attacks by others (notably Kronecker) on his thinking about infinity, drove Cantor insane. A much less sensational view is given in this blog entry by Richard Zach and this by Thony Christie.

Theorem no. 24: Kuratowski's Theorem
1. The multiple discoveries and proofs of this theorem are wonderfully charted in John W Kennedy, Louis V Quintas, Maciej M Sysłois, "The theorem on planar graphs", Historia Mathematica, Vol. 12, No. 4, 1985, 356–368, available here via Elsevier's Open Access Archive. The abstract of Frink and Smith's unpublished proof appears here (Bull. AMS, 36, 3) under 'Abstracts of papers'.
2. Bill Tutte, under the Blache Descartes pseudonymn wrote a little poem about the non-planarity of $$K_{33}$$ which may be read here.

Theorem no. 27: The Pythagorean Theorem
1. It would seem that even 'Pythagorean' is a misnomer for this theorem. Piers Bursill-Hall gave me the following valuable pointers:
 "work by David Fowler (U.Warwick) and Wilbur Knorr (U.Stanford) more than a couple of decades ago demonstrated very convincingly what had been at least implied since German philologists and classicists in the late 19th century debunked most of the mythology around Pythagoras. A good place to start would be the references and footnotes in David Fowler's The Mathematics of Plato's Academy: A New Reconstruction or Knorr's The Evolution of the Euclidean Elements: A Study of the Theory of Incommensurable Magnitudes and Its Significance for Early Greek Geometry. It ought not to be a controversial, surprising, or new item to mathematicians ... although sadly that is not the case."
2. A nice summary of the theorem's history is provided by Manjul Bharagava here.
3. The theorem is equivalent to Euclid's notorious 5th axiom, the Parallel Postulate, in the sense that each may be derived directly from the other, a fact which seems to date back to Legendre. More details here. It is also equivalent to Heron's formula (see Theorem no. 76) as revealed in a beautiful article by Vaughan Pratt, "Factoring Heron," The College Mathematics Journal, Vol. 40, no. 1, January 2009, pp. 15–16; online.
4. There is a beautiful account by Steven Strogatz of a particularly elegant proof of the Pythagorean Theorem attributable to and, Strogatz argues, bearing all the hallmarks of, Albert Einstein.
5. Jan Stevens provides an interesting commentary on Dijkstra's account of Pythagoras, online here.
6. A wonderful java applet proof of the theorem has been created by Jim Morey. It is online here but nowadays the chances are you will struggle to make your browser run it.
7. This theorem is the subject of Episode 7 and Episode 39 of Kevin Knudson and Evelyn Lamb's My Favorite Theorem blog.

Theorem no. 28: Ramsey's Theorem
1. An account of Erdős and Ramsey theory is given by Ronald L. Graham and Joel Spencer in the centennial reflections here (from p. 132).
2. A very interesting account of early precursors to Ramsey's theorem is offered by the Computational Complexity blog. This is also the focus of Alexander Soifer, The Mathematical Coloring Book: Mathematics of Coloring and the Colorful Life of its Creators, Springer, 2008.
3. The latest bounds (as of March 2017) for $$R(5)$$ are due to Vigleik Angeltveit and  Brendan D. McKay.
4. More on the famous Erdős quote by Evelyn Lamb.
5. This theorem is the subject of Episode 31 of Kevin Knudson and Evelyn Lamb's My Favorite Theorem blog.

Theorem no. 29: Gauss's Law of Quadratic Reciprocity
1. Özlem Imamoğlu's review, Bull. Amer. Math. Soc., Vol. 44, No. 4, 2007, 647–652, online, of Franz Lemmermayer's Reciprocity Laws: From Euler to Eisenstein, Springer, 2000, is a superb mini-essay on reciprocity beyond quadratic.

Theorem no. 30: The Law of Large Numbers
1. The theorem is described here in elementary terms, as would have been understood by Laplace himself. An excellent modern account in terms of measure theory is given by Terence Tao here.
2. I cannot resist linking to "The Law of Small Numbers", Jonathan Kujawa's elegant centenary homage to Richard Guy in 3quarksdaily.

Theorem no. 31: Benford's Law
1. More detail on use of Benford by the IRS in the United States is given here. The Wikipedia entry has a good entry on forensic aspects of Benford. Another compelling forensic application is Daniel Gamermann and Felipe Leite Antunes, "Evidence of Fraud in Brazil's Electoral Campaigns Via the Benford's Law", online.
2. An interesting occurrence of Benford is in the frequencies of leading digits in base 10 representations of powers, e.g. $$2^k,k=0,1,\ldots$$. See Theorem 299 (notes(5)) and also "A simple answer to Gelfand’s Question" by Jaap Eising, David Radcliffe and Jaap Top, The American Mathematical Monthly, March 2015, 234–245, online here. Tangentially, we can ask whether prime numbers obey Benford.
3. Ted Hill has written a very nice blog entry on Benford for Princeton University Press who publish his book with Arno Berger.

Theorem no. 32: The Green–Tao Theorem on Primes in Arithmetic Progression
1. Green and Tao's achievement is described by Bryna Kra as "an amazing fusion of methods from analytic number theory and ergodic theory" in his technical overview of their proof "The Green-Tao Theorem on arithmetic progressions in the primes: an ergodic point of view", Bull. Amer. Math. Soc. 43 (2006), 3–23; online. There is a nice overview by Ben Green here (p. 10, 11MB pdf file). Tao has collected some survey-type presentations at various levels here.

Theorem no. 33: The Prime Number Theorem
1. See also Benjamin Fine and Gerhard Rosenberger, "An Epic Drama: The Development of the Prime Number Theorem", Scientia Series A: Mathematical Sciences, Vol. 20 (2010), 1–26. This is supposed to be online here but the link didn't work when I last tried. There is a pdf download here.
2. A fine general historical account of the prime number theorem by Tom M. Apostol, "A centennial history of the prime number theorem", Engineering and Science, No. 4, 1996, pp. 19–28, is online here (3.4MB pdf). The classic account by Don Zagier of "Newman's short proof of the prime number theorem", American Mathematical Monthly, vol. 104 (1997), pp. 705–708, is online here.
3. There is a brief discussion of heuristic explanations for the Prime Number Theorem (notably the one by Greg Martin) here.
4. If $$(\bar{x},\bar{y})$$ is the centre of mass of the arc of $$y=\log(x)$$ in the interval $$[1,x]$$ then $$\frac12\pi(x)$$ is asymptotic to $$\bar{x}/\bar{y}$$ (see M500 magazine, issue 260, pp. 10–12).
5. Regarding the famous 'elementary proof' of PNT see Norman Levinson's "A motivated account of an elementary proof of the prime number theorem", American Mathematical Monthly, vol. 76, 1969, pp. 225–245; online. The unfortunate associated priority dispute is meticulously documented by Dorien Goldfeld in "The elementary proof of the prime number theorem: an historical perspective", in David Chudnovsky, Gregory Chudnovsky and Melvyn Nathanson (eds.), Number Theory New York Seminar 2003, Springer, 2004; online via Goldfeld's webpage (under preprints). It includes the observation that Tchebychef had given an elementary proof in 1852 that $$x/\log x$$ is the correct order of magnitude for $$\pi(x)$$. Both the proof and dispute are given a non-technical overview by Joel Spencer and Ronald Graham, "The Elementary proof of the prime number theorem", Mathematical Intelligencer, vol. 31 (3), June 2009, 18–23.
6. There are formal proofs of PNT:
1. the Erdős–Selberg elementary proof: Jeremy Avigad, Kevin Donnelly, David Gray, and Paul Raffand, "A formally verified proof of the prime number theorem", ACM Transactions on Computational Logic, Vol. 9 Issue 1, December 2007; online preprint (and see here for a nice overview presentation by Avigad) and
2. the classical complex analysis proof: John Harrison, "Formalizing an analytic proof of the prime number theorem", Journal of Automated Reasoning, vol. 43, pp. 243–261, 2009; online.
7. Chance News have a series of lectures on probabilistic number theory, with part 1 offering an excellent exploration of the Prime Number Theorem.

Theorem no. 35: The Second Isomorphism Theorem
1. This page replaces an earlier version combining the 2nd and 3rd isomorphism theorems with an illustration based on their superficial similarity to rules for arithmetic with fractions. I've left a copy here (opens in new tab). The 3rd isomorphism theorem is now presented seperately as Theorem no. 253.
2. The example given is a special case of the application given by P.M. Cohn in Algebra, Volume 1 (which I assume is carried over to Classic Algebra although I haven't checked): if a subgroup $$H$$ of $$\mbox{Sym}_n$$ has any odd permutations then its even permutations form a normal subgroup of index 2 in $$H$$.
3. Another depiction of the Cayley table of Frobenius 20, together with much other valuable information is given here by the beguiling website escarbille.free.fr.

Theorem no. 36: Euler's Identity
1. The symbol used in the illustration of this theorem is on loan from Michael Hartl, with thanks.

Theorem no. 37: Girard's Theorem
1. Fix the area $$T$$ of a spherical triangle and invert Girard's formula to give $$A+B+C=T/r^2+\tau/2$$. Now let radius $$r$$ tend to infinity: we recover a triangle in the Euclidean plane whose angles sum to $$\tau/2$$, as expected.
2. Attribution of this theorem to Harriot can be found in chapter 2 of Roger Penrose, The Road to Reality: A Complete Guide to the Laws of the Universe, Vintage, 2005; and in chapter 10 of David S. Richeson, Euler's Gem: The Polyhedron Formula and the Birth of Topology, Princeton University Press, 2008. I have seen it given to Legendre (e.g.) but Legendre's result, published in 1798, much later than Girard, approximates the difference between angles in a spherical triangle and angles in a plane triangle having the same side lengths. A good account is here. (Legendre did not, in any case, claim the result as his.)
3. My original web link from this theorem page was to these fine notes of John C. Polking. They are still fine, of course, but in many browsers the animations no longer work, or not without some trouble. So I have moved the link here and from the theorem page link to a Polking-based page with a nice animation which appears to be browser-proof.

Theorem no. 38: Lucas' Theorem
1. Romeo Meštrović has compiled a fine survey of applications and extensions of Lucas's theorem.
2. A generalisation of Lucas's theorem is given by Andrew Granville in his dynamic e-survey Arithmetic properties of Binomial Coefficients.

Theorem no. 39: Pascal's Rule
1. There is one version of Pascal's triangle which is indeed triangular: where the entries are reduced modulo 2. In this case the pattern which emerges is a version of Sierpinski's gasket!
2. Pascal's triangle as it is usually displayed has sides which are parabolic, that is, quadratic in $$n$$. The easiest way to confirm this is perhaps to estimate the sum of the digits in the $$n$$-th row using the normal curve approximation. This gives $${n\choose k}\approx\dfrac{2^{n+1}}{\sqrt{\tau n}}e^{-(2k-n)^2/2n}$$. Taking logs and summing over $$k$$ gives highest terms of order $$n^2$$.
3. Amazingly a simple relationship between Pascal's triangle and $$e=2.71828\ldots$$ appears to have been noticed for the first time, by Harlan J. Brothers, only in the twenty-first century.

Theorem no. 40: Stirling's Approximation
1. Aa very nice introduction to Stirling's approximation is Finbarr Holland, "A leisurely elementary treatment of Stirling’s formula", Irish Mathematical Society Bulletin, 77, Summer 2016, pp. 35–44; online.
2. A good source of information on the central binomial coefficients is the corresponding entry at oeis.org, where the sequence is no. 984.

Theorem no. 42: Zeckendorf's Theorem
1. It would appear that Zeckendorf's theorem was first published in C. G. Lekkerkerker, "Voorstelling van natuurlijke getallen door een som van getallen van Fibonacci", Simon Stevin, 29 (1951-1952), 190–195. I have not seen this paper but Daykin in a 1960 paper (see note 3 below) says "Zeckendorf's proof is given by C. G. Lekkerkerka (sic) in [1]," ([1] being the Simon Stevin paper). Zeckendorf himself published his theorem in 1972 in "Representation des nombres naturels par une somme de nombres de Fibonacci ou de nombres de Lucas", Bull. Soc. Royale Sci. Liege 41 (1972) 179–182. I haven't seen this paper either (the Bulletin de la Société Royale des Sciences de Liège is online but not all issues appear to be digitised).
2. A good presentation of Zeckendorf's and Lekkerkerker's theorem is given here by Steven J. Miller. Another good source on Lekkerkerker's theorem is Jukka Pihko, "On Fibonacci and Lucas representations and a theorem of Lekkerkerker", Fibonacci Quarterly, vol. 23, no. 3 (1988), 256–261, online here.
3. The exact value of the average number of Zeckendorf summands, over the interval $$[F_{n+1},F_{n+2})$$, as approached asymptotically by Lekkerkerker's theorem, is $$L_n=1+\varepsilon(n)/F_n$$, where $$\varepsilon(n)=\sum_{k=0}^{\lfloor(n-1)/2\rfloor}k{n-1-k\choose k}$$ (see Steven J. Miller's presentation). This apparently allows us to write, via the identity $$\varphi^2=1+\varphi$$, the error in Lekkerkerker's ratio as the constant $$3/5$$, thus: $$\lim_{n\rightarrow\infty}\left(L_n-2n/(5+\sqrt{5}\,)\right)=3/5$$. The function $$\varepsilon(n)$$ can itself be written entiredly in terms of Fibonacci numbers: $$\varepsilon(n)=\left\{\begin{array}{ccl} \left(\frac{n}{2}-\frac25\right)F_n-\frac{n}{10}\left(F_{n-1}+F_{n+1}\right) & &n \mbox{ even}\\ -\frac15F_{n-1}+\frac{n-1}{5}(F_{n-2}+F_n)&&n \mbox{ odd}, \end{array}\right.$$ alternatively, $$\varepsilon(n)=\left\{\begin{array}{ccl} \left(\frac{n}{2}-\frac25\right)F_n-\frac{n}{10}F_{2n}/F_{n} & &n \mbox{ even}\\ -\frac15F_{n-1}+\frac{n-1}{5}F_{2n-2}/F_{n-1}&&n \mbox{ odd},\end{array}\right.$$ the ratio $$F_{2T}/F_T$$ also being the $$T$$-th Lucas number (A000032).
4. David Daykin's paper is "Representation of Natural Numbers as Sums of Generalised Fibonacci Numbers", J. London Math. Soc., (1960) s1-35 (2): 143-160. The first page is free-access here. A follow-up paper by Daykin appeared in Fibonacci Quarterly in 1969 and can be viewed here. A brief account of Daykin's uniqueness result is given in J. L. Brown, Jr., "Zeckendorf's theorem and some applications", Fibonacci Quarterly, vol. 2, no. 3, 1964, 163–168; online here.
5. Garry J. Tee has contributed some interesting remarks on Zeckendorf-based arithmetic, which he investigated in "Russian Peasant Multiplication and Egyptian Division in Zeckendorf Arithmetic", Australian Mathematical Society Gazette, vol. 30, no. 5, 2003, 267–276:
 "My algorithms for arithmetic in Zeckendorf arithmetic are much more efficient than any published previously, but they still cost very  much more than binary arithmetic. I commented that, if more efficient algorithms for Zeckendorf addition and subtraction could be devised, then they could be used to give much more efficient algorithms for Zeckendorf multiplication and division. The 2013 paper  by Conner Ahlbach, Jeremy Usatine, Christiane Frougny and Nicholas Pippenger on Efficient Algorithms for Zeckendorf Arithmetic, Fibonacci Quarterly, 51(3):249–255  does give much more efficient algorithms for Zeckendorf addition and subtraction - but their main emphasis is on the depth of circuitry required."
The paper can be viewed in pdf form (≈1.6MB) here (with kind permission of the Australian Mathematical Society).
6. The universality of the Fibonacci sequence, restricted to just 1,1,2,3,5, is exploited in a clever clock by Philippe Chrétien, described here by Alex Bellos.
7. There is a less sophisticated version of Colm Mulcahy's card trick here by Kiran Ananthpur Bacche.

Theorem no. 43: The DPRM Theorem
1. Martin D. Davis, "Hilbert's tenth problem is unsolvable", The American Mathematical Monthly, vol. 80, (1973), pp. 233–269; online; is an excellent account of the proof of DPRM.
2. A particularly charming and accessible example of a Diophantine set is the set of Fibonacci numbers: James P. Jones, "Diophantine Representation of the Fibonacci Numbers", Fibonacci Quarterly, Feb. 1975, 84–88; online here (3rd from bottom). Jones is also one of the team who produced a particularly compelling prime polynomial, via the fact that the set pf primes is Diophantine: Douglas Wiens, James P. Jones, Daihachiro Sato, Hideo Wada, "Diophantine representation of the set of prime numbers", The American Mathematical Monthly, vol. 83, 1976, pp. 449-464; online.

Theorem no. 44: Pappus's Theorem
1. Same comment as for Theorem 55 regarding Java; the app for this theorem, if you want to try, is here.
2. See note (5) for Theorem 215 regarding Pappus's theorem becoming involved in twentieth century mathematics

Theorem no. 45: Binet's Formula
1. The 'square spiral' illustrating this formula is the basis for a lovely curve construction discovered by Edmund Harriss and described here.

Theorem no. 46: Cameron's Theorem on Distance-Transitive Graphs
1. Cameron's proof of this theorem depends on his proof, with Preager, Saxl and Seitz, of the Sims Conjecture which in turn relies on the Classification of the Finite Simple Groups. A CFSG-free proof was suggested by Cameron and completed by Richard Weiss in 1985. Weiss's paper can be viewed online here. The paper proving the Sims Conjecture is cited in (Theorem 65, notes(1))
2. Questions regarding infinite distance-transitive graphs are generally open, except in the case of locally finite case which was solved by H. Dugald Mcpherson in 1982. This work and progress on the non-locally finite case, as of 1998, is described by Cameron in "A census of infinite distance-transitive graphs", Discrete Mathematics, Volume 192, Issues 1–3, 1998, pp 11–26; online here.
3. Our illustration of this theorem shows eight of the twelve 3-regular distance transitive graphs. The full list is given in the relevant wikipedia entry, with pictures of each graph.

Theorem no. 47: The Binomial Theorem
1. The definition of binomial coefficients to allow for arbitrary complex powers of the binomial can be generalised still further to allow both parameters to be complex, as explained here by John D. Cook. (This does not impact on the Binomial Theorem whose statement only features the 'top' parameter.)
2. The original weblink for this theorem was an absorbing paper by Lawrence Neff Stout (1948–2012): "Aesthetic Analysis of Proofs of the Binomial Theorem" which was slated for but does not appear in, The Humanistic Mathematics Network Journal. It is available at academia.edu and I have uploaded a temporary copy here for quick reference.

Theorem no. 48: Beineke's Theorem on Line Graphs
1. Beineke announced this result in 1968 (he referred to line graphs as 'derived graphs'). A proof appeared in 1970 in Journal of Combinatorial Theory; the article can be viewed online here via Elsevier's Open Access Archive.
2. The attribution to N. (presumably Neil) Robertson is found on p. 74 of Frank Harary, Graph Theory, Westview Press, new edition, 1994.
3. The number of forbidden subgraphs for characterising line graphs can be lowered under slightly stronger conditions. Thus Ľubomír Šoltés proved in 1994 that, for a graph on at least 9 vertices, forbidden subgraphs V and IX can be ignored, bringing the number down to 7. Then in 2001, with Hong-Jian Lai, he proved that just forbidden subgraphs I, VI and VIII are enough, provided that the graph being tested has minimum degree 7 and is not isomophic to two complete graphs sharing an edge (e.g. subgraph VII). See "Line Graphs and Forbidden Induced Subgraphs", Journal of Combinatorial Theory, Series B, 82, 38–55 (2001). An online copy is available on Lai's website.

Theorem no. 49: Netto's Conjecture (Dixon's Theorem)
1. An earlier version of this page which explained permutation multiplication, rather than depicting the distribution of pairs generating $$S_n$$ and $$A_n$$, has been preserved here.
2. A good overview of probabilistic group theory by John Dixon is given in this 2004 preprint. Another is this pdf presentation by Elisa Covato.
3. Sharper bounds on the probability of generating $$S_n$$ and $$A_n$$ are given in Attila Maróti and M. Chiara Tamburini, "Bounds for the probability of generating the symmetric and alternating groups", Archiv der Mathematik, Vol. 96, Issue 2, 2011, pp 115–121; online (paywalled); free-access preprint. These have been improved in Luke Morgan and Colva M. Roney-Dougal, "A note on the probability of generating alternating or symmetric groups", Archiv der Mathematik, Vol. 105, Issue 3, 2015, pp 201–204; online (paywalled); free-access preprint.
4. The plots on this page use the elements of $$S_n$$ ordered by parity (even permutations before odd) then by number of moved points, then by value of first moved point. For $$S_4$$, for example, this gives the listing $$(),(1 2 3),(1 2 4),(1 3 2),(1 3 4),(1 4 2),(1 4 3),(2 3 4),(2 4 3),(1 2)(3 4),(1 3)(2 4),(1 4)(2 3),(1 2),(1 3),(1 4),(2 3),(2 4),(3 4),(1 2 3 4),(1 2 4 3),(1 3 4 2), (1 3 2 4),(1 4 3 2),(1 4 2 3)$$.
5. The numbers of permutation pairs generating $$S_n$$ is sequence A071605 at oeis.org.

Theorem no. 50: The Euler–Hierholzer "Bridges of Königsberg" Theorem
1. An account of this theorem is to be found on page 33ff of the 2nd edition of Edouard Lucas's Récréations Mathématiques, vol. 1, published in 1891, the year of Lucas's tragically early death. The sufficiency of all degrees even is dealt with in a note on page 223 and is, according to this Wikipedia entry, essentially Hierholzer's 1873 argument. Lucas lists in his bibliography the article of Fleury giving an alternative method of construction. This article is a response to Lucas's Récréations Mathématiques entry and proposes a first solution to the construction problem in the version of drawing a figure in one continuous line. I have not seen Lucas's 1st edition to see if his entry differs in the light of Fleury's article. You can see Lucas's 2nd edition online here and Fleury's article is online here (p.257ff).
2. There is a nice real-life application to snow ploughing described here (which is actually the Chinese Postman problem but Euler tours are the background).
3. And the traditional puzzle is solved, with very nice graphics, for New York here.

Theorem no. 52: The Robertson--Seymour Graph Minors Theorem
1. There is a very nice semi-technical overview of 'Robertson-Seymour' theory by Lovász here. Graph minors theory is also very well explained in a series of blog posts by Jim Belk.

Theorem no. 53: Bailey's Theorem on Latin Squares
1. The original source for this theorem is R.A. Bailey, "Quasi-Complete Latin Squares: Construction and Randomization", Journal of the Royal Statistical Society. Series B (Methodological) Vol. 46, No. 2 (1984), pp. 323–334; online (paywalled).
2. Bailey's conjecture on terraces has been shown by Matt Ollis to hold for groups of all orders up to 511 with the possible exception of 256 and 384. Ollis also has this on the conjecture for abelian groups: "On terraces for abelian groups", Discrete Mathematics, Vol. 305, Issues 1–3, 6 December 2005, pp 250–263; online.
3. The original web link for this theorem description was this entry at the Encyclopedia of Design Theory which remains a valuable resource but now hosted, and perhaps ephemerally, by Leonard Soicher at Queen Mary University of London, the domain designtheory.org having ceased to exist.

Theorem no. 55: Miquel's Triangle Theorem
1. I used to link to the geometry app that created this theorem's illustration. But guaranteeing that a Java app will work for everyone's favourite browser has become too painful. And the app wasn't all that exciting anyway, so it is no longer maintained. But you can look at it here — last time I checked it worked in Internet Explorer.
2. Some brief biographical details are given by Jean-Louis Ayme here.

Theorem no. 56: Morley's Miracle
1. Same comment as for Theorem 55 regarding Java; the app for this theorem, if you want to try, is here.
2. The John Mullan quote about Faber and Faber is online here (paragraph 6). The link to Morley's mathematics is tenuous but there is more on Morley junior at Faber and Faber here.

Theorem no. 58: Galois' Theorem on Finite Fields
1. The recommended weblink for this page was previously some notes by Peter Cameron on finite fields, posted at designtheory.org. This website currently (Auguest 2018) does not exist and its content is being hosted by Leonard Soicher at Queen Mary University of London (which is the link given in the previous sentence). The notes on finite fields are thus still available but I have preferred to use a more permanent link on the theorem page itself.
2. It would be ahistorical to say something like "Galois showed constructively that finite fields have prime power order; and Eliakim Moore showed that this construction was the only one possible". Moore, a pioneer of abstract algebra proved the structure theorem which says "this is what finite fields look like". But this answers a question which would not have occurred to Galois! A detailed study of the work which led up to and beyond Moore is given by Frederic Brechenmacher in "A history of Galois fields", 2012, <hal-00786662>. The definitive analysis of the Galois archive is of course Peter M. Neumann's The Mathematical Writings of Evariste Galois, European Mathematical Society, 2011.
3. There is a wonderful knitted GF(16) by the woolythoughts blog.

Theorem no. 59: Germain's Theorem
1. A detailed account of Germain's work on Fermat's Last Theorem has been given by Roger Thompson in the October and December 2016 issues of M500 magazine (pdf 600KB downloads).
2. Although Germain's Theorem does not play a direct part in the eventual proof of FLT (Theorem 9) it has deep connections with other parts of number theory which are still of interest. For instance, it is connected with period lengths of modular Fibonacci sequences via 'Wall's Question' (see Theorem 235, notes(3)). There is also a connection to cryptography via the idea of 'strong' and 'safe' primes. See this, for example.
3. The number and distribution of Germain primes are of interest in their own right, see here, for example.
4. There is an interesting side story on the Germain–Gauss correspondence in this prize-winning essay by William C. Waterhouse. More on Gauss and Germain and on her work in number theory can be found in Raymond Flood's Gresham College lecture on the subject. Evelyn J. Lamb has this evocative piece which also has some useful onward links.
5. There is a nice characterisation of Germain primes by Paolo Leonetti.

Theorem no. 60: The Strong Perfect Graph Theorem
1. The qualifier 'strong' distinguishes this theorem from the Perfect Graph Theorem, also conjectured by Claude Berge and proved by László Lovász in 1972.
2. Maria Chudnovsky, Neil Robertson, Paul Seymour, Robin Thomas, "The strong perfect graph theorem", Annals of Mathematics, Vol. 164, Issue 1, 2006, 51–229. Online version.
3. The 2001 Strong Perfect Graph Theorem for square-free graphs (no induced 4-cycles) of Michele Conforti, Gérard Cornuéjols and Kristina Vušković is proved in "Square-Free Perfect Graphs", Journal of Combinatorial Theory B, 90 (2004) 257–307. A preprint can be found on Cornuéjols website here (Research Papers, section 2).

Theorem no. 61: Moufang's Theorem
1. For a proof of Moufang's theorem see Aleš Drápal, "A simplified proof of Moufang's theorem", Proc. AMS, Vol. 139, No. 1, 2011, 93–98; online.
2. The smallest non-associative Moufang loop has order 9 — see Theorem no. 114; the octonions form a loop of order 16 (the negated elements are omitted from our multiplication table for conciseness) and there are four other non-associative Moufang loops of this order, see the paper by Orin Chein here. The sequence of numbers of non-associative Moufang loops is A090750.
3. The question of whether non-Moufang loops may still obey Moufang's Theorem is addressed here by Izabella Stuhl.

Theorem no. 63: Thales' Theorem
1. The story of Thales sacrificing an ox is apocryphal. It has also been attached to Pythagoras. See here for more details.

Theorem no. 64: Neumann's Separation Lemma
1. Peter Neumann published his lemma in "The structure of finitary permutation groups", Archiv der Mathematik, Vol. 27, Issue 1, 1976, 3–17, appearing in December. The extension to finite permutation groups beat it into print, appearing in October: B.J. Birch, R.G. Burns, Sheila Oates Macdonald and Peter M. Neumann, "On the orbit-sizes of permutation groups containing elements separating finite subsets", Bull. Austral. Math. Soc., Vol. 14, Issue 1, 1976, 7–10; online here. I think Peter Neumann must have told me that both results were discovered in 1974.

Theorem no. 65: Sims' Conjecture
1. This theorem was published as "On the Sims Conjecture and distance transitive graphs", Bull. London Math. Soc, 15 (1983), 499–506. A pdf copy can be found here.
2. The theorem is very well placed in context by this short article by Cheryl Preager in Asia Pactific Mathematics Magazine.
3. See notes to Theorem 46 for an important application of Sims' conjecture.

Theorem no. 66: An Erdős–Ko–Rado Theorem on Intersecting Permutations
1. The woolythoughts blog offers an alternative view of $$S_4$$. They have also done $$S_5$$.

Theorem no. 67: Reidemeister's Theorem
1. The independent discovery of this theorem by J.W. Alexander and G.B. Briggs ("On types of knotted curves", Annals of Mathematics, Vol. 28, No. 1/4, 1926–1927, 562–586) may be read here.

Theorem no. 68: The Stable Marriage Theorem
1. Closely related is the economist Thomas Schelling's work on segregation and diversity. A compelling illustration of these ideas is provided in animated form by The Parable of the Polygons by Vi Hart and Nicky Case.
2. An excellent account of Lloyd Shapley and his work is given here by Joseph Malkevitch.

Theorem no. 69: Arrow's Impossibility Theorem
1. Arrow's theorem as a result in social science belongs to the general area of social choice theory. A good account from this perspective is given here (Snapshot no. 9/2015) by Victoria Power (also available in German).
2. And something less serious!

Theorem no. 71: Sharkovsky's Theorem
1. @jakebarrrett has alerted me to the fact that a number of elementary proofs of this theorem are available. See, e.g., Chapter 1 of Louis Block & William Coppel, Dynamics in One Dimension, Springer, 2009. Also some notable treatments by Bau-Sen Du which can be accessed online via his arxiv listing.
2. The crucial (but weaker) result that period 3 implies periods of all orders is discovered independently by Tien-Yien Li and James A. York, "Period Three Implies Chaos", The American Mathematical Monthly, Vol. 82, No. 10 (Dec., 1975), pp. 985–992; online (paywalled); there is a good description of their theorem by Padraic Bartlett here. Sharkovsky published his result in Russian at the height of the war and it did not receive widespread attention until the 1970s—see his MacTutor entry.
3. A number of valuable reprints of classics from the dynamical systems literature are available here, included Robert May's 1976 Nature article dealing with the logistic function

Theorem no. 72: MacWilliams' Identity
1. There is a one-variable version of this theorem corresponding to the nonhomoeneous polynomial obtained by setting $$y=1$$ in the definition of $$W_C(x,y)$$. The identity becomes $$W_{C^{\perp}}(x)=|C|^{-1}W_C(1-x,1+(q-1)x)$$ which, using the single-argument version of $$W_C$$ becomes $$W_{C^{\perp}}(x)=\dfrac{1}{|C|}(1+(q-1)x)^nW_C\left(\dfrac{1-x}{1+(q-1)x}\right)$$.
2. There is a very nice application of MacWilliams, due to Assmus and Maher, to prove nonexistence of projective planes of order 6 modulo 8. An excellent presentation of this work is given by Matroid Union.

Theorem no. 73: Goodstein's Theorem
1. For some remarks on Goodstein's theorem in the context of the search for independence results for Peano arithmetic see Michael Rathjen, "Goodstein Revisited", Annals of Pure and Applied Logic, in press; preprint.

Theorem no. 74: Gödel's First Incompleteness Theorem
1. The theorem can be stated as "a consistent mathematical system contains assertions for which neither the assertion nor its negation can be proved from the axioms of the system". The version which says "contains truths which cannot be proved" is equivalent by virtue of the fact that, given consistency, non-provability of a negated assertion is synonymous with truth of the assertion. See Peter Cameron's article in Gowers et al (eds.), The Princeton Companion to Mathematics, Princeton University Press, 2008, a preprint of which is here.
2. Mark Chu-Carroll's blog Good Math, Bad Math has posted a very nice 4-part introduction to Gödel I: the final part, which indexes the other three, is here. A good overview with a philosophical slant is this from The Times Literary Supplement by Juliette Kennedy, whose webpage has much else of value.
3. A fascinating self-contained exposition of Gödel's two incompleteness theorems is given by Stanisław Świerczkowski here (Dissertationes Math., 422, 2003, click on the 'POBIERZ ZA DARMO' button) in terms of the theory of 'hereditarily finite sets'. These were the proofs which were chosen as suitable for machine checking in 2015: Lawrence C. Paulson (2015). "A mechanised proof of Gödel’s incompleteness theorems using Nominal Isabelle", Journal of Automated Reasoning. 55, no. 1: 1–37
4. A short sketch proof of Gödel's theorem by Raymond Smullyan (c.f. 5000 B.C. and Other Philosophical Fantasies, St. Martin's Press, 1983) is reproduced here.
5. Marianne Freiberger has written an excellent account for Plus magazine of Harvey Friedman's work developing non-artificial examples of Gödel's theorem in action.

Theorem no. 75: Gödel's Second Incompleteness Theorem
1. There is a delightful 1-page description of Gödel 2 in 'words of one syllable' by George Bollos here.
2. See also note (3) to Gödel 1.

Theorem no. 76: Brahmagupta's Formula
1. Coolidge attributes an equivalent to his quadrilateral area formula to Carl Anton Bretschneider and Friedrich (I think) Strehlke, both 1842. Wikipedia has an entry on Bretschneider's Formula which also credits (also 1842) von Staudt: $$K=\sqrt{(s-a)(s-b)(s-c)(s-d)-abcd\cos^2\theta},$$ where $$\theta$$ is half the sum of either pair of opposite angles (in a cyclic quadrilateral opposite angles sum to $$180^{\circ}$$ with the half angle giving zero cosine).
2. Using the notation of the theorem description, Ptolemy's Inequality says $$ad+bc\geq ef$$ with equality if and only $$abcd$$ is cyclic. So the subtracted term in Coolidge's square root is always nonnegative (this is immediate from Bretschneider's Formula).
3. A clever trapezoid version of Heron's formula due to Miguel Ochoa Sanchez can be found at cut-the-knot.org.
4. Also regarding Heron, see note (3) for Pythagoras.
5. Fascinating material on Indian mathematicians' investigations of cyclic quadrilaterals may be found in Radha Charan Gupta, "Parameśvara's rule for the circumradius of a cyclic quadrilateral", Historia Mathematica, Vol. 4, Issue 1, February 1977, 67–74, online here. Parameśvara's (also known as L'Huilier's) rule states that: circumradius of cyclic quadrilateral with sides $$a,b,c,d$$ is obtained on dividing $$\sqrt{(ab+cd)(ac+bd)(ad+bc)}$$ by $$4\times\mbox{area of quadrilateral}$$.
6. Another open archive article from Historia Mathematica deals directly with Brahmagupta's formula: Satyanad Kichenassamy, "Brahmagupta’s derivation of the area of a cyclic quadrilateral", Historia Mathematica Vol. 37, Issue 1, February 2010, Pages 28–61.
7. As distinct from Brahmagupta's Formula, his Theorem usually refers to another result about a cyclic quadrilateral : if its diagonals are orthogonal and a line joins their intersection to a side, then the continuation of this line bisects the side opposite.

Theorem no. 78: The Three-Distance Theorem
1. The theorem is often referred to as 'the Steinhaus Conjecture'. However, it doesn't seem to have survived long enough to warrant the title. Stanisław Świerczkowski sent me the following interesting background information on the history and his proof of this theorem:
 "The Three Distances Theorem was conjectured by Hugo Steinhaus. When I gave him the proof, he checked it and asked me to write a report for the Academy of Sciences. He presented this report to the Academy (only members could do that) whereupon it was published in the BULLETIN DE L’ACADEMIE POLONAISE DES SCIENCES, Cl. III – Vol. IV, No.9, 1956. The date of his presentation to the Academy is: 29 June 1956. I have here an offprint. If you wish to look at it, and it is not in your library, I could scan it and send you an image by email. About that time Vera Sós and her husband Prof. Turán visited our University, so they certainly heard about the Three Distances result directly from my colleagues or from me. Of course, Erdős was visiting us too many times. I remember Vera quite well. In the announcement in the BULLETIN DE L’ACADEMIE POLONAISE mentioned above, there are four related theorems, and I postponed publishing the proofs of these to a later paper. By the time I wrote it, Vera Sós published her proof of the Three Distances Theorem, so I found it simpler to just refer to her paper. My proof (which I do not recall) almost certainly found its way into my Ph.D. thesis on the subject of cyclically ordered groups. This dissertation was submitted to the Polish Academy of Sciences. A recent inquiry disclosed that they cannot find it."
2. There are generalisations of the theorem, see for instance J.F. Geelen and R.J. Simpson, "A two dimensional Steinhaus theorem", Australasian J. Combinatorics, vol. 8, 1993, 169–197, available online here.
3. The theorem is often visualised as points plotted around a circle of unit circumference, see this blog entry from Σidiot's blog, for example.

Theorem no. 79: The Fifteen Theorem
1. Manjul Bhargava won a Fields Medal in 2014 in part for his work on quadratic forms. He is interviewed by plus magazine on the subject here.

Theorem no. 80: One-Factorisation of Regular Graphs
1. This conjecture should probably be taken as originating with Amanda Chetwynd and Anthony Hilton's 1985 paper "Regular graphs of high degree are 1-factorizable", Proc. London Math. Soc., 50 (1985), 193–206. In Stiebitz et al, Graph Edge Coloring: Vizing's Theorem and Goldberg's Conjecture, Wiley-Blackwell, 2012, on p. 259, we find the comment that: "it [the one-factorisation conjecture] 'was going around' in the early 1950s, Hilton [136] was told by G.A. Dirac." And then, "fifteen years before the one-factorization conjecture was published by Chetwynd and Hilton, Nash-Williams [233] proposed a much stronger conjecture, which is also known as the hamiltonian factorization conjecture: Let $$G$$ be a $$\Delta$$-regular graph on $$2n$$ vertices, where $$\Delta\geq n$$. Then $$G$$ has a hamiltonian factorization, i.e., $$G$$ is the edge-disjoint union of $$\Delta/2$$ Hamilton cycles if $$\Delta$$ is even or $$(\Delta-1)/2$$ Hamilton cycles and a linear factor if $$\Delta$$ is odd." (The factorisation of $$K_2\times K_5$$ is our theorem illustration is an example). However, Anthony Hilton has told me that he has never seen any mention of his conjecture published prior to 1985 and considers it unlikely that it would have been thought about in the 1950s. Certainly Vizing's Theorem was not known until the 1960s. Ref [233] is C. St. J. A. Nash-Williams, "Hamiltonian lines in graphs whose vertices have suciently large valencies", in Combinatorial theory and its applications, III, North-Holland 1970, 813–819. The recent work of Csaba et al (see main theorem description) also resolves Nash-Williams' conjecture for large $$n$$.
2. Csaba et al is here (preprint) and has been published electronically (pay-walled) as "Proof of the 1-factorization and Hamilton Decomposition Conjectures", Memoirs of the American Mathematical Society, 2016, Vol. 244, Number 1154.

Theorem no. 81: The Lutz–Nagell Theorem
1. When I first posted this theorem I ignorantly assumed that my example elliptic curve $$y^2=x^3-43x+166$$ crossed the horizontal axis at $$(-8,0)$$, giving a rational point of order 2 to illustrate the second case of the theorem. Shaun Stevens was kind enough to put me right, pointing out that this would imply a torsion group of order at least $$2\times 7$$, contradicting the maximum order of 12 asserted by Mazur's Torsion Theorem.

Theorem no. 82: Gruenberg's Theorem on Nilpotent Groups
1. The Heisenberg group $$H(n)$$ is a group of upper triangular matrices under multiplication; thus for $$n=3$$: $$\left(\begin{array}{ccc} 1 & a & c \\ 0 & 1 & b \\ 0 & 0 & 1\end{array}\right)\times \left(\begin{array}{ccc} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1\end{array}\right) =\left(\begin{array}{ccc} 1 & a+x & c+z+ay \\ 0 & 1 & b+y \\ 0 & 0 & 1\end{array}\right).$$ The multiplication looks a bit obscure when expressed in terms of triples, as in our theorem description. However it is a natural convenience in the analysis of this group, c.f. this interesting paper.

Theorem no. 84: The Five Circle Theorem
1. Same comment as for Theorem 55 regarding Java; the app for this theorem, if you want to try, is here.
2. The web link for this theorem, a paper by Wolfgang Schief and Boris Konopelchenko, is complemented by an interesting slide presentation by Schief here.
3. A converse to the five-circle theorem by Frank Morley is discussed in Tobias Dantzig, "Elementary Proof of a Theorem Due to F. Morley", The American Mathematical Monthly, Vol. 23, No. 7 (Sep., 1916), pp. 246–248; online. (Tobias was the father of George of simplex method fame).

Theorem no. 85: Cayley's Theorem
1. Our description and illustration of this theorem are limited to finite groups but it applies equally to infinite groups.
2. I wrote down Peter Neumann's assessment of Cayley's theorem at a talk he gave at Queen Mary University of London during a celebration to mark the 60th birthday of R.A. Bailey:

Theorem no. 86: Noether's Symmetry Theorem
1. Some other excellent accounts of Noether's theorem can be found here at the Physics Mill blog and here at the Perimeted hr Institute.

Theorem no. 87: Lamé's Theorem
1. This theorem comes in many different guises, all essentially saying that Euclid's algorithm takes longest (relative to input size) when applied to consecutive Fibonacci numbers. An excellent survey is provided by cut-the-knot.

Theorem no. 88: The Cauchy–Kovalevskaya Theorem
1. Putting Kovalevskaya's work in a modern context is this HAL archives article: Elemer Elad Rosinger, "Can there be a general nonlinear PDE theory for existence of solutions?"
2. Garry J. Tee kindly sent me a scan of his well-known article about Kovalevskaya which appeared in the Mathematical Chronicle in 1977. The Mathematical Chronicle Committee in turn kindly gave me permission to host a copy here. I have made pdf (about 3.7MB) and Powerpoint (about 9MB) versions. (The Mathematical Chronicle Committee is arranging for digitisation of Mathematical Chronicle).

Theorem no. 90: Cardano's Cubic Formula
1. A more explicit version of this formula (comparable in format to the quadratic formula) can be found here. This version is made fun of in this blog post.
2. The case where a cubic equation has three real roots which may only be expressed in radical form using complex numbers is known as casus irreducibilis. It was determined by Pierre Wantzel in 1843.
3. Thony Christie offers a very detailed discussion of the skirmishes around ownership of solutions to the cubic here. A posting here by Plus magazine is also illuminating.
4. Kellie Gutman has made an English translation of Tartaglia's verse-form solution of the cubic, brought to us by poetrywithmathematics.blogspot.fr.

Theorem no. 91: Khinchin's on Continued Fractions
1. A closely related result of Khinchin says that that for almost all real numbers, the $$n$$-th root of the denominator of the $$n$$-th convergent of the continued fraction expansion tends in the limit to a fixed constant, known as Lévy's constant.

Theorem no. 92: The Quadratic Formula
1. A valuable list of derivations of the quadratic formula is provided by Cut-The-Knot.

Theorem no. 94: Cayley's Formula
1. The second proof of Cayley in Stanley's book, by André Joyal, has a nice application to automata theory by Peter Cameron, described here.
2. Also in Cameron's blog: another lovely enumeration yielding $$n^{n-2}$$.

Theorem no. 95: The Convolution Theorem
1. For Gauss and FFT see Michael T. Heideman, Don H. Johnson and c. Sidney Burrus, "Gauss and the History of the Fast Fourier Transform", IEEE ASSP Magazine, October 1984, 14–21. Online here. Lanczos's contribution is mentioned in his MacTutor entry.
2. The continuouse convolution operator and related theorems date back to Euler and Laplace, as described by Alejandro Dominguez, "A History of the Convolution Operation", IEEE Pulse, January/February 2015; online.
3. A lovely gentle introduction to Fourier transforms by betterexplained.com.

Theorem no. 96: The Rule of Sarrus
1. There is apparently a 4×4 version of the Rule of Sarrus called the 'Rule of Villalobus' due to the Mexican mathematician Gustavo Villalobos Hernández. See comment (4) here which links to a Spanish wikipedia entry which unfortunately appears to have been deleted (although the Spanish wiki determinant page mentions it)

Theorem no. 98: Cartwright's Theorem
1. The reference for this theorem is "On analytic functions regular in the unit circle II", Quart. J. Math. Oxford Ser. (2) 6 (1935) 94–105. You can preview the first page free here. In his obituary of Cartwright (Bull. London Math. Soc. 34 (2002) 91–107) Walter K. Hayman comments "With this paper the author essentially created a new field. It was almost the only paper quoted by Littlewood in his book [Lectures on the Theory of Functions, OUP, 1944] and led me to ask Mary Cartwright to become my research supervisor." The citation appears on p. 232 on Littlewood's book. He writes "The proofs, due to Cartwright, are difficult, and depend on ideas unlike any we have been considering". The book is online here.
2. There is a good popular account of this theorem by Jaz_Will here.

Theorem no. 99: The Happy Ending Problem
1. The attribution to Turán of the solution to n = 5 for this problem I found in this article by Imre Bárány. More details are given by Szekeres and Peters in their article proving n = 6.
2. There is a very nice description of the Happy Ending problem and Suk's asymptotic solution here, by Quanta magazine.

Theorem no. 101: Kepler's Conjecture
1. This was a 'theorem under construction' pending Hales' Flyspeck project to reinforce his original proof with a machine-automated one. This completed in 2014 but the proof was already made bullet-proof with his 2012 publication of Dense Sphere Packings: A Blueprint for Formal Proofs. The formal proof is also described by Hales and twenty-one other authors in "A formal proof of the Kepler conjecture", Forum of Mathematics, Pi (2017), Vol. 5, e2; online.
2. The story of Hales' original proof and publication of Kepler is well told here. Note that his main referee at Annals, namely Gábor Fejes Tóth, is the son of László Fejes Tóth who made the original breakthrough in the conjecture's resolution. By the way, a statement by the Editorial Board at Annals explicitly invites "computer-assisted proofs of exceptionally important mathematical theorems", an invitation which I understand to pre-date Kepler.
3. The Flyspeck project is now at GitHub. (The old google code website is still accessible here, as of January 2019, but carries the warning "Google is shutting down google code and we are being evicted.")

Theorem no. 102: Viète's Formula
1. Tom Ostler has a more detailed account of the connections between Viète's formula and ruler and compass constructions in "Geometric constructions approximating pi", Mathematical Spectrum, Vol. 40, N. 3 (May 2008), 106–108; preprint.

Theorem no. 103: The Parking Function Formula
1. The bijection to labelled trees establishes the formula but the book' proof is due to Henry O. Pollack. An account is given in this wonderful presentation by Richard Stanley
2. Apparently parking functions and their enumeration originate in Ronald Pyke, "The supremum and infimum of the Poisson process", Ann. Math. Statist. 30 (1959) 568–576; online preprint(900KB pdf).
3. The note by Peter Cameron which is the recommended weblink for this theorem was written for the Queen Mary Maths dept student newsletter. Its topic is very nicely explored in this presentation by Thomas Prellberg.

Theorem no. 104: The Goins–Maddox–Rusin Theorem for Heron Triangles
1. The triangle-to-curve conversion rule given in the text (from the Goins–Maddox paper) guarantees to produce an elliptic curve and a rational point on that curve. The actual curve and point depend on the order in which sides $$a,b$$ and $$c$$ are chosen. For example, the triangle with sides $$3, \frac{10}{3},\frac{17}{3}$$, with area $$n=4$$, gives $$\rho=1/4$$ or $$2$$ or $$2/9$$, depending on which sides are called $$a,b$$ and $$c$$. And for the curve depicted, $$\rho=1/4$$, neither ordering of the sides recovers the point $$P=(-8,24)$$ on this curve (which gives the same triangle but with some negative sides) --- instead points $$(2,6)$$ and $$(18,102)$$ are discovered.
2. The version of Heron's formula I have used is one of several given in the wikipedia entry. It is useful for showing the invariance of the rule under sign changes.
3. There is a nice account here of searching for congruent numbers (integer areas of rational right triangles). The associated sequence at oeis is A003273.

Theorem no. 105: The Pumping Lemma
1. A nice entry at Computational Complexity by Bill Gasarch poses a challenge "Find a non-reg lang that is not easily proven non-reg." Meaning, the pumping lemma plus some other basic tools always seems to be sufficient.

Theorem no. 107: The Tverberg Partition Theorem
1. The original recommended web link for this page was Stephen Hell's magnificent dissertation "Tverberg-type Theorems and the Fractional Helly Property". This became unavailable as a direct download but is now located at depositonce.tu-berlin.de/handle/11303/1761 and is highly recommended.
2. Tverberg's Theorem is a special case of the so-called Topological Tverberg Conjecture. The conjecture is true when $$r$$, the number of partition parts, is a prime power but was proved false in general in 2015. It is all very-well described by Gil Kalai: follow the links back from here (you will also find a proof of Tverberg's Theorem itself).
3. Also by Gil Kalai, a good overview, marking the birthday of Imre Bárány, of various 'Tverberg' results and conjectures

Theorem no. 108: The Analyst's Travelling Salesman Theorem
1. There is a good wiki page on ATST where you can find references to the original articles by Jones and Okikiolu. Jones is open access online here but Okikiolu's article, in common with most LMS stuff, remains blocked.
2. Raanan Schul's Analyst's Traveling Salesman Theorems, a Survey is an excellent source of more detailed and more recent information.
3. The Koch snowflake image in the illustration of this theorem was copied from a website called www.scientificweb.com/testreport/mathbench4/ which appears no longer to exist.

Theorem no. 109: The Beardwood-Halton-Hammersley Theorem
1. An interesting counterexample of Alessandro Arlotto and J. Michael Steele puts a limit on how far the hypothesis on the variables $$X_1, X_2, \ldots$$ forming the TSP tour may be relaxed.

Theorem no. 110: The Robbins Problem
1. John Baez has a nice explanation here of a single-axiom definition of a lattice. I first saw this as a Baez google+ post accompanied by his usual crop of quality comments, a couple of which I quote:
• David Tweed: "Related, there's a claim to a more human proof that the Robbins axioms describe boolean algebra at http://www.markability.net/robbins.htm . Frustratingly it doesn't say if this was derived from unintuitive computer proofs or independently.﻿ So a fact first proved by computer may later acquire a human proof."
• Robert Rothenberg: It reminds me of work people have done to come up with single-axiom versions of various logics, I think Harris & Rezus. See A. Rezus, "On a Theorem of Tarski", Libertas Math. 2, pp. 63-97. Discussed in http://fitelson.org/ar.html .

Theorem no. 111: De Morgan's Laws
1. I found the following on Peter Cameron's mathematical quotations page (under logic):
 " ... the contradictory opposite of a copulative proposition is a disjunctive proposition composed of the contradictory opposites of its parts. ... the contradictory opposite of a disjunctive proposition is a copulative proposition composed of the contradictories of the parts of the disjunctive proposition." William of Ockham (Occam), Summa totius logicae, 14th century (transl. Philotheus Boehner 1955) Note: This is Ockham's formulation of De Morgan's Laws, more than five hundred years before De Morgan. It is just as clear in his Latin text.

Theorem no. 112: Bregman's Theorem
1. A. Schrijver attributes to Brouwer (presumably L.E.J. Brouwer who was still alive when Minc originally conjectured what became Bregman's Theorem) the observation that an upper bound for permanents of arbitrary nonnegative matrices may be derived directly from the bound for (0,1)-matrices. Thus, if $$v = (b_1, b_2, \ldots, b_n)$$ is a vector in descending order, and letting $$b_{n+1}=0$$, define $$g(v)=\sum_{k=1}^n(b_k-b_{k+1})(k!)^{1/k}.$$ Then the Minc bound is replaced by the product $$\prod_ig(v_i)$$ over the rows $$v_i$$ of the matrix. (A. Schijver, "A short proof of Minc's conjecture", J. Combin. Theory, A, vol. 25, no. 1, 1978, 80–31; online.)
2. Although computing $$\mbox{per}(M)$$, the permanent of an arbitrary matrix $$M$$, is hard (specifically, #P-hard) even if $$M$$ is $$0\mbox{-}1$$, in the case where $$M$$ has nonnegative entries Jerum, Sinclair and Vigoda have given a polynomial algorithm which gives a value arbitrarily close to $$\mbox{per}(M)$$ with high probability (FPRAS).

Theorem no. 113: The van der Waerden Conjecture
1. The probability calculations in this theorem description may be justified explicitly by observing that column sums add to unity, so that for, say, System 1, $$\left(\frac12+\frac14+\frac14\right)\times\left(\frac13+\frac58+\frac{1}{24}\right)\times\left(\frac16+\frac18+\frac{17}{24}\right)=1,$$ which, expanding the brackets, equals the sum of the probabilities of every possible product of output failures over the three computers.
2. Closely related are questions about permanents of matrices having all row and column sums equal, counting perfect matchings in regular bipartite graphs of equal-size parts. Lower bounds have recieved much attention, notably the Schrijver–Valient Conjecture. Some recent information is here.

Theorem no. 115: The Hardy–Ramanujan Asymptotic Partition Formula
1. Hardy and Ramanujan published their asymptotic formula in 1918 in "Asymptotic Formulae in Combinatory Analysis", Proc. London Math. Soc., (2) 17, pp 75–115. But they had published preliminary versions as early as 1916 and the abstract to the 1918 paper appeared in the records of LMS Proceedings for March 1st, 1917 (Hardy himself, as vice-president, taking the chair). The papers can be viewed online here.

Theorem no. 116: The Polynomial Coprimality Theorem
1. The row and column labels in the illustration of this theorem have been supressed for $$n=3$$ to save space. They are, in order, $$x^3, x^3+1, x^3+x, x^3+x+1, x^3+x^2, x^3+x^2+1, x^3+x^2+x, x^3+x^2+x+1$$, the 4th and 6th being irreducible (irreducible polynomials up to degree 5 over GF(2) are listed here).
2. The theorem first appears in S. Corteel, C. Savage, H. Wilf, D. Zeilberger, "A Pentagonal number sieve", Journal of Combinatorial Theory, Series A, vol. 82, no. 2, 1998, 186–192. There is a reprint online here (scroll down to 1998). The constructive proof of Arthur T. Benjamin and Curtis D. Bennet is "The Probability of Relatively Prime Polynomials, Mathematics Magazine, vol. 80, no. 3 (2007), pp. 196–202; online.
3. Thomas Hagedorn and Jeffrey Hatley have generalised this result to consider polynomials over the ring $$\mathbb{Z}_{p^k},\ p$$ prime: "The probability of relatively coprime polynomials in $$\mathbb{Z}_{p^k}[x]$$", Involve, vol. 3, no. 2, 2010, pages 223–232; online. Their paper reviews several other generalisations.

Theorem no. 118: Catalan's Conjecture (Mihăilescu's Theorem)
1. There is more on Pillai's conjecture and on the history of Catalan's conjecture here by Michel Waldschmidt.

Theorem no. 119: Kneser's Conjecture
1. Lovász's proof of Kneser's conjecture appears in "Kneser's conjecture, chromatic number, and homotopy", J. Combin. Theory A, Vol. 25, issue 3, 1978, 319–324; online. In the same issue there is a half-page proof, inspired by Lovász's and also topological in nature, by Imre Bárány, "A short proof of Kneser's conjecture", J. Combin. Theory A, Vol. 25, issue 3, 1978, 325–326; online. The elementary proof of Jiří Matoušek appears in "A Combinatorial Proof of Kneser’s Conjecture", Combinatorica, Vol. 24, Issue 1, 2004, 163–170. There is a copy online here.

Theorem no. 120: The Lovász Local Lemma
1. An earlier version of this theorem description may be found here. It uses an artificial number theory example to show non-independent sets which are nevertheless pairwise independent. But the conclusion of the Local Lemma is still true even though its hypotheses are false. The example replacing it gives us a false conclusion and is a bit less artificial (in fact it is based on a scenario from cryptography, described to me by Michelle Kendall, which is not artificial at all).
2. The current example concerns an event Aij that two multisets of size 2, each chosen with repetition from the set {1,...,n}, have nonempty intersection. The number of ways that such a pair of intersecting multisets can be chosen is $${n+1\choose 2}^2-\left({n\choose 2}{n-1\choose 2}+n{n\choose 2}\right)=\frac12n\left(2n^2-n+1\right).$$ This is A081436 in oeis.org. The probability of $$A_{ij}$$ is thus $$\frac12n(2n^2-n+1)/{n+1\choose 2}^2$$.
The probability of the triple event $$A_{ij}\cap A_{ik}\cap A_{jk}$$ may be calculated as $$\frac12n(2n^3+2n^2-7n+5)/{n+1\choose 2}^3$$. When $$n = 3$$, this has value $$7/18$$, much greater than $$8/27$$, the cube of the probability of the single event.
3. The original Local Lemma appears in this paper.
4. The paper of Carsten Thomassen is here: the result on hypergraph colouring appears as Theorem 5.1.
5. Anwer Khurshid and Haredo Sahai, "On mutual and pairwise independence: some counterexamples", Pi Mu Epsilon Journal, Vol. 9, No. 9, 1993, 563–570, looks promising as a source of further false applications of the Local Lemma. Online here (10.3MB pdf).
6. A constructive proof of the Local Lemma is given in Robin A. Moser and Gábor Tardos, "A constructive proof of the Lovasz Local Lemma", Journal of the ACM, Vol. 57 Issue 2, 2010; preprint.

Theorem no. 121: Lambert's Formula
1. How this formula arose out of Lambert's attempts to prove Euclid's 5th postulate is beautifully described here by Evelyn J. Lamb...
2. ...who also describes and links to a beguiling interactive tool for tiling the Poincaré disk, part of a fine non-Euclidean geometry resource by  Malin Christersson.

Theorem no. 123: The Wedderburn–Artin Theorem
1. An interesting article on the significance of Wedderburn-Artin is provided by Quora (with companion entries on significance of several other theorems)

Theorem no. 124: The Lagrange Interpolation Formula
1. The link between Lagrange interpolation and the Chinese Remainder Theorem is discussed here.
2. Of course linear Lagrange interpolation for sets of points in $$\mathbb{R}^2$$ extends in various directions. A pretty derivation for linear interpolation in $$\mathbb{R}^n$$ by Kamron Saniee is given in "A Simple Expression for Multivariate Lagrange Interpolation", SIAM Undergraduate Research Online (SIURO), Vol. 1, Issue 1, 2008; online. These lectures notes by Kostas Kokkotas do an excellent job of giving the wider context for interpolation (see chapter 3).
3. You can find interpolation calculators on the web. This, from dCode, a French ciphers and cryptogram site, for example; or wolfram alpha which will respond to something like "interpolate (0,1), (1,3),(2,5)".

Theorem no. 125: The Skolem–Noether Theorem
1. For a short (but not self-contained) proof of Skolem–Noether see this from the stacks project.

Theorem no. 126: The Asymptotic (Half) Liar Formula
1. The exponential nature of the Dumitriu–Spencer asymptotic can be appreciated by noting that, while $$U_1(7)\geq 15$$ (the example illustrating the theorem), the value of $$U_1(25)$$ exceeds $$10^6$$, i.e. asking 25 questions is guaranteed to find a value between 1 and a million, in the face of at most one lie. A nice account of this result by Deryk Osthus and Rachel Watkinson is here.
2. The exact value of $$U_1(7)$$ is $$16$$; $$U_1(25)=1290554$$. With such rapidly growing values it is more convenient to ask the liar question the other way round: given a value for $$n$$ what is the least number, $$q_k(n)$$, of questions which will guarantee a number in the range $$1,\ldots,n$$ is determined, in the face of $$k$$ lies? This is the version discussed by Osthus and Watkinson. Thus $$U_1(7)=16$$ because $$q_1(16)=7$$ but $$q_1(17)=8$$.
3. A comprehensive and very readable survey, "Searching games with errors—fifty years of coping with liars" by Andrzej Pelc, Theoretical Computer Science, 270, 71–109 , is available here via Elsevier's Open Access Archive.
4. A tangential but intriguing link regarding the Fano plane, which generates Peter Cameron's trick illustrating this theorem, is this on using the Fano plane to generate poetry.
5. Although I attribute the delphic oracle image used in my illustration of this theorem to the Staatliche Museen Berlin I think this attribution is based merely on a google image search. I have elsewhere seen it attributed to the "Collection of Joan Cadden".

Theorem no. 127: The Lucas–Lehmer Test
1. I like this short blog entry, written by a Mersenne prime hunter, on using the Lucas–Lehmer test.
2. There is a good entry here by 3010tangents about how to prove Lucas–Lehmer.
3. For an elementary proof of Lucas–Lehmer see J. W. Bruce, "A really trivial proof of the Lucas–Lehmer test", Amer. Math. Monthly, 100 (1993), 370–371; online.

Theorem no. 128: The Euclid–Euler Theorem
1. The MacTutor archive offers a good page on the history of perfect numbers.
2. I found this wonderfully open-ended exploration of perfect numbers (1.8MB pdf) by Oliver Knill (one of many he has posted).
3. Article no. 1 at John Voight's site is a nice exploration of odd perfect number properties.
4. The current lower bound on odd perfect numbers appears to be Pascal Ochem and Michaël Rao,"Odd perfect numbers are greater than $$10^{1500}$$", Mathematics of Computation, Vol. 81, Issue 279, 2012, 1869–1877; online.

Theorem no. 129: The Ollerenshaw–Brée Formula
1. Evelyn Lamb wrote this very nice tribute to Ollerenshaw on the occasion of her 100th birthday. The entry for her at Agnes Scott is also good on technical details. Another tribute to Ollerenshaw's work on magic squares was carried by the Royal Society Publishing Blog
2. A depiction of a most-perfect magic square dating from the 10th century is found in the Jain temple Parshvanatha at Khajuraho in Madhya Pradesh.
3. A good source of information on magic squares generally is the website of Francis Gaspalou.

Theorem no. 130: A Theorem on Apollonian Circle Packings
1. The papers that inspired this theorem description can be located on Ron Graham's website dated 2003–2006. The theorem as given is from R. L. Graham, J.C. Lagarias, C.L. Mallows, A.R. Wilks, and C.H. Yan, "Apollonian circle packings: number theory", J. Number Theory, 100 (2003), no. 1, 1–45.

Theorem no. 132: Theaetetus' Theorem on the Platonic Solids
1. A nice account of the proof of this theorem using Euler's Polyhedral Formula is given by John D. Cook here.
2. I recommend this intriguing discussion by Pat Ballew of which of the Platonic solids is 'most spherical'.
3. This theorem is the subject of Episode 8 of Kevin Knudson and Evelyn Lamb's My Favorite Theorem blog.

Theorem no. 133: The Total Probability Theorem
1. In The Doctrine of Chances: Probabilistic Aspects of Gambling, Springer-Verlag, 2010, Stewart N. Ethier writes (p. 68) "The conditioning law (... often called the law of total probability) was used without comment by Montmort and De Moivre. It was formalized in the derivation of Bayes's law."
2. An excellent and much more thorough application of probability theory to the deuce rule of tennis is provided here by Chalkdust magazine.

Theorem no. 137: Wallis's Product
1. The original web link for this theorem was royalsocietypublishing.org/toc/rsnr1938/54/3 where Jacqueline Stedall has two wonderful articles on Wallis's work: "Catching Proteus: the collaborations of Wallis and Brounker. I. Squaring the circle" and "Catching Proteus: the collaborations of Wallis and Brounker. II. Number problems", Notes and Records of the Royal Society of London, vol. 54, no. 3. The first one is the source of the quotation "perhaps the one real stroke of genius in Wallis’s long mathematical career." These articles were free-to-view when I first posted the description, then they became restricted; when I last checked (Jan 2017) they were accessible again. Highly recommended!

Theorem no. 138: Vaughan Pratt's Theorem
1. Since polynomial-time solvable problems are automatically in NP this theorem is, since 2002, a corollary of the well-known algorithm of Agrawal, Kayal and Saxena (see the web-link for this theorem).

Theorem no. 139: Strassen's Matrix Theorem
1. Virginia Williams' account of her reduction in ω can be found in overview and technical versions at her website. There is a very good account of her work at Gödel's Lost Letter by Richard Lipton; the latest account by Williams I can find is this, which is rivalled in this article by François Le Gall, who also has an article with Florent Urrutia on non-square matrix multiplication; there is a short discussion about the problem at cstheory.stackexchange.
2. An interesting discussion by Bill Gasarch on limitations of current approaches to getting ω = 2 + ε can be found at Lance Fortnow' & Bill Gasarch's comutational complexity blog.
3. The complexity of matrix multiplication is inimately connected with some conjectures in extremal set theory as discussed by Gil Kalai.
4. Conventional wisdom says that Strassen is only advantageous for large matrices but this is challenged in Jianyu Huang, Tyler M. Smith, Greg M. Henry and Robert A. van de Geijn, "Strassen's Algorithm Reloaded", Proc. The International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), Salt Lake City, UT, November 2016. A preprint is to be found on Huang's webpage.
5. Paolo D’Alberto has a whole blog devoted to technical issues of fast matrix multiplication.
6. Veit Elser has this intruiging machine learning preprint: "A network that learns Strassen multiplication"

Theorem no. 140: A Theorem on Rectangular Tensegrities
1. This post by Dave Richardson gives a nice introduction to the subject of graph theory and rigidity.
2. Joseph Malkevitch has a valuable bibliography (up to 2001) of rigidity publications in which the work cited in this theorem description can be located.

Theorem no. 141: The Piff–Welsh Theorem
1. An excellent online introduction to matroid theory is provided by Joseph Malkevitch here.

Theorem no. 143: The Robinson–Schensted–Knuth Correspondence
1. Knuth's paper describing the RSK Correspondence is D.E. Knuth, "Permutations, matrices, and generalized Young tableaux", Pacific J. Math., Vol. 34, Number 3 (1970), 709–727; online.

Theorem no. 144: Lieb's Square Ice Theorem
1. Elliott Lieb's original paper is "Residual Entropy of Square Ice", The Physical Review, vol. 162, no. 1, 1967, pp 162–172. Even after 50 years you still have to pay to read it online but the first two pages are displayed here. (For Russian readers it is translated online here.)
2. I originally gave as a web link from the theorem page a very nice but technical article by Stefan Felsner, Florian Zickfeld: "On the number of planar orientations with prescribed degrees", Electronic Journal of Combinatorics, vol. 15, 2008; online. This deals with orientations of planar graphs in much more generality — Lieb's result appears in section 2.2.
3. It has been discovered that water can form into square ice at room temperature by confining it using layers of graphene.

Theorem no. 145: The Contraction Mapping Theorem
1. mathcounterexamples.net has a valuable collection of counterexamples showing that all the hypotheses of this theorem are needed.
2. This theorem is the subject of Episode 24 of Kevin Knudson and Evelyn Lamb's My Favorite Theorem blog.
3. John D. Cook has an intriguing blog entry on Kepler's use of Banach's theorem three centuries before it was discovered!

Theorem no. 146: The Panarboreal Formula
1. The $$1/2n\log n$$ lower bound on size of 'universal graph' is proved as Theorem 1 in F.R.K. Chung and R.L. Graham, "On graphs which contain all small trees", J. Combinat. Theory B, vol. 24, issue 1, 1978, pp 14–23; online.
2. The asymptotic formula of Chung and Graham is first mentioned in their 1979 paper "On universal graphs", Annals of the New York Academy of Sciences, 319 (1979), 136–140; online.
3. Some more work on panarboreal graphs and related issues is given in section 6.8 of "Spanning trees – A survey" by Kenta Ozeki and Tomoki Yamashita, 2010.
4. The sequence of sizes of panarboreal graphs starts 0, 1, 2, 4, 6, 8, 11, 13, 16, 18, and is OEIS sequence A004401. This is the number of edges a graph needs to have in order to contain all $$n$$-vertex trees. This is possibly smaller than for the question asked in our presentation of Chung and Graham's theorem since they ask for the number of edges when we insist the graph must also have $$n$$ vertices. The values are perhaps the same, as pointed out in the OEIS entry.

Theorem no. 147: The Sophomore's Dream
1. T he decimal expansion of $$\int_0^1 x^x dx$$ and related material can be found at oeis.org/A083648; $$\int_0^1 x^{-x} dx$$ is sequence A073009.

Theorem no. 151: The Small Prime Gaps Theorem
1. For the sake of contrasting this theorem with subsequent proofs that $$p_{n+1}-p_n\leq c$$ infinitely often for a constant $$c$$, its conclusion may be replaced by $$p_{n+1}-p_n\leq (\log p_n)^{1/2+\epsilon}$$. The progress towards current knowledge is beautifully described by Terence Tao in this youtube lecture and, for a more general audience, this lecture by Vicky Neale.
2. The background to Zhang's proof of $$p_{n+1}-p_n\leq c$$ and subsequent improvements by Maynard and Tao are described by John Friedlander in "Prime Numbers: A Much Needed Gap Is Finally Found", Notices of the AMS, Vol. 62, No. 6, June/July 2015, 660–664, online here. A more technical overview is given by Andrew Granville, "Primes in intervals of bounded length", Bull. Amer. Math. Soc., 52 (2015), 171–222, online here.

Theorem no. 152: De Moivre's Theorem
1. Here is a cute demonstration, using De Moivre, that the series $$\cos(1),\cos(2),\cos(3),\ldots$$ does not converge.

Theorem no. 154: The Remainder Theorem
1. Colin Beveridge has a good intuitive introduction to the remainder and factor theorems at FlyingColoursMaths.

Theorem no. 155: A Tripartite Turán Theorem
1. The source for this theorem is Adrian Bondy, Jian Shen, Stéphan Thomassé and Carsten Thomassen, "Density Conditions For Triangles In Multipartite Graphs", Combinatorica, Vol. 26, Issue 2, 2006, pp 121–131; online (paywall); preprint.

Theorem no. 158: The Albert–Brauer–Hasse–Noether Main Theorem
1. The restriction of the Main theorem to number fields is essential not every finite-dimensional division algebra is a cyclic algebra. A counter-example, due to A. Adrian Albert, is described on page 57 of Lewis's article (the weblink on the theorem page).
2. There is more on Käte Hey's contributions to modern algebra and number theory in Peter Roguette's "Class Field Theory in characteristic p. Its origin and development".

Theorem no. 159: The McIver–Neumann Half-n Bound
1. McIver and Neumann published this result in A. McIver and P. M. Neumann, "Enumerating finite groups", Quart. J. Math. (2) 38 (1987), 473–488. Not open access unfortunately; you can view the first page here and some more background is given via Peter Cameron's blog.

Theorem no. 160: The Classification of Archimedean 4-Polytopes
1. Tony Phillips' review (a 2.4MB download from here) of Tony Robbin's Shadows of Reality, Yale University Press, 2006, contains much fascinating material on 4-dimensional solids.
2. The Historia Mathematica article by Irene Polo-Blanco is chapter 5 of her excellent 2007 University of Groningen thesis Theory and History of Geometric Models which may be found online here.
3. If only Alicia Boole Stott could have tried the virtual reality game Hypernom!

Theorem no. 161: Quadratic Nonresidue is Zero-Knowledge Provable
1. There is at least one real-life illustration of zero-knowledge proofs in the field of nuclear disarmament. A follow up.
2. Jeremy Kun has a fine series of blog posts on zero-knowledge proofs. Follow from here.

Theorem no. 163: The Friendship Theorem
1. A useful little entry at The Futility Closet links to an elementary (purely graph-theoretic) proof of this theorem by Judith Longyear and Torrence Parsons.

Theorem no. 164: The Diaconis–Holmes–Montgomery Coin-Tossing Theorem
1. A full account of this theorem, including a wonderful account of experimental confirmation, is given here (5MB pdf file). My illustration of the theorem is partly based on figures 3 and 4 of this paper.

Theorem no. 166: Haken's Unknot Theorem
1. A good account of further developments on the complexity of determining unknottedness is given by Gil Kalai here and there is a wiki page on the subject.
2. Knot theory deals with embeddings of the 1-sphere $$S^1$$ (a closed curve) into 1 + 2 = 3 dimensions. More generally we can embed the n-sphere $$S^n$$into n + 2 dimensions. Thus $$S^2$$, a hollow sphere, is embedded into 4 dimensional space. And we can ask about the decision problem: is our embedding the 'unknot'? For $$n\geq 3$$ the answer is that the question is undecidable: this was proved in 1996 by Alexander Nabutovsky and Shmuel Weinberger. However, decidability in the case $$n=2$$ remains an open problem. A wonderful discussion of these issues is given here by Bjorn Poonen.

Theorem no. 167: The Lindemann–Weierstrass Theorem
1. The proof of the transcendence of $$e$$, Charles Hermite's breakthrough 1873 result, is very clearly described here.

Theorem no. 168: The Max-Flow Min-Cut Theorem
1. There is more on the history of this theorem at Whitty, R. W. "Some Comments on Multiple Discovery in Mathematics", Journal of Humanistic Mathematics, Volume 7 Issue 1(January 2017), pages 172-188. DOI: 10.5642/jhummath.201701.14 . Available at: http://scholarship.claremont.edu/jhm/vol7/iss1/14 (see top of page 179).

Theorem no. 170: Machin's Formula
1. Machin's Formula has an alternative statement viz  $$\tau/8=4\cot^{-1}5-\cot^{-1}239$$, thanks to the relationship between $$\cot^{-1} x$$ and $$\tan^{-1}(1/x)$$. This version is somewhat neater (although the inverse cotangent function is not directly available on calculators or in spreadsheets) and is preferred by some writers c.f. Pat's Blog.
2. Another entry at Pat's Blog offers an interesting synopsis of the history of Machin's series.

Theorem no. 171: The BEST Theorem
1. The BEST theorem finds an application in probability theory in an interesting contribution to exchangeability of random variables: Ivan Bardet, Cécilia Lancien, Ion Nechita, "de Finetti reductions for partially exchangeable probability distributions"; online preprint. The application has its origins in a paper from 1984 of Arif Zaman: "Urn models of Markov exchangeability", Ann. Probab., Vol. 12, No. 1 (1984), 223–229; online. Thanks to Ion Nechita for alerting me to this.

Theorem no. 172: The Dyson–Andrews–Garvan Crank
1. The fulfillment of Dyson's crank proposal was published as George E. Andrews and F. G. Garvan, "Dyson's crank of a partition", Bull. Amer. Math. Soc. (N.S.), Vol. 18, No. 2 (1988), 167–171; online.

Theorem no. 173: The Ramanujan Partition Congruences
1. There is a very nice account of the $$p=5$$ congruence ('Ramanujan's most beautiful identity') by Christian Krattenthaler in the June 2017 issue of the European Mathematical Society Newsletter. A direct link to the pdf file is here (2.8MB), with Krattenthaler's article beginning on p. 41 and the Ramanujan part beginning on p. 47.
2. A good discussion "Congruence properties of the partition function" by Tony Forbes is here (pdf, 0.5MB download, 100 pages long but the last 90 pages form an appendix listing computer-generated identities and can be ignored by most readers, I imagine).

Theorem no. 174: The Cameron–Fon-Der-Flaass IBIS Theorem
1. A superb tribute volume in Fon-Der-Flaass's memory has been edited by Elena Konstantinova: (20MB pdf, English and Russian)
2. The original weblink from this theorem page was a fine article "Quantifying symmetry" by Jonathan A. Cohen, The Australian Mathematical Society Gazette, Vol. 32, Number 2, May 2005. It offers a very good introduction to bases of permutation groups but doesn't mention IBIS groups which is why eventually I preferred to link to a talk, by Cameron, which does.

Theorem no. 175: The Bungers–Lehmer Theorem on Cyclotomic Coefficients
1. The famous proof of Wedderburn's Little Theorem by Ernst Witt is based on cyclotomic polynomials and is a great tangent to follow. See the recommended weblink on that page.
2. An exciting new development in the study of cyclotomic coefficients is the discovery by Gregg Musiker and Victor Reiner of a topolocial interpretation. See this, for example.

Theorem no. 177: The Heine–Borel Theorem
1. Nicole R. Andre, Susannah M. Engdahl and Adam E. Parker give a wonderful early history of this result in "An Analysis of the First Proofs of the Heine–Borel Theorem", Convergence, Vol. 9, 2012, online here.
2. The notion of compactness is given a very good intuitive treatment by Evelyn Lamb here.

Theorem no. 178: Sendov's Conjecture
1. The proof of Sendov for degree 8 is: Johnny E.Brown and Guangping Xiang, "Proof of The Sendov Conjecture for Polynomials of Degree at Most Eight", Journal of Mathematical Analysis and Applications, Vol. 232, Issue 2, 1999, 272–292; online.
2. Jérôme Dégot's proof of Sendov for high degree appeared in 2014 as "Sendov conjecture for high degree polynomials", Proc. AMS, vol. 142 (2014), 1337–1349. It is pay-to-view online but a preprint is here.
3. There is a nice snapshot of Dégot's result at about p. 100 of this cornucopia (1.1MB pdf) by Pamela Gorkin.
4. A paper by Zaizhao Meng on the arxiv claims a proof of Sendov for polynomials of degree 9. Also this and this claiming to prove the conjecture outright.
5. Progress has been made by Robert Dalmasso for the case where the zeros of the polynomial are simple.

Theorem no. 180: The Greibach Normal Form Theorem
1. The algorithm converting $$G$$ to Greibach Normal Form on $$O(|G^4|)$$ symbols is given in Norbert Blum and Robert Koch, "Greibach Normal Form transformation revisited", Information and Computation, Vol. 150, Issue 1,1999, pages 112–118; online.
2. Prof. Greibach was kind enough to send me a few comments on this theorem which I quote below:
 Although my original definition of GNF is as you describe, in my class notes I now permit S -> emptystring if S does not appear on the right hand side of any production and similarly for CNF (Chomsky Normal Form) as in the notes you attach, so that all context-free languages are covered. As far as I know, GNF is not used in any grammar-pda transformations directly; it is essential to conversion to a pda without epsilon-rules, i.e. to a nondeterministic pda which must read a new input each unit of time (I usually call this quasi-realtime). Indeed, the fact that GNF suffices for context-free languages is equivalent to the fact that nondeterministic pda are equivalent in power to quasi-realtime pda. My normal form was proven in 1962 and appears in my 1963 thesis but, as you note, the first full publication was in 1965.

Theorem no. 181: Archimedes’ Equiareal Map Theorem
1. Bradley Carroll has a very nice series of pages on Archimedes' achievements, where we find (see this page) that Archimedes himself regarded his results on areas and volumes of curved bodies to be his finest work. The sphere-cylinder surface area ratio is quoted there as 2/3 whereas our version says the surface areas are equal; this is merely because our cylinder has no top or bottom. To be specific, the ratio for a sphere of radius $$r$$ and a cyclinder of radius $$r$$ and height $$h$$, is $$2\tau r^2/(\tau r h+\tau r^2)$$; we have $$h=2r=2$$ and omit the second term in the denominator.
2. I could not resist invoking E.T. Bell's celebrated triumvirate in connection with this theorem. That is shameless popularism, though, since I entirely subscribe to Thony Christie's injunction "Context is everything".
3. The attribution to Gauss-via-Newton of differential geometry is even more shameless. In Dirk Jan Struik's classic two-part "Outline of a History of Differential Geometry", the whole of part 1 is pre-Gauss (Isis, vol. 19, no. 1, 1933, pp. 92–120) with the key modern players being Clairaut, Euler and Monge. However, Gauss's work in the 1820's pervades a large proportion of part 2 (Isis, vol. 20, no. 1, 1933, pp. 161–191), and the first and second fundamental forms belong to intrinsic differential geometry which was intrinsically Gauss.

Theorem no. 182: The Girard–Newton Identities
1. A short survey of elementary proofs of these identities, together with an elegant new proof using matrix algebra, is given in Dan Kalman, "A Matrix Proof of Newton's Identities", Mathematics Magazine, Volume 73, Number 4, October 2000, pp 313 - 315 (online here, dated 8/16/99).
2. The photo of Peter Cameron is a cropped version of one I found on his 60th birthday conference website, maintained by Robert Bailey. It is attributed to Adrian Bondy by Peter Cameron himself. He told me in an email that "the picture was taken by Adrian Bondy ... at the Victoria Arms in Oxford (at Dominic Welsh's retirement conference in 2005 (I think)." (From a follow-up email from Bondy "I don't recall having taken the photo, but it's possible." Since Bondy's photography is art (see his gallery website) I take the issue seriously!

Theorem no. 183: Theorema Egregium
1. An excellent and beautifully illustrated technical source on this theorem is Nigel Hitchin's notes on Geometry of Surfaces, under Teaching here.

Theorem no. 184: von Neumann's Minimax Theorem
1. A superb analysis of the origins of von Neumann's theorem is Tinne Hoff Kjeldsen, "John von Neumann’s Conception of the Minimax Theorem: A Journey Through Different Mathematical Contexts", Arch. Hist. Exact Sci. 56 (2001) 39–68. There is reprint online here.

Theorem no. 186: The Insolvability of the Entscheidungsproblem
1. Some contextual information is given in a presentation (500KB pdf) I gave at Rewley House on 23 June 2012.
2. An good source of undecidable problems is this 2012 survey (450KB pdf) by Bjorn Poonen. Related links on Diophantine undecidability are: James P. Jones paper "Diophantine Representation of the Fibonacci Numbers" (1.3MB pdf) , and this paper by Yuri Matiasevich (in which some of the characters seems to print strangely but not unreadably).
3. Related theorems are: The DPRM Theorem, Haken's Unknot Theorem, Reidemeister's Theorem, and Gödel's First and Second Incompleteness Theorems.
4. Details of Jack Copeland's Essential Turing are given here; Charles Petzold's reading guide to Turing's 1936 paper is listed here. Biographies of Turing are listed here.
5. N.B. A much more extensive list of Turing-related material came out of an OUSSA summer school I gave at Oxford in July 2012.
6. Hilbert's 1930 "Wir müssen wissen, Wir werden wissen" radio address is online here with a transcription and an accompanying English translation.
7. There is a poetry version of the proof of non-decidability of the Halting problem here by Geoffrey K. Pullum, as I learnt from Pat'sBlog.
8. At the heart of Turing's result is the demonstration that not all functions from the natural numbers to the natural numbers can be computed. Joel David Hamkins has the intriguing result that any function is computable if the right model of arithmetic is chosen. This means arithmetics which satisfy the axioms of the natural numbers but which contain additional 'non-standard' numbers. A good introduction is provided by John Baez.
9. This blog post from Gödel's Lost Letter is an excellent source on proving the unsolvability of the Halting Problem.

Theorem no. 187: Karp's Theorem
1. Karp's original article is R.M. Karp, "Reducibility among combinatorial problems", in Complexity of Computer Computations (R.E. Miller and J.W. Thatcher, eds.), Plenum Press, 1972, pp. 85–103. It is reprinted with a nice introduction by Richard Karp in Michael Jünger, Thomas M. Liebling, Denis Naddef, George L. Nemhauser, William R. Pulleyblank, Gerhard Reinelt, Giovanni Rinaldi and Laurence A. Wolsey (eds.), 50 Years of Integer Programming 1958-2008: From the Early Years to the State-of-the-Art, Springer, 2010. There is a copy of the chapter online here (scroll down to Lecture 19)

Theorem no. 188: al-Kāshi's Law of Cosines
1. Garry J. Tee kindly provided the following amplification on the history of trigonometric functions: "Hipparchus in (c-130) invented the chord, the first trigonometric function, and he constructed a short table of values of the chord function.  (On a sphere of radius R, the chord of angle x is the distance between 2 points on the sphere subtending angle x at the centre). By the 5th century, Hindu astronomers had replaced the chord  by the more convenient sine function, with $$2R\sin x = \mbox{chord}(2x)$$. In 499, Aryabhata commenced his renowned astronomical treatise “Aryabhatiya” with a short table of sines."

Theorem no. 189: The Handshaking Lemma
1. Architectural historian Dr Lynn Pearson kindly sent me the following comments in response to a query regarding the origins and attribution of the tiling pattern illustrating this theorem:
Dr Pearson also provided a link to a relevant article of hers, although for copyright reasons its images cannot be displayed. And there is a little more in the Tile Gazetteer.
2. From plus magazine: applying the double counting argument proof of the Handshaking Lemma to the complete graph on n + 1 vertices is a neat way of proving that $$1+2+\ldots + n = \frac12n(n+1).$$
3. Other impressive applications of the Handshaking Lemma: the proof of Sperner's Lemma in 2D; and this proof that the so-called 'Lights-Out' game is solvable for any graph in which all vertices are initially 'turned on'.

Theorem no. 190: Jackson's Theorem on Compatible Euler Tours
1. The original weblink from this theorem page was to this fine set of notes (850KB pdf) by Tero Harju. Still recommended, of course, but the replacement link to the Egerváry Research Group page is more directly relevant.

Theorem no. 191: L'Hospital's Rule
1. A nice 'double' example of L'Hospital in action is the proof that $$\ln(x)\tan(x) \rightarrow 0$$ as $$x\rightarrow 0$$, starting in the $$\infty/\infty$$ form as $$\ln(x)/\cot(x)$$ and then transferring to the $$0/0$$ form (twice).
2. The expression $$0^0$$ gets a thorough investigation by Michael Huber and V. Frederick Rickey in "What is $$0^0$$", Convergence, Vol. 5, 2008. Online here. Other good treatments are this blog post by David A. Tanzer (with over 50 very informative reader responses) and this from askamathematician (which has over 1000 responses, which I haven't read but I suppose there must be some interesting stuff there as well!)
3. A nice short account of the Bernoulli vs L'Hospital ownership of this theorem is given here at Life Through a Mathematician's Eyes.

Theorem no. 193: Frieze's Theorem on Expected Minimum Tree Length
1. Original source: Alan M. Frieze, "On the value of a random minimum spanning tree problem", Discrete Applied Mathematics, 10 (1985), 47–56; online.
2. A sharper asymptotic for Frieze's result is given in Colin Cooper, Alan Frieze, Nate Ince, Svante Janson, Joel Spencer, "On the length of a random minimum spanning tree", Combinator. Probab. Comp., 25 (2015) 89–107; online (paywalled), open-access preprint.
3. Find more wonderful properties of $$\zeta(3)$$ in this preprint by David Broadhurst.

Theorem no. 194: Wilson's Theorem
1. The if' converse to the theorem follows because if $$n$$ is composite then some $$d$$,$$1<d<n$$, divides $$n$$. Then $$d$$ appears as a factor of $$(n-1)!$$ and therefore cannot divide $$(n-1)!+1$$, in which case, neither can $$n$$.
2. Fredrik Johansson describes a neat trick for reducing the computation required for testing primality via Wilson's Theorem.
3. A combinatorial argument is described here which seems in the same spirit as P.G. Anderson et al's combinatorial lemma.
4. It seems convenient to record here Gauss's generalisation of Wilson's Theorem: if $$n>2$$ then $$\prod_{\stackrel{k=1}{\gcd(k,n)=1}}^n\hspace{-.2in}k = \left\{\begin{array}{rcl} -1\mbox{ mod }n & & n=4, p^m, 2p^m,\\ 1\mbox{ mod } n &&\mbox{otherwise.}\end{array}\right.$$ A very nice proof is given here (under "Expository Papers").

Theorem no. 195: The Erdős–Ko–Rado Theorem
1. The Katona proof was published in Katona, G.O.H., "A simple proof of the Erdős–Chao Ko–Rado theorem", Journal of Combinatorial Theory, Series B, Vol. 13, Issue 2, October 1972, Pages 183–184; online.
2. An interesting discussion by John Mount of proofs of Erdős–Ko–Rado can be found in this Win-Vector blog entry.
3. The requirement that $$n\geq 2k$$ is necessary since when $$k>n/2$$ any pair of subsets must necessarily intersect. This makes $$n/2$$ a transition point where the size of a maximum intersecting family jumps up, as illustrated below for $$n=50$$. The horizontal axis is $$k$$; the red dots are values of $${n-1 \choose k-1}$$; the blue dots are values of $${n \choose k}$$.

Theorem no. 196: The Cantor–Bernstein–Schröder Theorem
1. There is a more here the history of this theorem.
2. The proof presented for this theorem is streamlined by appealing to the Knaster-Tarski fixed-point theorem which guarantees a fixed point for a monotone function on a complete lattice (in this case the power set lattice). More details at this MathWorld entry.
3. CBS is, in the analysis of Williard Quine, one form of the 'law of comparability. In Set Theory and Its Logic, Harvard University Press, revised edition,1969, p. 208, he memorably says, "Accidents of definition aside, there are three distinct things here: the Axiom of Choice, the Schröder-Bernstein Theorem, and triviality."
4. A graph-theoretic proof of this theorem generalises to one about paths, as described in Reinhard Diestel & Carsten Thomassen, "A Cantor-Bernstein theorem for paths in graphs", American Mathematical Monthly, February 2006, online here.

Theorem no. 197: The Robin–Lagarias Theorem
1. Lagarias's paper was published as Jeffrey C. Lagarias, "An elementary problem equivalent to the Riemann Hypothesis", Amer. Math. Monthly, 109 (2002), 534–543. The recommended weblink on the theorem page is the arxiv preprint.
2. A collection of assertions equivalent to the Riemann Hypothesis is given here.
3. A nice Quora entry by Alan Amit discusses the implications of RH being false. In passing he points out that Lagarias's version of RH is elementary enough that a counterexample to RH would mean a disproof in Peano arithmetic, not something that can be easily deduced from finding a zero off the critical line.
4. Out of curiosity I plotted the divisor function $$\sigma(n)$$ (red) against the RHS of Laragias's inequality $$H_n+\ln(H_n)\exp(H_n)$$ for the first $$10^5$$ values of $$n$$. I it hard to imagine betting against RH!

Theorem no. 198: The Art Gallery Theorem
1. There is a good account of the Art Gallery Theorem by Erica Klarreich for Quanta Magazine here.

Theorem no. 199: Fermat's Two-Squares Theorem
1. The representation of a prime $$p=4n+1$$ as a sum of two squares is unique (upto order of summands). A nice proof using Gaussian primes, is given here.
2. Strictly speaking, Lagrange's Lemma only goes one way: if $$p$$ is congruent to 1 mod 4 then $$-1$$ is a quadratic residue mod $$p$$. Lagrange used his lemma in 1773 to give a simpler proof than Euler's of Fermat's theorem. There is a short discussion in chapter 6 of Stillwell's Elements of Number Theory, Springer 2003 (see sections 6.5 and 6.8 and chapter 9 for the converse). The lemma is an instance of the Law of Quadratic Reciprocity (see this, for example).

Theorem no. 200: Minkowski's Convex Body Theorem
1. Our proof of Fermat's 2-Squares theorem (theorem no. 199) uses Minkowski's theorem (i.e. is a theorem in his 'geometry of numbers'). Some good notes (125KB pdf) by Pete L. Clark, prove Lagrange's 4-squares theorem in a similar manner, as well as giving a thorough background on Minkowski's theorem.

Theorem no. 201: Jensen's Inequality
1. This theorem's illustration originally featured soup cans based on Andy Warhol's Campbell's Soup pop art. However, Campbell Soup Company declined to give me permission for this use "in part because the image you have used is not a reproduction of our famous trademarks, but rather what we consider a 'mutilation' of our marks." If you would like to view this mutilation for yourself let me know and I will smuggle you a copy!
2. A good real-life application from the world of finance is given here, taken from Sam L. Savage, The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty, John Wiley & Sons, paperback edition, 2012.

Theorem no. 202: The Freidlander–Iwaniec Theorem
1. The web link for this theorem is to an overview of the its proof. The actual proof is set out in a subsequent paper of almost a hundred pages: John Friedlander and Henryk Iwaniec, "The polynomial X2+Y4 captures its primes", Annals of Mathematics, Vol. 148, no. 3, 1998, pp 945–1040; online.
2. The question of whether there are infinitely many primes of the form $$n^2+1$$ is the first of Landau's Problems, all unsolved as of January 2018.

Theorem no. 203: Euler's Continued Fraction Correspondence
1. Tony Foster gives a continued fraction in terms of cubes for $$\pi=\tau/2$$ via a nice exploitation of Nilakantha's series in the same vein as the derivation by Douglas Bowman. Suggests a nice exercise to give the corresponding result for $$\tau$$. He also has a similar derivation for the golden ratio to contrast to the simple continued fraction (all 1's) which everyone knows.

Theorem no. 204: Singmaster's Binomial Multiplicity Bound
1. This is a 'theorem under construction': I hope to chart exciting developments here towards an eventual final version, which may or may not be $$N(k)=8$$ for all $$k$$.
2. There is a nice entry on Singmaster's conjecture at Gödel's Lost Letter.
3. A version of Singmaster's conjecture in terms of algebraic geometry is given in Hugo Jenkins, "Repeated binomial coefficients and high-degree curves", Integers, vol. 16, 2016.

Theorem no. 205: The Classification of the Semiregular Tilings
1. There is more on Kepler's investigation into plane tilings in this lovely article by plus magazine. In fact they have a whole collection of tiling articles!
2. The general subject of plane tilings is deep and wide: for example, the decidability of whether a given single tile will tile the plane is an open question (very elegantly introduced by Chaim Goodman-Strauss in "Can't Decide? Undecide!", Notices of the AMS, Vol. 57, No. 3, 2010, 343–356, online here).

Theorem no. 206: Euler's Formula
1. A sweet, and sweetly presented, proof of Euler's Formula, given by Math With Bad Drawings, makes a very nice example of solving differential equations by separation of variables.
2. The appearance of a protractor (angle measurer) in the illustration of this theorem gives me an excuse to link to this delightful little History of Protractors from Life Through a Mathematician's Eyes.

Theorem no. 207: The Eratosthenes-Legendre Sieve
1. A very nice motivation for sieving in number theory is given by Terence Tao here.

Theorem no. 208: Torricelli's Trumpet
1. In response to my query, Paolo Mancosu kindly gave me following comments on the origins of Torricelli's result and his methods:
 "I am quite sure [that] Oresme and Fermat and Roberval certainly did not anticipate Torricelli's discovery. Fermat wrote about similar solids (de infinitis hyperbolis) after Torricelli. As for Roberval, I cite that report of Mersenne (given by Torricelli) where it is reported that, according to Mersenne, Roberval had written some kind of speech claiming that Torricelli's result was impossible! Had he anticipated him, he would certainly have claimed priority rather than trying to prove that the result was impossible. You are right that Torricelli does not prove that the lateral surface is infinite. I do not know who first did that."
2. In his review (Notre Dame J. Formal Logic, vol. 40, no. 3, 1999, 447–454) of Mancosu's Philosophy of Mathematics & Mathematical Practice in the Seventeenth Century, Craig Fraser says (p. 448)  "Torricelli discovered the remarkable fact that the solid of revolution obtained by rotating the hyperbola $$y=1/x$$ about the $$x$$-axis has finite volume and infinite surface area." I have found no other evidence that Torricelli calculated the surface area of his solid, however.
3. Dave Richeson's well-known Division by Zero blog has provided a template for constructing a paper version ofTorricelli's Trumpet (under the pseudonym Gabriel's Horn). (Possibly the arxiv version of Richeson's paper is more of a permalink than the one given in the blog entry.)
4. A very nice animated illustration of indivisibles, applied to circular area, is given by Matt Henderson here.

Theorem no. 209: The Erdős Discrepancy Problem
1. This was originally posted as a 'theorem under construction': September 2015 brought news of Terence Tao's complete resolution of the conjecture. This made it the 2nd 'declassification', after Kepler's Conjecture. The role of the Polymath in Tao's proof has been commented on helpfully by Gowers here.
2. There is much more to the background to the conjecture, and approaches to it, at the official Polymath 5 page.
3. The sequence of length 1160 appearing in the table in this theorem description, reproduced from Konev and Lisitsa's paper arxiv.org/abs/1402.2184, is available in Excel 2003 here. The first 11 terms:
−  +  +  −  +  −  −  +  +  −  +
happen to constitute a maximum-length sequence with discrepancy C=1. The terms sum to +1 so continuing with a +1 would give a summation to 2; the even-index terms sum to −1, so continuing with a − would give a summation to −2. The sequence 0,11,1160, ... is oeis.org/A237695; it is known (Konev and Lisitsa) that the next term exceeds 130000.
4. There is a very good description of Konev and Lisitsa's proof of $$C=2$$ by Richard Lipton and Ken Regan here.
5. Erdős mentions Nikolai Chudakov (spelt 'Tchudakoff') in connection with this conjecture here (in Problem 49).
6. Although Mathias' paper was published in a 1997 tribute volume for Erdős, this was actually the proceedings of Erdős' 80th birthday celebration, held in March 1993.
7. Terence Tao has a valuable blog entry on 'near-counterexamples' to the conjecture and its unexpected relationship to the 'Elliott Conjecture'. (But see note 1!)

Theorem no. 210: The Basel Problem
1. The proof given in the description of this theorem is called the 'Lewin argument' by Kalman and McKinzie who cite its first known appearance as Leonard Lewin's Polylogarithms and Associated Functions, Elsevier Science, 1981 (although they stress that Lewin did not claim credit). The book is an update of Lewin's earlier Dilogarithms and Associated Functions, Macdonald, 1958, in which the same material may be found (chapter 1, section 3.1). Both books are out of print, sadly. There is a valuable review of the former by Richard Askey in Bull. AMS, vol. 6, no. 2.
2. Regarding Euler's solution to the Basel Problem, the accepted sequence of events appears to be: discovered in 1734, presented in 1735, published in 1740. The Euler Archive gives the date of presentation of Euler's paper as December 5,1735, but indexes it with the 1734 presentations. It might appear that 'December 5, 1734' is the correct date. In "Euler and the Zeta Function" Raymond Ayoub gives 1934 as the year of "Euler's first triumph" and says "Euler communicated his result to Daniel Bernoulli and, while unfortunately this letter has been lost, the reply does exist". However, the reply is dated in the Euler Archive as 12th September 1736. Like many of Bernoulli's letters to Euler it deals with several matters giving no clue as to when they arose but it would seem more consistent with a 1735 letter from Euler than a 1734 one.

Theorem no. 211: Willans' Formula
1. The source for this page is C. P. Willans, "On formulae for the nth prime number", The Mathematical Gazette, Vol. 48, No. 366 (Dec., 1964), pp. 413–415. Not free to view online but 1st page can be previewed here.
2. The reduction, for composite $$k=a\times b$$, of $$\sin^2(1+(k-1)!)\tau/4k)$$ to $$\sin^2(\tau/4k)$$ requires justification. Indeed $$1+(k-1)!\equiv 1\!\!\!\mod k$$ because both $$a$$ and $$b$$ will divide $$(k-1)!$$. And the quotient of $$(k-1)!)/k$$ will be a multiple of 4 when there are sufficient even factors in $$(k-1)!$$, which is when $$(k-1)/2>2+\log_2 k$$. This occurs for $$k >12$$ (but $$k = 6, 9,10,12$$ all reduce to $$\tau/4k$$ since the greatest power of 2 in their factorisations is low).
3. See note 1 for Theorem 194 regarding the computation required to implement Willans' formula.
4. The famous 26-variable polynomial of Jones–Sato–Wada–Wiens whose positive values, over the positive integers, are precisely the prime numbers, is also based on Wilson's Theorem. There is a nice explanation here.

Theorem no. 212: Vizing's Theorem
1. Details regarding Gupta's discovery of this theorem are supplied in the preface of Michael Stiebitz, Diego Scheide, Bjarne Toft and Lene M. Favrholdt, Graph Edge Coloring: Vizing's Theorem and Goldberg's Conjecture, Wiley-Blackwell, 2012:
 "Vizing's bound was discovered independently by Ram Prakash Gupta during his Ph.D. studies, mostly at the Tata Institute of Fundamental Research in Bombay, 1965–1967, supervised by Sharadchandra Shankar Shrikhande, and stimulated by Claude Berge. Also Gupta's proof was based on a variation of the fan idea (discovered independently by Gupta), and it was extended to locally bounded infinite graphs i.e. infinite graphs with a finite maximum degree."
(Curiously, Gupta's rather nominal Wikipedia entry says his supervisor was C.R. Rao, at the Indian Statistical Institute, Calcutta, and their Math Geneology entries support this. However, Rao is a statistician while Shrikhande is a combinatorialist. I suspect there are two R.P. Guptas.)
2. The 'fan idea' is the basis of textbook proofs of Vizing's theorem (but not the proof from Schrijver which I have chosen to link to) and extends to prove the generalisation to graphs with multiple edges: $$X'(G)\leq \Delta +m$$, where $$m$$ is the maximum edge multiplicity of $$G$$. A good account is here.

Theorem no. 213: The 6-Circles Theorem
1. This theorem is sometimes referred to as the Money-Coutts Theorem although it is not clear why Money-Coutts deserves more credit than Evelyn or Tyrrell. The name 'Six Circles' is a bit ambiguous: cut-the-knot, for instance, has three quite distinct theorems which qualify for the name (located here, alphabetically).
2. Although Tyrrell has been described as a 'professional' and Evelyn and Money-Coutts as 'amateurs', an obituary of Evelyn (by Tyrrell) appears in Bull. London Math. Soc., vol. 9, no. 3, 1977. He published professional-level work during the 1930s and then again in the 1960s.
3. A generalisation by Serge Tabachnikov in a different direction from that discussed in this theorem description, namely from triangles to n-gons, is given in "Going in Circles: Variations on the Money-Coutts Theorem", Geometriae Dedicata, 80, 2000, 201–209, online here.

Theorem no. 214: A Theorem on Maximal Sum-Free Sets in Groups
1. Extremal problems concerning sum-free sets in (abelian) groups are the subject of a blog entry by Terence Tao.

Theorem no. 215: Wedderburn's Little Theorem
1. Multiplication in the quaternions is described in the description of Moufang's Theorem; you can check that the given multiplication table for the Dickson near-field of order 9 is identical to quaternion multiplication under the isomorphism: $$\left(\begin{array}{rrrrrrrr} 1 & a & b & c & d & e & f & g \\ 1 & -1 & i & j & -k & -i & k & -j \end{array}\right)$$ whereby the multiplication is seen to be almost commutative in the sense that the table is skew symmetric.
2. Zinovy Reichstein has drawn my attention to an uncomfortable but unavoidable footnote: an elegant, one-page, group-theoretic proof of Wedderburn's Little Theorem was published by the so-called Unabomber, Ted Kaczynski, while a PhD student at the University of Michigan. A reference can be found here.
3. One-page proofs continue to appear. E.g. John Schue, "The Wedderburn Theorem of Finite Division Rings", Amer. Math. Monthly, Vol. 95, No. 5 (May, 1988), pp. 436-437 (using properties of field extensions); Nicolas Lichiardopol,"A New Proof of Wedderburn's Theorem", Amer. Math. Monthly, Vol. 110, No. 8 (Oct., 2003), pp. 736-737 (ring theory, exploiting, like Kaczynski's proof, an initial lemma from number theory).
4. Regarding the original discovery and proof of Wedderburn's theorem, Karen Parshall is the authority: “In Search of the Finite Division Algebra Theorem and Beyond: Joseph H. M. Wedderburn, Leonard E. Dickson, and Oswald Veblen”, Archives Internationales d’Histoires des Sciences, vol. 35 (1983), pages 274–299. There is a nice exploration of one aspect by Michael Adam and Birte Julia Mutschler: "On Wedderburn's theorem about finite division algebras"; paper 99 here.
5. It long remained an intriguing circumstance that Wedderburn's theorem gave an algebraic proof that Desargue's theorem imples Pappus's for finite projective planes, and that no geometric proof was known (see, e.g., Peter Cameron's Projective and Polar Spaces, chapter 2, page 23). John Bamberg and Tim Penttila have resolved the issue by providing a geometric proof of Wedderburn, "Completing Segre's proof of Wedderburn's little theorem", Bull. Lond. Math. Soc., vol. 47, no. 3, 2015, pp. 483–492; preprint. (Additionally, the paper is an excellent source on Wedderburn's theorem generally.)

Theorem no. 216: Irrationality of Circumference of Unit Circle
1. A more detailed account of Lambert's irrationality proof is given at here at math.stackexchange.
2. Featured in Math Scholar's thread Simple proofs of great theorems.

Theorem no. 217: Taylor's Theorem
1. An annotated English translation of Brook Taylor's Methodus Incrementorum Directa & Inversa can be found here at Ian Bruce's invaluable 17centurymaths.com.
2. An alternative justification for Hugh Worthington's Rule is given by Colin Beveridge here. The explanation illustrating theorem no. 217 is by Tony Forbes, M500 magazine, issue 260, 2014, p. 17. He observes that using degrees instead of radians allows an even better approximation: $$\displaystyle \tan^{-1}\frac{a}{b}\approx \frac{172a}{b+2c}$$ where $$c=\sqrt{a^2+b^2}$$.
3. A step-by-step proof of the Lagrange remainder form of Taylor's theorem is given by Gowers here.

Theorem no. 218: The Riemann Rearrangement Theorem
1. Riemann's habilitation thesis "Ueber die Darstellbarkeit einer Function durch eine trigonometrische Reihe" was published posthumously in 1867. A facsimile can be found here and the text is transcribed here. An English translation is available although it may be out of print. Riemann's habilitation work is discussed in detail in Detlef Laugwitz (transl. Abe Shenitzer), Bernhard Riemann 1826–1866: Turning Points in the Conception of Mathematics, Birkhauser, 2nd printing, 2008. A French translation is here (§1–§8) and here (§9–§13) (presumably the one by Darboux and Houel, c.f. these notes, although no credit is given).
2. It seems worthwhile to give an English translation of Riemann's proof of his rearrangement theorem (from §3 of his thesis):
 "In Crelle’s Journal in January 1829 a memoir by Dirichlet appeared in which rigorous conditions were established for representing, by trigonometric series, functions which are integrable and which do not possess infinitely many maxima or minima.         "He discovered the correct path to follow to solve this problem by consideration of the fact that infinite series fall into two classes according to whether or not they remain convergent when all their terms are made positive. In the first class, the terms may be permuted in an arbitrary manner; whereas in the second class, the value of the series depends on the ordering of the terms. Indeed, if one denotes, in a series of the second class, the positive terms by $$a_1,a_2,a_3,\ldots,$$ and the negative terms by $$-b_1,-b_2,-b_3,\ldots,$$ it is clear that $$\sum a$$, and similarly $$\sum b$$, must be infinite; for if both sums were finite then the series would still be convergent on giving all terms the same sign; if just one of the sums where infinite, then the series would diverge. It is now clear that the series, if its terms are placed in a suitable order, may take an arbitrary given value $$C$$; for if one takes alternately the positive terms of the series until its value exceeds $$C$$, and then the negative terms until the value falls below $$C$$, the difference between this value and $$C$$ will never exceed the value of the term immediately preceeding the most recent change of sign. Now the $$a$$ values, and similarly the $$b$$ values must eventually become infinitesimally small as their indices increase, and thus the differences between the series sum and $$C$$ must also become infinitesimally small, as one extends the series sufficiently long, which is to say that the series converges to $$C$$.         "It is only series of the first class which are amenable to the laws governing finite sums; only they may be considered as the collection of their terms; those of the second class may not be so considered: a circumstance which was missed by mathematicians of the last century, in the main because series which extend according to ascending powers of a variable belong, generally speaking (which is to say, with the exception of certain exceptional values of that variable), to the first class. "
3. Regarding the rearrangements of Leibniz's series given in figure A of the theorem description, it is remarkable that closed forms may be given to their sums (allowing for special functions). Thus for the highest valued rearrangement shown (approx. 0.95868) we have (thanks to Maple): $$\sum_{k=0}^{\infty}\left(\frac{1}{8k+1}+\frac{1}{8k+5}-\frac{1}{4k+3}\right)=-\frac14\gamma-\frac34\ln 2+\frac{1}{16}\tau-\frac18\Psi\left(\frac18\right)-\frac18\Psi\left(\frac58\right),$$ where $$\gamma$$ is the Euler-Mascheroni constant, $$\tau$$ is circumference of unit circle, and $$\Psi$$ is the digamma function (the slope of the log of the gamma function).
4. The alternating harmonic series provides an even more fascinating example of Riemann's theorem in the hands of Larry Riddle in this article which originally appeared in Kenyon Mathematics Quarterly, vol. 1, no. 2 (1990), 6–21.
5. In another version, Riemann's theorem tells us that a series is absolutely convergent if and only if every rearrangement converges. This becomes a test for absolute convergence, in principle, if it can be shown that a finite number of convergent rearrangements is enough. This proposed 'rearragement number' is the object of study by Andreas Blass, Will Brian, Joel David Hamkins, Michael Hardy and Paul Larson.

Theorem no. 219: Integration by Parts
1. A very attractive discussion about "striking applications of integration by parts" is ongoing at stackexchange.
2. Ian Bruce wrote to me of his experience with his valuable project 17centurymaths: "Most of the elementary calculus material can be found in Euler's Differential and Integral Calculus books, and in fact he starts Book I on Integration with integration by parts; Ch.1 of this book is Top of the Pops in my line of business, and gets first place consistently in downloads, followed by Newton's definitions & Axioms ... Euler's work is still highly readable, and more so than others of that age and before; in fact he seems to have set the standard for generations of mathematicians to come."
3. Ernst Hairer has provided an elegant geometrical interpretation of integration by parts, which may be viewed here.
4. There is a nice description here by Murray Bourne of an alternative to integration by parts called the Tanzalin Method which is apparently commonly used in Indonesia. As you will see, it too can lead to infinite series!

Theorem no. 220: The Pappus–Guldin Theorems
1. According to Andrew Leahy's article, no proof by Pappus of his theorems has been discovered and Paul Guldin gave no proof, the first known proof being supplied by Giannantonio Rocca in 1644.
2. There is a very nice discussion of the 17th century pre-calculus debate in Chapter 5 of Amir Alexander's Infinitesimal: How a Dangerous Mathematical Theory Shaped the Modern World, Oneworld Publications, 2014, with a generous extract here.
3. Peter Harremoës has drawn my attention to a little irony: the surface area of the torus is usually given in terms of its major radius $$R$$ and minor radius $$r$$, as $$\tau^2rR$$. You can instead use inner radius $$a=R-r$$ and outer radius $$A=R+r$$ and in this case surface area is given as $$\pi^2(A^2-a^2)$$. Rather sneakily it is the latter, less standard, presentation which is used about 2.5 mins into this film debate on the π vs τ question as an argument that pi makes things simpler!

Theorem no. 221: The Inclusion-Exclusion Principle
1. Attributions of Inclusion-Exclusion often include the name of Poincaré (e.g. 'formule du crible de Poincaré') and this seems a bit obscure. In Encyclopaedia of Mathematics, Supplement III: 3 (ed. Michiel Hazewinkel, Kluwer, 2002) this attribution carries a reference to Poincaré's book Calcul des probabilités, Gauthier-Villars, 1896, and this book may have been influential in making the principle widely known in France.
2. Inclusion-Exclusion may be generalised in several ways. A good example is given here by Stewart Weiss; probably the most famous is due to Gian-Carlo Rota and is described by Peter Cameron in Lecture 9 of this course. Rota's original article can (and should!) be read here.
3. Stewart N. Ethier has contributed the following: "the expected number of boxes needed for a full set of coupons has the nice formula $$\displaystyle n\!\left(1+\frac12+\frac13+\ldots+\frac{1}{n}\right)$$, which either can be derived from [the inclusion-exclusion formula] (via Theorem 1.4.2 of my book [The Doctrine of Chances: Probabilistic Aspects of Gambling]), or can be derived directly by writing the random variable of interest as the sum of $$n$$ independent geometric random variables with success probabilities $$1, (n-1)/n, (n-2)/n, \ldots, 1/n$$ (and using the fact that a geometric($$p$$) random variable has mean $$1/p$$)."
4. A nice application of Inclusion-Exclusion is to vary the standard combinatorial question "How many ways to put n indistinguishable balls into k distinguished boxes" by adding "so that no box gets more than C balls". An excellent explanation of the answer is given here by Brian M. Scott.

Theorem no. 222: Faulhaber's Formula
1. For a general survey of properties of Bernoulli numbers, Pascal Sabah and Xavier Gourdon's article here is excellent.
2. Knuth has a fascinating reflection here on what Faulhaber achieved and how he achieved it.
3. Seki's discovery of the Bernoulli numbers is described in Silke Wimmer-Zagier and Don Zagier's chapter in Eberhard Knobloch, Hikosaburo Komatsu and Dun Liu (eds.), Seki, Founder of Modern Mathematics in Japan: A Commemoration on His Tercentenary, Springer, 2013. It may be found online here.

Theorem no. 223: Tutte's Golden Identity
1. There is a description of Tutte's work on graph polynomials in the excellent obituary by Arthur Hobbs and James Oxley which appeared in Notices Amer. Math. Soc. 51 (2004), 320-330.

Theorem no. 224: Green's Theorem
1. The original source for this theorem is "An Essay on the Application of mathematical Analysis to the theories of Electricity and Magnetism" which Ralf Stephan has transcribed here. It was published by Green at his own expense but received little attention until William Thomson (later Lord Kelvin) rediscovered it and arranged for its publication in Crelle's Journal in the 1840s.
2. Paul Nahin, whose is the recommended further reading for this theorem, also writes interestingly about its background in Chapter 7 of An Imaginary Tale: The Story of $$\small\underline{\sqrt{-1}}$$, Princeton University Press, 1998.
3. An important exhibition commemorating Green was held at the University of Nottingham in the autumn of 2014 and the curator's blog post is very interesting. Much of historical as well as scientific interest can be found in Lawrie Challis and Fred Sheard, "The Green of Green Functions", Physics Today, 56, 12, 2003, 41–46; a reprint here.

Theorem no. 225: The Spherical Law of Cosines
1. You can find latitudes and longitudes of cities, and compute great-circle distances between them here.
2. Pat Ballew has drawn my attention to a dual version of this theorem, relating three angles A, B, C and one side, say c: $$\cos(C)=-\cos(A)\cos(B)+\sin(A)\sin(B)\cos(c),$$ (this is referred to by Van Brumellen in Heavenly Mathematics as the "Law of Cosines for Angles").
3. A nmemonic of Napier for spherical trigonometry (also from Van Brumellen's book, I think) has been nicely summarised by John D. Cook here.
4. plus magazine have provided a very nice introduction to longitude and latitude.

Theorem no. 226: Wolstenholme's Theorem
1. The converse of Wolstenholme's Theorem, that $${2n-1\choose n-1}\not\equiv 1 \hspace{-0.05in}\mod n^3$$ for all composite values of $$n$$, is a famous open question. It is known to be true for even $$n$$ and for all $$n<10^9$$. See for example, Vilmar Trevisan and Kenneth Weber, "Testing the converse of Wolstenholme's theorem", Matemática Contemporânea, 21 (2001), 275–286; online.
2. A generalisation of Wolstenholme due to James Whitbread Lee Glaisher in 1900, says that $${kp-1\choose p-1}\equiv 1 \hspace{-0.05in}\mod p^3$$ for any prime $$p\geq 5$$ and any positive integer $$k$$. In this case the converse does not hold. Small counterexamples exist for $$p=4,9,25$$, for example (thus $$p=4,k=33$$ gives $${131\choose 3}\equiv 1 \hspace{-0.05in}\mod 64$$).
3. A proof of Babbage's $$p^2$$ prototype of the theorem is given here.
4. More on the 'harmonic numbers' context for Wolstensholme can be found in Zhi-Wei Sun, "Arithmetic theory of harmonic numbers", Proc. Amer. Math. Soc., 140, no. 2, 2012, 415–428, online.

Theorem no. 227: Cauchy's Theorem in Group Theory
1. A thorough analysis of the origin of Cachy's 1845 'Mémoire sur les arrangements ...', in which his theorem is asserted, has been given by Peter M. Neumann, 'On the date of Cauchy's contributions to the founding of the theory of groups', Bulletin of the Australian Mathematical Society, vol. 40, 1989, 293–302; online.
2. Incidentally to the choice of $$D_{10}$$ to illustrate this theorem, is a corollary to Cauchy's theorem that any group of order twice an odd prime is either cyclic or dihedral. This is Prop. 3.34 in our recommended book, Smith and Tabachnikova's Topics in Group Theory, Springer, London, 2000.
3. Michael Meo claims, in his article "The mathematical life of Cauchy's Group Theorem", Historia Mathematica, vol. 31, issue 2, pp. 196–221, that Cauchy's proof of his theorem contains an 'egregious error' and that a subsequent attempt by Dedekind in the 1850 is also incomplete. This would suggest that the first complete proof of the theorem comes as a corollary to its own generalisation (Sylow's theorems of 1872). However it seems fairer to assume that Cauchy was at least completely in command of his material. Meo's article is available here via Elsevier's open access archive.

Theorem no. 228: Fisher's Inequality
1. Bose's paper containing his short proof of the inequality can be read here.

Theorem no. 229: Poncelet's Porism
1. An elegant and (relatively) simple proof is given by Lorenz Halbeisen and Norbert Hungerbühler in "A Simple Proof of Poncelet’s Theorem (on the occasion of its bicentennial)", American Mathematical Monthly, in press, (preprint). For a modern proof, from algebraic geometry, see this by David Speyer.
2. A bicentennial survey of past and current research into Poncelet's theorem is given by Vladimir Dragović and Milena Radnović in "Bicentennial of the Great Poncelet Theorem (1813–2013): Current advances", Bull. Amer. Math. Soc., Vol. 51, No. 3, 2014, 373–445.
3. The example given of a quadrilateral inscribed in and circumscribing two eillipses is a cheat! The parameters were chosen by trial and error to give an adequate illustration: outer ellipse is centered on the origin and inclined at $$\tau/8$$ to $$x$$ axis; major radius $$a = 9.3$$, minor radius $$b = 4.1$$; inner ellipse is centered at $$(1.05046,1.3)$$ and has no inclination; major radius $$c = 4.0448$$, minor radius $$d = 3.22$$.
4. Some very good notes by Tony Forbes are available here (about 1.5MB), including detailed instructions for creating genuine examples for all combinations of conics (not just ellipses) and also giving a brief description of the link to elliptic curves.
5. Poncelet's Porism is also known as his Closure Theorem for reasons made beautifully clear in Jonathan King, "Three Problems in Search of a Measure", The American Mathematical Monthly, Vol. 101 (1994), pp. 609–628, online here. We find in the same article that the theorem is intimately related to Gelfand's Question!
6. There is a French version of this theorem description. If you read French the weblink from the French version is a very fine popular account of Poncelet's porism.

Theorem no. 230: Ore's Theorem in Graph Theory
1. Bondy's short proof appears in "Short proofs of classical theorems", J. Graph Theory, Vol. 44, No. 3, 2003, 159–165. The algorithmic interpretation given here is similar in spirit to an adaptation of Ore's original proof by E.M. Palmer, "The hidden algorithm of Ore's theorem on Hamiltonian cycles", Computers & Mathematics with Applications, Vol. 34, No. 11, 1997, 113–119
2. Apart from the left-most, the graphs illustrating this theorem were generated in Maple. However I manually replaced the vertices in order to get the permuted numberings (I was too lazy to work out how to get Maple to do this).

Theorem no. 231: Sophie Germain's Identity
1. The correspondence of Sophie Germain is online at the Bibliothèque nationale de France via gallica. The letter reproduced here is located by searching for '9118' under 'Manuscripts'.
2. Leonard Dickson's History of the Theory of Numbers, Volume I: Divisibility and Primality can be read online here courtesy of archive.org. The references to Euler and Germain are on pages 381 and 382, respectively.
3. The letter from Euler to Goldbach cited by Dickson can be read at the Euler Archive (August 28 in the 1742 correspondence). Not every letter from Euler to Goldbach of that year is online but it seems clear that this is the one which Dickson intends.

Theorem no. 232: The Riemann Explicit Formula
1. An excellent account of the explicit formula, starting from scratch, is this at medium.com by Jørgen Veisdal.
2. The relationship between the distribution of primes and (logarithmic) spirals has a rich history. A good example is given by Matthew Watkins here; the idea of spotting patterns in prime spirals goes back to (at least) Ulam in 1963. A nice variant by Edmund Harriss can be found here, and there is a very elegant 3D conical spiral by Dan Bach here.
3. The weblink for this theorem by Matthew Watkins offers a very clear account of Riemann's formula in the Chebyshev $$\psi$$ function version (as preferred in the Wikipedia entry for example). He has much more on the Riemann Hypothesis here; and indeed, the whole prime distribution story is the subject of his trilogy of books (with illustrator Mark Tweed) Secrets of Creation.
4. There is a famous Bonn University inagural lecture by Don Zagier on the subject of Riemann's prime counting function which can be found in English translation here (pdf, 2.5MB).
5. Much intriquing recent commentary on the Riemann Hypothesis, including an extended essay by Alaine Connes, can be found starting at this post from Not Even Wrong.
6. There is a manuscript draft of Riemann's 1859 paper on the website of the Clay Mathematics Institute.

Theorem no. 233: The Circle Area Theorem
1. The Archimedes proof of this theorem still qualifies as a textbook one, e.g. here, although calculus variants of the $$\int_0^{r\tau}\frac12r\mbox{dt}$$ variety are presumably more respectable from a modern perspective.

Theorem no. 234: A Generalised Hlawka Inequality
1. The original source for Hlawka's Inequality is Hans Hornich, "Eine Ungleichung für Vektorlängen", Mathematische Zeitschrift, Volume 48, Issue 1, (1942/43), 268–274. It may be viewed online here thanks to the Göttinger Digitalisierungszentrums. (Hornich says merely "For the special case m = 1, n = 2, Herr Hlawka has given me a purely algebraic proof..." so that the name Hornich–Hlawka as preferred by de.wikipedia.org seems more appropriate. However 'Hlawka' seems to be the generally adopted nomenclature.)
2. The original result of Dragomir Djoković appeared in "Generalizations of Hlawka's inequality", Glasnik Matematicko-Fiziki i Astronomski, Ser. II, vol. 18, (1963), issue 3, 169–175. D.M. Smiley & M.F. Smiley's paper is "The polygonal inequalities", Amer. Math. Monthly, Vol. 71, No. 7 (1964), 755–760. In both papers something more general is proved for the sequence of $$n$$ vectors: that for $$2\leq k<n$$ we have $$d_k\leq {n-2\choose k-2}d_n+{n-2\choose k-1}d_1,$$ using the notation in the statement of the theorem. The inequality as stated is found by summing over $$k$$. Djoković and Smiley & Smiley also gave conditions for equality.
3. A nice derivation of Hlawka's Inequality from the Ptolomeic Inequality is given by Alice Simon and Peter Volkmann in Annales Mathematicae Silesianae, Vol. 9, 1995, 137-140. The article is online here.

Theorem no. 235: A Theorem on Modular Fibonacci Periodicity
1. The period lengths of the modulo-reduced Fibonacci sequences continue to be the subject of intensive research. A good recent (2012) example is here. They also go by the name of Pisano periods. (after Leonardo Pisano aka Fibonacci). They are sequence A001175 at oeis.org.
2. Of particular interest is the so-called 'Wall's Question': for a prime $$p$$, is it possible that the period mod $$p$$ and mod $$p^2$$ should be equal? The question has links to the Fermat's Last Theorem via Germain's Theorem. See, Klaška, J., "Criteria for testing Wall's question", Czechoslovak Mathematical Journal, vol. 58 (2008), issue 4, pp. 1241-1246, online.
3. D.D. Wall's paper is "Fibonacci series modulo m", The American Mathematical Monthly, Vol. 67, No. 6, 1960, 525–532. It does not appear to be free-access online. Covering rather the same material is a roughly contemporary paper, "The Fibonacci matrix modulo m" by the Caltech physicist David W. Robinson. This was published in the 2nd ever issue of Fibonacci Quarterly and this is free online here.
4. The papers of Morgan Ward on linear recurrences are a good source of information on modular periodicity. They appears in Transactions of the American Mathematical Society and are free online here (1931) and here (1933). The main result from 1931 is that if $$m$$ has prime decomposition $$p_1^{a_1}p_2^{a_2}\cdots p_n^{a_r}$$ then period length mod $$m$$ is equal to the LCM of period lengths mod $$p_i^{a_i}, i=1,\ldots, r$$.

Theorem no. 236: Kemeny's Constant
1. The directed graph modelling Alice's casino is a finite automaton which finds the remainder of an input binary number (with $$H=1$$ and $$T=0$$) mod 8. Doubling appends a zero to a binary number; adding 1 thereafter appends instead a 1, so the action of the automaton is the same as step (3) in the casino game. Another example of such an automaton illustrates The Pumping Lemma. There is a nice non-binary take on this (for mod 7, but see comments) by David Wilson guesting at Tanya Khovanova's Math Blog.
2. If you operate a casino and would like to compete with Alice using Kemeny's constant, Tony Forbes has offered a neater and more intuitive version of her game (750KB pdf, see p. 20) in M500 magazine.
3. There is an interesting contrast between $$K$$, the expected time to reach the stationary distribution, and the probability of reaching the distribution in fewer than $$K$$ steps. The latter will be greater than $$1/2$$ (to compensate for the occasional long runs). So Bob will often find himself losing money to Alice but he will be seduced by the prospect of a long run, just as in any lottery you hardly ever win anything but play for the prospect of a jackpot.
4. An interesting question from Piers Myers is: can other averages for time to stationary distribution, e.g. median, also have constant values for Markov chains? For the 8-state chain used here the answer for median values appears to be, roughly, yes, according to simulations: 2500 runs from each starting state to a target state selected u.a.r. gave median times 5,4,5,4,4,4,4,5. But Piers points out that this cannot hold in general: the chain $$\left(\begin{array}{cc} 0 & 1 \\ 1/100 & 99/100\end{array}\right)$$ has stationary distribution $$(1/101, 100/101)$$; Kemeny's constant is $$100/101$$; but median time to reach stationary distribution is 1 from state 1 and 0 from state 2.
5. A very nice Markov chain animation provided by setosa.io might be of use for 'visualising' Kemeny's constant for small chains

Theorem no. 237: Sylvester's Catalecticant
1. The last section of "On the Alexander-Hirschowitz Theorem" by Maria Chiara Brambilla and Giorgio Ottaviani is very good on the history of Waring's problem for forms. Also very good are the opening pages of Power Sums, Gorenstein Algebras, and Determinantal Loci, Springer 1999, by Anthony Iarrobino and Vassil Kanev. This paper by Zach Teitler and Alexander Woo gives a good picture of the current state of play.
2. Zach Teitler provided me with much help in getting to grips with the subtleties of Sylvester's work to the point where I felt it worth quoting his comments verbatim, in the form of some additional notes.
3. A classic paper in the theory of binary forms is Joseph P. S. Kung and Gian-Carlo Rota, "The invariant theory of binary forms", Bull. Amer. Math. Soc. (N.S.), Vol. 10, No. 1, (1984), 27–85, online here.
4. Bruce Reznick draws my attention to the charming description of Sylvester on his work on binary forms:
 "I discovered and developed the whole theory of canonical binary forms for odd degrees, and, as far as yet made out, for even degrees too, at one evening sitting, with a decanter of port wine to sustain nature's flagging energies, in a back office in Lincoln's Inn Fields. The work was done, and well done, but at the usual cost of racking thought—a brain on fire, and feet feeling, or feelingless, as if plunged in an ice-pail. That night we slept no more."
(which can be found on p. xxiv of The Collected Mathematical Papers of James Joseph Sylvester: Volume 4, 1882-1897. Bruce observes, aptly I think, "If he had been known as a writer, rather than as a mathematician, this would be a famous quote!" (Of course Sylvester was proud of, although not remembered for, his poetry, see Chapter 8 of Karen Hunger Parshall's, James Joseph Sylvester: Jewish Mathematician in a Victorian World, The Johns Hopkins University Press, 2006.) By the way, an excellent slideshow by Bruce Reznick on representations of forms can be found here (1.2MB pdf).

Theorem no. 238: Euler's Even Zeta Formula
1. Euler discovered his formula in 1739 and it appeared in De seriebus quibusdam considerationes which can be read in the original Latin and in German or English translation as entry E130 at the Euler Archive. The role of the Bernoulli numbers was made explicit by Euler in his 1755 classic textbook Institutiones calculi differentialis cum eius usu in analysi finitorum ac doctrina serierum, volume 1 which is entry E212.
2. Max Woon (publishing as See Chin Woon) gave his binary tree generation of the sequence of Bernoulli numbers in "A Tree for Generating Bernoulli Numbers", Mathematics Magazine, Vol. 70, No. 1, 1997, 51–56. A generalisation to arbitrary complex sequences using elementary methods has been given by Petr Fuchs: "Bernoulli numbers and binary trees", Tatra Mountain Mathematical Publications, 20 (2000), 111–117, online (postscript) here.
3. Although there may be no direct calculations of $$\zeta(2n+1)$$ in terms of Bernoulli numbers, there are infinite series formulae. The most famous approach is probably Ramanujan's, see Section 3 of this nice overview of Ramanujan's notebooks by Bruce C. Berndt: the approach is given a thorough workout by Marc Chamberland and Patrick Lopatto in "Formulas for Odd Zeta Values and Powers of $$\pi$$", Journal of Integer Sequences, Vol. 14 (2011), Article 11.2.5, online here. The best-known formula is a special case of Ramanujan's first discovered by Mathias Lerch in 1901: if $$n$$ is odd then $$\zeta(2n+1)=\frac12\tau^{2n+1}\sum_{k=0}^{n+1}(-1)^{k+1}\frac{B_{2k}}{(2k)!}\frac{B_{2n+2-2k}}{(2n+2-2k)!}-2\sum_{k=1}^{\infty}\frac{k^{-2n-1}}{e^{k\tau}-1},$$ whereby $$\zeta(2n+1)$$, for large, odd $$n$$, is very close to a rational multiple of $$\tau^{2n+1}$$.

Theorem no. 239: Kuratowski's 14-Set Theorem
1. This theorem was apparently made famous when it featured as an exercise in John L. Kelley's (first published by Van Nostrand, 1955). It again appears as an exercise in James Munkres' Topology (chapter 2, section 17, exercise 21) where it is also required to find a set which generates the maximum of 14. The answers to Munkres' exercises are helpfully provided by Jesper Michael Møller here.
2. The particular 14-set chosen for my illustration was generated with the help of Mark Bowron's fun interactive diagram.
3. There is actually no need to state this as a theorem about topological spaces. P. C. Hammer, "Kuratowski’s closure theorem", Nieuw Archief voor Wiskunde, 7 (1960), 74–80, has shown that Kuratowski's theorem remains true for a more abstract closure operator defined set-theoretically. There is a nice discussion by Jeffrey Shallit and Ross Willard here.
4. For French speakers, this blog account of the theorem by Blogdemaths is very fine.

Theorem no. 240: The Jones Knot Polynomial Theorem
1. Strictly speaking, our presentation of this theorem uses the 'normalised bracket polynomial'. The substitution $$x=q^{1/4}$$ is used in the Jones polynomial proper (as recorded in the Knot Atlas, for example). I asked Louis Kauffman about this and he explained it as " a historical accident having to do with the fact that Jones defined the invariant in a different way (via a representation of the braid group to a Temperley—Lieb algebra) than I did by using the bracket state sum. The state sum is close to physics via ideas in statistical mechanics. The Temperley—Lieb algebra is close to physics also."
2. The definitive published resource for relationships between knot theory and physics must be Louis Kauffman's Knots and Physics, World Scientific, 4th revised edition, 2013. Kauffman's webpage is also an essential visit, with such gems as his "New Invariants in the Theory of Knots" (an Amer. Math. Monthly write-up of his 1987 breakthrough).
3. There is apparently no convention for orienting links when calculating the writhe of a multi-component link. However, if a link has more than one component then the orientations of individual components only changes the value of the Jones polynomial by a power of its variable. See, for example, Sandy Ganzell, Janet Huffman, Leslie Mavrakis, Kaitlin Tademy and Griffin Walker, "Jones polynomials of unoriented links", preprint online here.
4. One of the biggest questions in knot theory is whether the Jones polynomial distinguishes the unknot, that is, can any knot $$K$$ other than the unknot have $$J(K)=1$$? In the case of links with more than one component the Jones polynomial cannot distinguish the unlink, as shown for example by Shalom Eliahoua, Louis H. Kauffman and Morwen B. Thistlethwaite in "Infinite families of links with trivial Jones polynomial", Topology, vol. 42, no. 1, 2003, 155–169, online via Elsevier Open access. It is known, Haken's Unknot Theorem, that distinguishing the unknot is decidable, but by an algorithm that is vastly more complex than evaluating the Jones polynomial.

Theorem no. 241: The Large Prime Gaps Theorem
1. This is a 'theorem under construction': I hope to chart exciting developments here towards an eventual final version, which may or may not mean something approaching Cramér's 1936 conjecture.
2. The composite sequence in our example is $$m+k$$ where $$m=293357$$ and $$k=1,\ldots,25$$. The fact that $$Y(17)=25$$ does not mean that larger $$k$$ values will give primes: in fact $$293357+k$$ is composite until $$k=42$$. By the way, online Chinese Remainder Theorem solvers generally don't appear to accept congruences of the form $$m=-a_p\!\!\mod p$$ (this by MathCelebrity.com is an exception) but solving with positive $$a_p$$ and then negating the answer is fine, as can be seen immediately at (Theorem 5, notes(1)).
3. Work on long prime gaps has historically used the Jacobsthal function: $$j(n)$$, for positive integer $$n$$, is the smallest positive integer $$m$$, such that every sequence of $$m$$ consecutive integers contains an integer coprime to $$n$$ (alternatively, $$j(n)$$ is the maximal gap between integers coprime to $$n$$).The first thirty values A048669 are $$1, 2, 2, 2, 2, 4, 2, 2, 2, 4, 2, 4, 2, 4, 3, 2, 2, 4, 2, 4, 3, 4, 2, 4, 2, 4, 2, 4, 2, 6,\ldots$$. Ford et al's paper observes that $$Y(x)=j(P(x))-1$$ (with $$P(x)$$ the product of primes not exceeding $$x$$).
4. An excellent article about Cramer's model of the prime numbers and his conjecture is this by Andrew Granville, hosted at Chance News, who also have a whole series of lectures on probabilistic number theory, with part 2 focussing on Cramer's work.

Theorem no. 242: The Pólya–Redfield Enumeration Theorem
1. Space did not allow for the evaluation of the cycle index for the group action on the edges of the tetrahedron. The calculation gives $${b}^{6}+{b}^{5}r+2\,{b}^{4}{r}^{2}+4\,{b}^{3}{r}^{3}+2\,{b}^{2}{r}^{4}+b{r}^{5}+{r}^{6}$$, whence the determination that there are, up to rotational symmetry, four colourings with three red and three blue edges. The total number of colours is $$1+1+2+4+2+1+1=12$$, as was already established by using the Orbit Counting Lemma.
2. The description of this theorem makes a simplification by going straight from a set of labels $$L$$ to the formal power sums $$\sum x_i^k, x_i\in L,$$ substituted into the cycle index. More properly, we should associate a weight with each label and it is power sums of weights which are substituted. Thus, for example, our red-blue edge colourings of the tetrahedron each 'choose' a subset of the edges (say, the blue edges). We can think of this as having a label 'absent' and a label 'present' with weights 1 and $$x$$, respectively. And we can enumerate, say, the number of different 3-sets up to tetrahedral symmetry by substituting into the cycle index the polynomials $$1+x^k, k=1,\ldots,3$$. More formally still, we can define a 'figure-counting series' $$A(t)=\sum a_it^i$$ in which $$a_i$$ is the number of 'figures' (labels) having weight $$i$$. Then what is substituted into the cycle index are the polynomials $$A(t^k)$$. This allows $$A(t)$$ to be an infinite sum (with constant coefficients!). In Peter Cameron, Permutation Groups, Cambridge University Press, 1999, section 5.13, this approach gives the enumeration of $$n$$-sets up to symmetry via the figure counting series $$A(t)=t^0+t^1=1+t,$$ the weights being 0 and 1.
3. There is a lovely body of theory called 'orbital combinatorics' which combines enumeration up to both symmetry and structure. It originates in Peter J. Cameron, Bill Jackson and Jason D. Rudd, "Orbit-counting polynomials for graphs and codes", Discrete Mathematics, Vol. 308, Issues 5–6, 2008, 920–930, online here. There is a more up-to-date overview on Cameron's blog and see also this presentation.

Theorem no. 243: A Theorem of Anderson, Cameron and Preece on Groups of Units
1. A good presentation of this work in context by Peter Cameron, as well as much else of relevance and interest, are linked from his blog entry on the Donald Preece Memorial Day.
2. The fact that $$-3$$ is a quadratic residue mod $$p$$ for an odd prime $$p$$ if and only if $$p\equiv 1 \pmod 6$$ is a textbook exercise, c.f. Problems 9.3, no. 5(a) in David M. Burton, Elementary Number Theory 7th edition, McGraw-Hill, 2010.

Theorem no. 244: The LYM Inequality
1. There is a classic probabilitistic proof of the LYM inequality due to Peter Frankl, "A probabilistic proof for the lym-inequality", Discrete Math., Vol. 43, Issues 2–3, 1983, p. 325; online. A slightly more accessible account is here "Notes and Exercises 4".

Theorem no. 245: The Alternating Series Test
1. A good source of information on the origins of the concept of series convergence is Giovanni Ferraro, "Convergence and formal manipulation in the theory of series from 1730 to 1815", Historia Mathematica, Vol. 34, Issue 1, 2007, 62–88, online here.
2. There are some good discussions about alternating series at mathstackexchange.com: for example this, and this.
3. A wonderful compilation of proofs of divergence of the harmonic series is: Steven J. Kifowit and Terra A. Stamps, "The Harmonic Series Diverges Again and Again", The AMATYC Review, Vol. 27, No.2, Spring 2006. A preprint can be found here. The first proof of divergence is attributed to Nicolas Oresme in the 14th century.
4. A well-known occurrence of the harmonic series is in creating a large overhang when stacking overlapping blocks: the stack remains stable when the overlaps are, successively, $$1/2,1/4,1/6,\ldots$$, giving a total overhang of $$\frac12 H_n$$, $$H_n$$ being the $$n$$-th partial sum of the harmonic series. Thus an unbounded overhang is possible. But in fact much more can be achieved, as demonstrated by Paterson and Zwick in 2007. See this prepint, for example.
5. Mathwithbaddrawings has a lovely entry about how slowly the harmonic series diverges and the curious fact that, omitting terms containing the digit 9 restores convergence (the Kempner series).

Theorem no. 246: Euler's Product Formula for ζ(s)
1. The convergence of Euler's formula for real $$s>1$$ is sometimes attributed to Kronecker in the 1870s. However, it seems likely that convergence issues would have been resolved before that, even by the time of Riemann's investigations in the 1850s. Some good background is given here.
2. Euler's formula is the wellspring from which emerged group representations and harmonic analysis in the masterful account by Anthony Knapp in the April 1996 issue of the AMS Notices.

Theorem no. 247: Euler's Product Formula for Sine
1. Euler's cotangent series can also be derived from a 'half-angle' formula: $$f(x)=\frac12\left(f(x/2)+f\left((x\pm 1)/2\right)\right).$$ See, e.g., Konrad Knopp (transl. Young), Theory And Application Of Infinite Series, Dover edition, 1990. The book is online here via archive.org and the relevant text can be found on page 205ff. My presentation (with thanks to Andy Rich) is essentially the same and needs to be acknowledged as what is referred to by Knopp as an "in general faulty mode of passage to the limit" (his emphasis), which he spends two pages making rigorous (even confined to the reals). The derivation of Euler's cotangent formula from the half-angle formula is attributed by Knopp to Heinrich Schröter, 1868. There is more on its history in Chapter 11 (§ 2) of Reinhold Remmert's, Theory of Complex Functions, Springer, 1991, (transl. Robert Burckel), which is online here via archive.org.
2. A very interesting account of the Mittag-Leffler theorem, which is the 'ultimate' generalisation of Euler's formula, is available in this dissertation by Laura E. Turner.
3. An explanation by Jim Belk of why convergence of infinite products is subsumed within convergence of infinite series.
4. Euler's formula's LHS may be replaced with the more elegant (from this website's perspective!) $$\sin\tau x$$ but at the expense of elegance in the RHS, where the factors in the product become $$1-4x^2/k^2.$$

Theorem no. 248: A Theorem about Gaussian Moats
1. This is a 'theorem under construction': I hope to chart exciting developments here towards an eventual final version, which may or may not mean a proof that Gaussian moats can have unbounded width.
2. The widest Gaussian moats found thus far have width $$D=6$$, found in 2004 by Nobuyuki Tsuchimura: "Computational Results for Gaussian Moat Problem", IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Volume E88-A Issue 5, May 2005, 1267–1273; online (paywalled); preprint (400KB pdf).
3. Ellen Gethner has kindly supplied information on the origins of Gordon's Gaussian Moat problem:
(the "paper by Jordan and Rabung" is J. H. Jordan and J. R. Rabung, "A conjecture of Paul Erdős concerning Gaussian primes", Math. Comp., vol. 24 (1970) 221–223. They construct a 4- moat, i.e. they show that steps of size at least $$D=4$$ are required to reach infinity.)
4. I found these notes (pdf) on Gaussian integers by Christian Wuthrich of great value, as also these by Noah Snyder.

Theorem no. 249: Bézout's Identity
1. It should be noted that the discrete logarithm problem is solved by quantum computers: Peter W. Shor, "Algorithms for quantum computation: discrete logarithms and factoring", in Proc. 35nd Annual Symposium on Foundations of Computer Science (Shafi Goldwasser, ed.), IEEE Computer Society Press, 1994, 124–134 (last time I looked there was an online copy here (see '1994'), Shor's arxiv version is also available but this is expanded from the FOCS publication and is a bit less accessible, I think.)

Theorem no. 250: the Power of a Point Theorem
1. Michael N. Fried, the author of "Mathematics as the science of patterns", the recommended weblink from this theorem, gave me the following insight, which I find charming and valuable:
 $$h$$, as you defined it, can be thought of in a slightly different way. Consider the function $$f(x,y)=|(x-a)^2+(y-b)^2-r^2|$$. Its zeros are the points on a circle with center $$(a,b)$$ and radius $$r$$; but its value at an arbitrary point $$P(x,y)$$ is the power of $$P$$ with respect to that circle. Students typically learn to solve equations $$f(x,y)=0$$, that is, to find the curve they represent, but then ignore the values of $$f$$ at other points. For example, the curve given by $$f(x,y)=|ax+by+c|=0$$ (normalized so that $$a^2+b^2=1$$) is of course a straight line $$L$$, while the value of $$f$$ at other points is the distance between those points and $$L$$ (including of course those points whose distance from $$L$$ is zero!).
Michael Fried has, by the way, a fascinating presentation comparing the relevant bits of Euclid with Steiner's geometry.
2. Cut-The-Knot offers a very nice application of the Intersecting Chords Theorem and there is this memorable take on the theorem by Solve My Maths.

Theorem no. 251: The Hanani–Tutte Theorem
1. Hanani's original 1934 paper, published under his Polish birth name of Chaim Chojnacki as "Über wesentlich unplättbare Kurven im dreidimensionalen Raume", Fundamenta Mathematicae, Vol. 23, Issue 1, 1934, pages 135–142, can be read online here. Bill Tutte's 1970 paper "Toward a theory of crossing numbers", Journal of Combinatorial Theory, Vol. 8, Issue 1, 1970, pages 45–53, can be read online here.
2. The algebraic specification of planarity is independently credited to Wen-Jun Wu, "On the planar imbedding of linear graphs", Journal of Systems Science and Mathematical Sciences, 1985 Issue 4, pages 290–302, with independent work of some others preceding it. More details at the Wikipedia entry for the theorem.
3. Although we can solve equations to turn a graph drawing into one which has only evenly crossing independent edge pairs it is not obvious how to turn the result into a drawing with no crossings at all. For this we need a direct algorithmic proof of Hanani–Tutte and this was first provided by Michael J. Pelsmajer, Marcus Schaefer and Daniel Štefankovic in "Removing even crossings", Journal of Combinatorial Theory B, Vol. 97, Issue 4, 2007, pages 489–500, online here.
4. For small graphs, testing solvability of the Tutte–Wu equation system allows planarity testing without recourse to graph algorithms or data structures. However, for large graphs this does not compete realistically with known linear algorithms since the worst-case running time is $$O(n^6)$$ where $$n$$ is the number of vertices of the graph (strictly, the number of edges is involved in the running time but a non-linear number of edges guarantees non-planarity by the $$3n-6$$ bound, see Kuratowski's Theorem).

Theorem no. 252: Bertrand's Ballot Theorem
1. As so often, the theorem is not accurately named, since it was apparently first stated by William Allen Whitworth in 1878. The general form I have given is due to neither, having evolved from the original case $$k=1$$. Details may be found in Marc Renault, "Four Proofs of the Ballot Theorem", Mathematics Magazine, vol. 80, no. 5, 2007, pp 345–352; online via Renault's Ballot Problem page (which is the recommended weblink from the theorem page).
2. Just to fill in the details, there is a claim on the theorem page that removing a sequence of $$k$$ $$a$$'s and a $$b$$ from a cycle with $$n$$ $$b$$'s and $$m=n(k-1)+S$$ $$a$$'s, "gives a new cycle in which the surplus $$S$$ is reduced by exactly 1." Indeed, with $$M=m-k$$ $$a$$'s and $$N=n-1$$ $$b$$'s remaining, we have $$M=n(k-1)-k+S=N(k-1)+k-1-k+S=N(k-1)+(S-1).$$ (The proofs of the Cycle Lemma I have seen invoke the pigeon hole principle to repeatedly remove the $$a$$-$$b$$ sequences. But I find this obscure — what are the boxes? Saying 'a counting argument shows...' would seem more appropriate. And I have gone so far as to actually spell out the counting argument.)

Theorem no. 253: The Third Isomorphism Theorem
1. This page has been separated off from an ealier combined description of the 2nd and 3rd isomorphism theorems, see Theorem 35, notes(1).
2. There is a temptation, when dealing with quotient groups to use shorthand group notation as in $$F_{20}/C_5\cong C_4$$. It has to be kept in mind, however, that quotienting by isomorphic subgroups need not result in isomorphic quotient groups. See this by mathcounterexamples.net, for example.

Theorem no. 254: Kasteleyn's Theorem
1. This theorem is sometimes stated with the argument of the square root being the absolute value of the determinant. This protects against an eventuality, of the determinant being negative, which in fact cannot arise, since the matrix in question is necessarily real, skew symmetric and thus has non-negative determinant.
2. Donald Knuth gives some valuable history of the Pfaffian function in "Overlapping Pfaffians", Electronic Journal of Combinatorics, vol. 3, no. 2, 1996; online.
3. David E. Speyer has given short topological proofs of Kasteleyn's theorem, and variants of it, in "Variations on a theme of Kasteleyn, with application to the totally nonnegative Grassmannian", Electronic Journal of Combinatorics, vol. 23, no. 2, 2016; online.

Theorem no. 255: Countability of the Rationals
1. I am not sure if Cantor ever explicitly published the countability of the rationals as a theorem. It is mentioned in the Wiki article on Cantor's first (1874) set theory paper as arising in correspondence between Dedekind and Cantor. It follows at once from more general results such as the countability of a countable union of countable sets (but this latter result requires the axiom of choice!)

Theorem no. 256: Perfect's Necklace Formula
1. Hazel Perfect's formula was published as a note in The Mathematical Gazette: "2588. Concerning Arrangements in a Circle", Vol. 40, No. 331 (Feb., 1956), pp. 45–46. Not free-to-view online unfortunately but a 1-page preview gives you everything you need!
2. The best-known algorithm for generating necklaces for a given number of beads and colours is by Harold Fredericksen, Irving J. Kessler and James Maiorana and is known as the FKM algorithm (you can see it in action on Jason Davies' necklaces page). An interesting alternative and a good source of information is Frank Ruskey, Terry MinYih Wang and Carla D.Savage, "Generating Necklaces", Journal of Algorithms, vol. 13, no. 3, 1992, 414–43; preprint.

Theorem no. 257: Distribution of Local Maxima in Random Samples
1. The original pulication is T. Austin, R. Fagen, T. Lehrer, and W. Penney, "The Distribution of the Number of Locally Maximal Elements in a Random Sample", Annals of Mathematical Statistics, Vol. 28, Number 3 (1957), 786-790; online.
2. T. Lehrer is apparently the Tom Lehrer famous as a satirical singer-songwriter. The website thetomlehrer.weebly.com mentions the above and another article with the somewhat mysterious commentary "Unfortunately these mathematical publications did not have a lasting effect on society."
3. At any rate, the Austin et al article provoked a reaction in the profession: M.O. Glasgow, "Note on the Factorial Moments of the Distribution of Locally Maximal Elements in a Random Sample", Ann. Math. Statist., Vol. 30, Number 2 (1959), 586–590; online.
4. There is a nice tribute to Lehrer at Gödel's Lost Letter which links to a previous short entry on his theorem with Austin et al.

Theorem no. 258: Sylow's Theorems
1. The original pulication is L. Sylow, "Théorèmes sur les groupes de substitutions", Mathematische Annalen, Vol. 5, 1872, pp 584–594; online. An English translation is provided here.
2. Geoff Smith, whose book is the recommended reading for this theorem, gave me following nice picture of $$A_4$$ not having an order-6 subgroup:
 When discussing the non-existence of a subgroup of order 6 in $$A_4$$, you do have the option to geometrize. Colour the vertices of a cube red and blue so that no vertices joined by an edge are the same colour. The group of rotations of the cube which preserve the blue vertices is a copy of $$A_4$$ (label the blue vertices 1 to 4). The elements of this $$A_4$$ are then rotations about grand diagonals and rotations through pi using skewers centre-face to centre-face. This enables one to reason geometrically about $$A_4$$ and to "see" what is going on. This has the disadvantage that people with poor geometric intuition will melt, but the advantage of dealing with things more concrete than permutations.
3. The bell-ringing illustration for this theorem deserves a little amplification, which space on the page itself did not allow. The Plain Bob method starts with the 2-Sylow subgroup and is completed by switching to its cosets in $$\mbox{Sym}_4$$: permutations 8 to 15 are the left coset by $$(2 4 3)$$; permutations 16 to 23 are the left coset by $$(3 4)$$. Bell ringing in general provides good examples of Lagrange's theorem in action! There is a good introductory article in this vein "Bells, Motels and Permutations Groups" by Gary McGuire.