Solution: More About Benford's Law Math Puzzle
Here are a few more examples of Benford's law.
looked at the sizes of the files in the /system32/ directory of his computer.
| First digit | Number of files with that first digit | Percentage of total |
| --- | --- | --- |
| 1 | 587 | 29% |
| 2 | 311 | 15% |
| 3 | 236 | 12% |
| 4 | 169 | 8% |
| 5 | 185 | 9% |
| 6 | 250 | 12% |
| 7 | 115 | 6% |
| 8 | 112 | 5% |
| 9 | 85 | 4% |
The most common first digit is 1, and the others follow in roughly decreasing order of frequency. The exceptions are 5 and 6, which occur more often than their position would suggest.
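As a quick check, the sketch below compares the file-size counts in the table with the proportions Benford's law predicts, P(d) = log10(1 + 1/d). The counts are copied from the table. The helper names `benford_expected` and `compare` are illustrative and do not come from any library.

```python
import math

# First-digit counts of file sizes, copied from the /system32/ table.
FILE_COUNTS = {1: 587, 2: 311, 3: 236, 4: 169, 5: 185, 6: 250, 7: 115, 8: 112, 9: 85}


def benford_expected(d: int) -> float:
    """Benford probability that the leading digit is d (1-9)."""
    return math.log10(1 + 1 / d)


def compare(counts: dict[int, int]) -> list[tuple[int, float, float]]:
    """Return (digit, observed share, Benford share) for each digit 1-9."""
    total = sum(counts.values())
    return [(d, counts[d] / total, benford_expected(d)) for d in range(1, 10)]


if __name__ == "__main__":
    print(f"Total files: {sum(FILE_COUNTS.values())}")
    for d, observed, expected in compare(FILE_COUNTS):
        print(f"{d}: observed {observed:6.1%}  Benford {expected:6.1%}")
```

The counts add up to 2,050 files. A leading 1 comes in at about 28.6% against Benford's 30.1%. The largest gap is at 6, at about 12.2% against 6.7%.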
Of the 235 countries listed on the GeoHive Web site, 64 have population figures that begin with 1. That’s 27%.
The finfacts Web site shows the performance of Japan’s Nikkei stock market index for the 91 years from 1914–2004 inclusive. The end-of-year value of the index began with a 1 in 30 of those years. That’s 33%.
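For reference, Benford's law puts the probability of a leading digit d at log10(1 + 1/d), which means a leading 1 is expected a little over 30% of the time. The two shares above fall on either side of that figure:

```latex
\begin{aligned}
P(d) &= \log_{10}\!\left(1 + \tfrac{1}{d}\right), \quad d = 1, \dots, 9 \\
P(1) &= \log_{10} 2 \approx 0.301 \\
\text{countries: } \tfrac{64}{235} &\approx 0.272 \\
\text{Nikkei: } \tfrac{30}{91} &\approx 0.330
\end{aligned}
```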
offers this explanation of Benford’s law:
I think that the simplest explanations come in two or three parts. First, show that any measure that changes in a multiplicative manner (like stock values) obeys Benford's law. Then, show that any measure that is scale invariant (such as anything that has arbitrary units, like length in km or weight in pounds) is also a multiplicative measure and hence obeys Benford's law. And finally, show that this effect is more universal than it first appears.
The first part is the easiest to understand intuitively. It concerns measures that change multiplicatively from one value to the next. Stock values are an example: from day to day they change on a percentage basis, regardless of their actual value.
Malcolm Browne wrote in a New York Times article, "Following Benford's Law, or Looking Out for No. 1" (published August 4, 1998):
Most numbers we see every day are not random quantities in and of themselves. They're usually computed quantities with some aspect of multiplication to them.
Consider, for example, any property that grows on a percentage basis, like, say, the Dow Jones Industrial Average. It typically grows a few percent a year. Suppose, just to pick a rate, that on average the DJIA grows at 7% a year. At that rate, it doubles about every ten years. Suppose the DJIA starts at 10000. After ten years with 1 as the leading digit, it finally reaches 20000. Ten more years go by, but in those ten years it doubles to 40000, not 30000, so those years are split roughly in half between a leading 2 and a leading 3. Another ten years go by, and it doubles again to 80000. Now the leading digits 4, 5, 6, and 7 have to share just ten years. Eventually it reaches 100000 and spends another ten years starting with 1. Pick a random date, and you'd expect the DJIA on that day to be roughly twice as likely to start with 1 as with 2, and four times as likely to start with 1 as with 5.
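The argument above can be checked numerically. The sketch below assumes steady compounding at the quoted 7% a year, starting from 10000. The 10,000-year horizon and the function name are illustrative choices. The sketch tracks only log10 of the index and tallies how many years begin with each digit. Over a long horizon the shares settle close to Benford's log10(1 + 1/d). Under that distribution a leading 1 is about 1.7 times as likely as a leading 2 and about 3.8 times as likely as a leading 5. The "twice" and "four times" figures in the text are rounded values from the ten-year approximation.

```python
import math
from collections import Counter


def leading_digit_shares(growth_rate: float = 0.07, start: float = 10000.0,
                         years: int = 10000) -> dict[int, float]:
    """Fraction of years in which a steadily compounding index starts with each digit.

    Only log10 of the value is tracked, so the index never overflows a float.
    """
    log_value = math.log10(start)
    log_step = math.log10(1 + growth_rate)  # percentage growth is additive in log space
    counts = Counter()
    for _ in range(years):
        frac = log_value % 1.0  # position within the current power of ten
        counts[min(int(10 ** frac), 9)] += 1  # guard against float rounding up to 10
        log_value += log_step
    return {d: counts[d] / years for d in range(1, 10)}


if __name__ == "__main__":
    shares = leading_digit_shares()
    for d in range(1, 10):
        print(f"{d}: simulated {shares[d]:.3f}  Benford {math.log10(1 + 1 / d):.3f}")
    print(f"P(1)/P(2) = {shares[1] / shares[2]:.2f}")
    print(f"P(1)/P(5) = {shares[1] / shares[5]:.2f}")
```

Working with the fractional part of the logarithm, not the index itself, makes the connection explicit: a quantity that grows by a fixed percentage moves through each power of ten at a constant pace in log space. The time spent at each leading digit is therefore proportional to the width of that digit's slice of the logarithmic scale.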
Imagine you have a measure like the lengths of rivers in kilometers. When you look at the first digit of all the numbers, you get a particular distribution of the digits 1 through 9. Now convert the lengths to another unit, say miles, by multiplying by 0.62. In general, you will get a different first-digit distribution. Suppose you argue that, because either unit is arbitrary, the two unit systems should give roughly the same distribution of first digits. That scale invariance of the first-digit distribution is the same as requiring invariance under multiplication. A rigorous treatment can be found in many places, for example on MathWorld.
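A small experiment illustrates the point. The sketch below uses two made-up samples, not real river data. The first is spread uniformly between 100 and 1000, so its first digits are uniform. The second is spread log-uniformly over six powers of ten, which follows Benford's law. Multiplying every value by 0.62, the km-to-mile factor, noticeably changes the first sample's share of leading 1s (from about 11% to about 18%). The second sample's share stays at about 30%. The sample sizes, ranges, and seed are arbitrary assumptions.

```python
import math
import random


def first_digit(x: float) -> int:
    """Leading nonzero digit of a positive number."""
    mantissa = x / 10 ** math.floor(math.log10(x))
    return min(max(int(mantissa), 1), 9)  # clamp against float rounding at the edges


def digit_shares(values: list[float]) -> list[float]:
    """Share of values whose first digit is 1, 2, ..., 9."""
    counts = [0] * 9
    for v in values:
        counts[first_digit(v) - 1] += 1
    return [c / len(values) for c in counts]


def demo(n: int = 100_000, factor: float = 0.62, seed: int = 1):
    """Digit shares before and after rescaling, for two hypothetical samples."""
    rng = random.Random(seed)
    uniform = [rng.uniform(100, 1000) for _ in range(n)]  # uniform first digits
    log_uniform = [10 ** rng.uniform(0, 6) for _ in range(n)]  # Benford-distributed
    return {
        "uniform": (digit_shares(uniform),
                    digit_shares([v * factor for v in uniform])),
        "log-uniform": (digit_shares(log_uniform),
                        digit_shares([v * factor for v in log_uniform])),
    }


if __name__ == "__main__":
    for name, (before, after) in demo().items():
        print(f"{name:12s} share of leading 1: before {before[0]:.3f}, after {after[0]:.3f}")
```

The log-uniform sample is unaffected because multiplying by a constant only shifts every logarithm by the same amount. When the logarithms are spread evenly over whole powers of ten, that shift leaves their fractional parts evenly spread.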
It is also true that even if you start with a uniform distribution of first digits, after enough multiplications or scale changes you end up with a Benford distribution. Think of it as the limiting distribution after sufficient multiplications, regardless of which distribution you started with. MathWorld also has references on this effect.
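The convergence claim can also be tried out. The sketch below illustrates the idea under assumed conditions; it is not a proof. Every number starts uniform on [1, 10), so each first digit is equally likely. Each number is then repeatedly multiplied by an independent random factor, also drawn uniformly from [1, 10). The choice of random factor, the sample size, and the seed are arbitrary. After a handful of steps, the share of leading 1s moves from about 11% to about 30%.

```python
import math
import random


def first_digit(x: float) -> int:
    """Leading nonzero digit of a positive number."""
    mantissa = x / 10 ** math.floor(math.log10(x))
    return min(max(int(mantissa), 1), 9)  # clamp against float rounding at the edges


def share_of_ones_after(steps: int, n: int = 20_000, seed: int = 2) -> float:
    """Share of leading 1s after `steps` rounds of random multiplication."""
    rng = random.Random(seed)
    # Start uniform on [1, 10): every first digit is equally likely (about 1/9 each).
    values = [rng.uniform(1, 10) for _ in range(n)]
    for _ in range(steps):
        # Assumed multiplier: an independent factor drawn uniformly from [1, 10).
        values = [v * rng.uniform(1, 10) for v in values]
        # Dividing by 10 keeps values in [1, 10) without changing the first digit.
        values = [v / 10 if v >= 10 else v for v in values]
    return sum(first_digit(v) == 1 for v in values) / n


if __name__ == "__main__":
    for steps in (0, 1, 2, 5, 10):
        print(f"after {steps:2d} multiplications: {share_of_ones_after(steps):.3f}")
    print(f"Benford prediction:         {math.log10(2):.3f}")
```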
Another Solution
Finally, here’s a case that seems to contradict the scale invariance principle that José describes above. Large Lakes of the World lists the 35 largest lakes:
| Name and location | Area (sq mi) | Area (sq km) |
| --- | --- | --- |
| Caspian Sea, Azerbaijan-Russia-Kazakhstan-Turkmenistan-Iran | 152,239 | 394,299 |
| Superior, U.S.-Canada | 31,820 | 82,414 |
| Victoria, Tanzania-Uganda | 26,828 | 69,485 |
| Huron, U.S.-Canada | 23,010 | 59,596 |
| Michigan, U.S. | 22,400 | 58,016 |
| Aral, Kazakhstan-Uzbekistan | 13,000 | 33,800 |
| Tanganyika, Tanzania-Congo | 12,700 | 32,893 |
| Baikal, Russia | 12,162 | 31,500 |
| Great Bear, Canada | 12,000 | 31,080 |
| Nyasa, Malawi-Mozambique-Tanzania | 11,600 | 30,044 |
| Great Slave, Canada | 11,170 | 28,930 |
| Chad, Chad-Niger-Nigeria | 9,946 | 25,760 |
| Erie, U.S.-Canada | 9,930 | 25,719 |
| Winnipeg, Canada | 9,094 | 23,553 |
| Ontario, U.S.-Canada | 7,520 | 19,477 |
| Balkhash, Kazakhstan | 7,115 | 18,428 |
| Ladoga, Russia | 7,000 | 18,130 |
| Onega, Russia | 3,819 | 9,891 |
| Titicaca, Bolivia-Peru | 3,141 | 8,135 |
| Nicaragua, Nicaragua | 3,089 | 8,001 |
| Athabaska, Canada | 3,058 | 7,920 |
| Rudolf, Kenya | 2,473 | 6,405 |
| Reindeer, Canada | 2,444 | 6,330 |
| Eyre, South Australia | 2,400 | 6,216 |
| Issyk-Kul, Kyrgyzstan | 2,394 | 6,200 |
| Urmia, Iran | 2,317 | 6,001 |
| Torrens, South Australia | 2,200 | 5,698 |
| Vänern, Sweden | 2,141 | 5,545 |
| Winnipegosis, Canada | 2,086 | 5,403 |
| Mobutu Sese Seko, Uganda | 2,046 | 5,299 |
| Nettilling, Baffin Island, Canada | 1,950 | 5,051 |
| Nipigon, Canada | 1,870 | 4,843 |
| Manitoba, Canada | 1,817 | 4,706 |
| Great Salt, U.S. | 1,800 | 4,662 |
| Kioga, Uganda | 1,700 | 4,403 |
If the areas of the lakes are measured in square miles, 12 of them (34%) begin with the digit 1. But if they are measured in square kilometers, only 3 of them (less than 9%) begin with 1. Why? A footnote to the table gives an answer: only lakes with an area greater than 1,700 sq mi (4,400 sq km) are included. As a result, the five smallest lakes, which all begin with 1 when measured in square miles, move into the 4,000s and 5,000s when measured in square kilometers.
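The counts in this paragraph can be reproduced from the table. The sketch below copies the 35 areas from the table (square miles first, then square kilometers) and counts the leading 1s in each unit. The helpers `leading_digit` and `count_leading_ones` are illustrative names.

```python
# (name, area in sq mi, area in sq km), copied from the Large Lakes of the World table.
LAKES = [
    ("Caspian Sea", 152239, 394299), ("Superior", 31820, 82414),
    ("Victoria", 26828, 69485), ("Huron", 23010, 59596),
    ("Michigan", 22400, 58016), ("Aral", 13000, 33800),
    ("Tanganyika", 12700, 32893), ("Baikal", 12162, 31500),
    ("Great Bear", 12000, 31080), ("Nyasa", 11600, 30044),
    ("Great Slave", 11170, 28930), ("Chad", 9946, 25760),
    ("Erie", 9930, 25719), ("Winnipeg", 9094, 23553),
    ("Ontario", 7520, 19477), ("Balkhash", 7115, 18428),
    ("Ladoga", 7000, 18130), ("Onega", 3819, 9891),
    ("Titicaca", 3141, 8135), ("Nicaragua", 3089, 8001),
    ("Athabaska", 3058, 7920), ("Rudolf", 2473, 6405),
    ("Reindeer", 2444, 6330), ("Eyre", 2400, 6216),
    ("Issyk-Kul", 2394, 6200), ("Urmia", 2317, 6001),
    ("Torrens", 2200, 5698), ("Vänern", 2141, 5545),
    ("Winnipegosis", 2086, 5403), ("Mobutu Sese Seko", 2046, 5299),
    ("Nettilling", 1950, 5051), ("Nipigon", 1870, 4843),
    ("Manitoba", 1817, 4706), ("Great Salt", 1800, 4662),
    ("Kioga", 1700, 4403),
]


def leading_digit(n: int) -> int:
    """First digit of a positive integer."""
    return int(str(n)[0])


def count_leading_ones(column: int) -> int:
    """Count lakes whose area in the given column (1 = sq mi, 2 = sq km) starts with 1."""
    return sum(leading_digit(lake[column]) == 1 for lake in LAKES)


if __name__ == "__main__":
    n = len(LAKES)
    for column, unit in ((1, "sq mi"), (2, "sq km")):
        ones = count_leading_ones(column)
        print(f"{unit}: {ones} of {n} start with 1 ({ones / n:.1%})")
    # The five smallest lakes all start with 1 in sq mi, but none do in sq km.
    for name, sq_mi, sq_km in LAKES[-5:]:
        print(f"{name:12s} {sq_mi:>6,} sq mi -> {sq_km:>6,} sq km")
```

Running it gives 12 of 35 (34.3%) in square miles and 3 of 35 (8.6%) in square kilometers. The last loop shows the five smallest lakes landing between 4,403 and 5,051 sq km.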
If the cutoff point had been 1,700 sq km (about 656 sq mi), the result might be very different. Maybe you can track down that data.
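Area converts with the square of the length factor, which is easy to overlook. The minimal sketch below uses the international mile, defined as exactly 1.609344 km. By this conversion, a 1,700 sq km cutoff corresponds to about 656 sq mi. Dividing by the linear km-per-mile factor instead would give about 1,056.

```python
KM_PER_MILE = 1.609344               # exact, by definition of the international mile
SQ_KM_PER_SQ_MI = KM_PER_MILE ** 2   # about 2.59 sq km per sq mi


def sq_mi_to_sq_km(area: float) -> float:
    """Convert an area from square miles to square kilometers."""
    return area * SQ_KM_PER_SQ_MI


def sq_km_to_sq_mi(area: float) -> float:
    """Convert an area from square kilometers to square miles."""
    return area / SQ_KM_PER_SQ_MI


if __name__ == "__main__":
    print(round(sq_mi_to_sq_km(1700)))  # the table's cutoff: 4403 sq km
    print(round(sq_km_to_sq_mi(1700)))  # a 1,700 sq km cutoff: 656 sq mi
    # Applying the linear (length) factor to an area by mistake gives 1056 instead.
    print(round(1700 / KM_PER_MILE))
```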