# Thread: Zipf's Law and statistics

1. ## Zipf's Law and statistics

....a bizarre pattern emerges. The second most used word will appear about half as often as the most used. The third one third as often. The fourth one fourth as often. The fifth one fifth as often. The sixth one sixth as often, and so on all the way down...

I found the video to be interesting... and mysterious...

2. The mathematics of Zipf's law is interesting. It means that the item at rank n, when items are ordered by size, will have probability
$p(n) = \frac{p_0}{n}$

where p0 is a normalization constant. Adding up over all n up to some maximum value N gives us
$1 = \sum_{n=1}^N p(n) = p_0 \sum_{n=1}^N \frac{1}{n} = p_0 (\log N + \gamma + O(1/N))$

in the limit of large N, where the logarithm is the natural one (base e = 2.7182818...) and $\gamma$ (gamma) is the Euler-Mascheroni constant, 0.57721...
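As a quick numerical check of that normalization, here is a minimal Python sketch. The function names (`harmonic`, `p0_exact`, `p0_approx`) and the sample values of N are just my own choices for illustration. It compares the exact p0 = 1 / (1 + 1/2 + ... + 1/N) with the large-N approximation 1 / (log N + gamma).

```python
import math

# Euler-Mascheroni constant (gamma), to double precision.
EULER_GAMMA = 0.5772156649015329


def harmonic(n_max: int) -> float:
    """Exact partial sum 1 + 1/2 + ... + 1/n_max."""
    return math.fsum(1.0 / n for n in range(1, n_max + 1))


def p0_exact(n_max: int) -> float:
    """Normalization constant from the exact sum."""
    return 1.0 / harmonic(n_max)


def p0_approx(n_max: int) -> float:
    """Normalization constant from log N + gamma, dropping the O(1/N) term."""
    return 1.0 / (math.log(n_max) + EULER_GAMMA)


def zipf_probability(rank: int, n_max: int) -> float:
    """p(n) = p0 / n for a vocabulary of n_max items."""
    return p0_exact(n_max) / rank


if __name__ == "__main__":
    for n_max in (10, 1_000, 100_000):
        print(n_max, p0_exact(n_max), p0_approx(n_max))
```

The two values should get closer as N grows, since the neglected term is O(1/N). Note also that p0 shrinks only logarithmically, so even with a very large vocabulary the top-ranked word keeps a sizable share.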

The Wikipedia article on Zipf's law has a graph of word counts for several sub-Wikipedias. The curves bend slightly downward at a rank of about 10,000: the power-law exponent goes from around -1 to a slightly larger absolute value, so the frequencies fall off a little faster than 1/n beyond that point.

3. Word and letter probabilities were part of early code-breaking techniques.

In a coded document or communication, you match the cipher symbols to words and letters based on how often each one is expected to occur.
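To make the matching idea concrete, here is a rough Python sketch of a first-pass frequency guess for a simple letter-substitution cipher. The helper names (`rank_symbols`, `guess_key`) are hypothetical, and the `ENGLISH_ORDER` string is one commonly quoted frequency ordering that I am assuming here; the exact ranking varies from corpus to corpus.

```python
from collections import Counter

# Assumed ordering of English letters from most to least frequent.
# Different corpora give slightly different rankings.
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"


def rank_symbols(ciphertext: str) -> list[str]:
    """Return the cipher letters sorted from most to least frequent."""
    counts = Counter(ch for ch in ciphertext.lower() if ch.isalpha())
    return [symbol for symbol, _ in counts.most_common()]


def guess_key(ciphertext: str) -> dict[str, str]:
    """Pair the k-th most frequent cipher letter with the k-th most
    frequent English letter. This is only a starting guess."""
    return dict(zip(rank_symbols(ciphertext), ENGLISH_ORDER))
```

On a short message the guess will usually be wrong for most letters, because the observed counts are noisy. It tends to get the top few letters roughly right on longer texts, and the rest is refined by hand using common words and letter pairs.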
