# Zipf's Law and statistics

• 09-11-2020, 11:39 AM
excreationist
Quote:

....a bizarre pattern emerges. The second most used word will appear about half as often as the most used. The third one third as often. The fourth one fourth as often. The fifth one fifth as often. The sixth one sixth as often, and so on all the way down...

I found the video to be interesting... and mysterious...
• 12-21-2020, 06:51 PM
lpetrich
The mathematics of Zipf's law is interesting. It means that the item at rank n, when everything is ordered by size, will have probability
$p(n) = \frac{p_0}{n}$

where p0 is a normalization constant. Adding up over all n up to some maximum value N gives us
$1 = \sum_{n=1}^N p(n) = p_0 \sum_{n=1}^N \frac{1}{n} = p_0 (\log N + \gamma + O(1/N))$

in the limit of large N, where the logarithm is the natural one (base e = 2.7182818...) and γ (gamma) is the Euler-Mascheroni constant, 0.57721...
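
As a quick numerical check of that normalization, here is a minimal Python sketch. The function names (`harmonic`, `zipf_p0_exact`, `zipf_p0_approx`) are made up for illustration; it compares p0 from the exact sum with p0 from the log N + gamma approximation.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def harmonic(N):
    """Exact partial sum 1 + 1/2 + ... + 1/N."""
    return math.fsum(1.0 / n for n in range(1, N + 1))

def zipf_p0_exact(N):
    """Normalization constant p0 from the exact sum."""
    return 1.0 / harmonic(N)

def zipf_p0_approx(N):
    """Normalization constant p0 from the large-N approximation."""
    return 1.0 / (math.log(N) + EULER_GAMMA)

# Print N, exact p0, and approximate p0 side by side.
for N in (10, 1000, 100000):
    print(N, zipf_p0_exact(N), zipf_p0_approx(N))
```

The two values should get closer as N grows, since the neglected term is O(1/N).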

The Wikipedia article has a log-log graph of word counts against rank for several sub-Wikipedias. There is a slight downward bend at a rank of about 10,000, where the power-law exponent goes from around -1 to a value slightly larger in absolute value, so the curve gets a little steeper.
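
One way to see a bend like that in numbers is to fit the slope of log(count) against log(rank) separately below and above the break. The sketch below does that with a plain least-squares fit. The counts are synthetic, made up for illustration and not taken from the Wikipedia graph, and the tail exponent of 1.2 is only an assumed stand-in for "slightly higher".

```python
import math

def loglog_slope(ranks, counts):
    """Least-squares slope of log(count) against log(rank)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

BREAK = 10_000        # rank where the bend sits (from the post)
TAIL_EXPONENT = 1.2   # assumed value, for illustration only

def synthetic_count(rank):
    """Made-up counts: exponent 1 up to BREAK, TAIL_EXPONENT beyond it."""
    if rank <= BREAK:
        return 1e9 / rank
    return (1e9 / BREAK) * (BREAK / rank) ** TAIL_EXPONENT

ranks = list(range(1, 100_001))
counts = [synthetic_count(r) for r in ranks]

head = loglog_slope(ranks[:BREAK], counts[:BREAK])  # ranks 1..10,000
tail = loglog_slope(ranks[BREAK:], counts[BREAK:])  # ranks above 10,000
print(head, tail)
```

With real word counts the fit would be noisier, especially in the tail where many words share the same small count.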
• 04-05-2021, 05:23 PM
steve_bank
Word and letter probability was part of early code breaking techniques.

In a coded document or communication, you match symbols to words and letters based on how probable those words and letters are.
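
A rough sketch of that idea in Python, for letters only and assuming a simple substitution cipher: rank the ciphertext symbols by how often they occur and pair them with letters in a frequency order. The `ENGLISH_ORDER` string is one commonly quoted approximate ordering and is an assumption here, as are the function names. A first guess like this usually needs manual correction.

```python
from collections import Counter

# Approximate ordering of English letters, most to least frequent.
# Assumption: orderings vary between sources and text samples.
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def guess_substitution(ciphertext):
    """Pair cipher symbols with letters by frequency rank."""
    counts = Counter(ch for ch in ciphertext.lower() if ch.isalpha())
    ranked = [sym for sym, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))

def apply_guess(ciphertext, mapping):
    """Replace each mapped symbol; leave everything else unchanged."""
    return "".join(mapping.get(ch, ch) for ch in ciphertext.lower())

mapping = guess_substitution("aaabbc")
print(mapping)
print(apply_guess("aaabbc", mapping))
```

The same rank matching could be tried on whole words when the cipher keeps word boundaries.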