
Thread: Zipf's Law and statistics

  1. Post #1
    excreationist (Veteran Member) · Joined Aug 2000 · Location: Australia

    ...a bizarre pattern emerges. The second most used word will appear about half as often as the most used. The third one third as often. The fourth one fourth as often. The fifth one fifth as often. The sixth one sixth as often, and so on all the way down...
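    A minimal Python sketch of what the quoted pattern means in practice: rank words by count, then divide the top word's count by the count at each rank. For an ideal Zipf distribution that ratio at rank r is just r. The helper names and the toy text are hypothetical, built so the counts fit 12/r exactly; real text only follows the pattern approximately.

```python
from collections import Counter

def rank_frequencies(text):
    """Return (word, count) pairs sorted from most to least frequent.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter(text.lower().split())
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def zipf_ratios(ranked):
    """For each rank r, return count(rank 1) / count(rank r).

    Under an ideal Zipf distribution this ratio equals r.
    """
    top = ranked[0][1]
    return [top / count for _, count in ranked]

# Hypothetical toy text whose word counts are exactly 12, 6, 4, 3.
text = " ".join(["the"] * 12 + ["of"] * 6 + ["and"] * 4 + ["to"] * 3)
ranked = rank_frequencies(text)
print(ranked)               # [('the', 12), ('of', 6), ('and', 4), ('to', 3)]
print(zipf_ratios(ranked))  # [1.0, 2.0, 3.0, 4.0]
```

    On a real corpus the ratios drift away from 1, 2, 3, ... especially at very high and very low ranks.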


    I found the video to be interesting... and mysterious...

  2. Post #2
    lpetrich (Administrator) · Joined Jul 2000 · Location: Eugene, OR
    The mathematics of Zipf's law is interesting. It says that the item at position n when ranked by size (rank n) has probability
     p(n) = \frac{p_0}{n}

    where p_0 is a normalization constant. Summing over all n up to some maximum rank N gives
     1 = \sum_{n=1}^N p(n) = p_0 \sum_{n=1}^N \frac{1}{n} = p_0 (\log N + \gamma + O(1/N))

    in the limit of large N, where the logarithm is the natural one (base e = 2.7182818...) and γ is the Euler–Mascheroni constant, 0.57721... Solving for the constant gives p_0 ≈ 1/(\log N + \gamma).
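    The normalization can be checked numerically. This is a small Python sketch, assuming an arbitrary cutoff N = 10,000 (not a value from the post): it computes p_0 from the exact harmonic sum H_N and compares H_N with the approximation \log N + γ. The error should be of order 1/N, in line with the O(1/N) term.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant

def zipf_probabilities(N):
    """Return (p0, [p(1), ..., p(N)]) with p(n) = p0 / n summing to 1."""
    harmonic = sum(1.0 / n for n in range(1, N + 1))  # H_N = sum of 1/n
    p0 = 1.0 / harmonic
    return p0, [p0 / n for n in range(1, N + 1)]

N = 10_000  # hypothetical cutoff rank
p0, probs = zipf_probabilities(N)
exact_H = 1.0 / p0
approx_H = math.log(N) + EULER_GAMMA  # math.log is the natural logarithm

print(p0)                  # about 0.102
print(sum(probs))          # 1, up to floating-point rounding
print(exact_H - approx_H)  # roughly 1/(2N), i.e. about 5e-05
```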

    The Wikipedia article has a log-log graph of word counts for several language editions of Wikipedia. Each curve bends slightly downward at a rank of about 10,000: below that the slope is around -1, and above it the slope becomes somewhat steeper (a power with a slightly larger absolute value).
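    The slope on such a graph can be estimated with an ordinary least-squares fit of log(count) against log(rank). The Python sketch below is illustrative only: the bend at rank 50 and the steeper exponent 1.2 are made-up values, not the ones in the Wikipedia plot. Fitting the two ranges separately recovers both slopes.

```python
import math

def loglog_slope(counts, start_rank=1):
    """Least-squares slope of log(count) versus log(rank).

    counts are listed in rank order, beginning at start_rank.
    A pure Zipf distribution gives a slope of -1.
    """
    xs = [math.log(r) for r in range(start_rank, start_rank + len(counts))]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def bent_counts(r, bend=50, steeper=1.2):
    """Hypothetical counts: 1000/r up to the bend, then a steeper power law.

    The two pieces join continuously at r = bend.
    """
    if r <= bend:
        return 1000.0 / r
    return (1000.0 / bend) * (bend / r) ** steeper

counts = [bent_counts(r) for r in range(1, 201)]
head = loglog_slope(counts[:50])                 # ranks 1..50
tail = loglog_slope(counts[50:], start_rank=51)  # ranks 51..200
print(head, tail)  # close to -1.0 and -1.2
```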

  3. Post #3
    Contributor · Joined Nov 2017 · Location: Seattle
    Word and letter frequencies were part of early code-breaking techniques.

    In a coded document or message, you match its symbols to words and letters according to how often each symbol occurs, compared with the known frequencies of words and letters in the language.
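    A minimal Python sketch of the letter-level version of this idea (frequency analysis of a simple substitution cipher): rank the cipher symbols by count and pair them with English letters in order of typical frequency. ENGLISH_ORDER is a commonly quoted approximation whose exact order varies by corpus, and the function name is hypothetical. The result is only a first guess that a code breaker would then refine using common words and letter pairs.

```python
from collections import Counter

# Approximate English letter order, most to least frequent (varies by corpus).
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def guess_substitution(ciphertext):
    """Map each cipher letter to a plaintext letter by frequency rank.

    Letters are ranked by count, ties broken alphabetically, and paired
    position by position with ENGLISH_ORDER. zip() stops at the shorter
    sequence, so extra symbols are simply left unmapped.
    """
    letters = [ch for ch in ciphertext.lower() if ch.isalpha()]
    ranked = sorted(Counter(letters).items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(zip((sym for sym, _ in ranked), ENGLISH_ORDER))

mapping = guess_substitution("QQQ XX Z")
print(mapping)  # {'q': 'e', 'x': 't', 'z': 'a'}
```

    The same ranking idea applies at the word level, where Zipf-like word frequencies help match frequent code groups to common words.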
