
Thread: Zipf's Law and statistics

  1. Top | #1
    Veteran Member excreationist's Avatar

    Zipf's Law and statistics

    ....a bizarre pattern emerges. The second most used word will appear about half as often as the most used. The third one third as often. The fourth one fourth as often. The fifth one fifth as often. The sixth one sixth as often, and so on all the way down...


    I found the video to be interesting... and mysterious...

  2. Top | #2
    Administrator lpetrich's Avatar
    The mathematics of Zipf's law is interesting. It says that the item of rank n, when items are ordered by size, has probability
     p(n) = \frac{p_0}{n}

    where p_0 is a normalization constant. Summing over all n up to some maximum rank N gives us
     1 = \sum_{n=1}^N p(n) = p_0 \sum_{n=1}^N \frac{1}{n} = p_0 (\log N + \gamma + O(1/N))

    in the limit of large N, where the logarithm is the natural one (base e = 2.7182818...) and \gamma is the Euler–Mascheroni constant, 0.57721... So p_0 is approximately 1/(\log N + \gamma), which shrinks only logarithmically as N grows.
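    As a rough numerical check of the normalization above, here is a minimal Python sketch. The function names (harmonic, p0_exact, p0_approx) are just illustrative choices, not anything from the video or the Wikipedia article. It compares the exact p_0 = 1/H_N with the large-N approximation 1/(\log N + \gamma).

```python
import math

# Euler–Mascheroni constant, to double precision
EULER_GAMMA = 0.5772156649015329

def harmonic(N):
    """Partial harmonic sum H_N = sum_{n=1}^N 1/n."""
    return math.fsum(1.0 / n for n in range(1, N + 1))

def p0_exact(N):
    """Normalization constant p_0 such that sum_{n=1}^N p_0/n = 1."""
    return 1.0 / harmonic(N)

def p0_approx(N):
    """Large-N approximation p_0 ~ 1/(log N + gamma)."""
    return 1.0 / (math.log(N) + EULER_GAMMA)

if __name__ == "__main__":
    for N in (10, 1000, 100000):
        # The two columns should agree better and better as N grows.
        print(N, p0_exact(N), p0_approx(N))
```

    The gap between H_N and \log N + \gamma is about 1/(2N), so the two values of p_0 should already be close for N in the thousands.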

    The Wikipedia article has a graph of word counts against rank for several Wikipedias, and the curves bend slightly downward at a rank of about 10,000: the exponent of the power law goes from around -1 to a somewhat more negative value.
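    One way to look for a bend like that, sketched here under the assumption that you have a list of counts sorted by rank, is to fit the slope of log(count) against log(rank) separately over different rank ranges. The helper name loglog_slope is hypothetical, and the data below are synthetic power laws, not the Wikipedia counts.

```python
import math

def loglog_slope(ranks, counts):
    """Least-squares slope of log(count) versus log(rank)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

if __name__ == "__main__":
    ranks = list(range(1, 1001))
    # Synthetic ideal Zipf data: count proportional to 1/rank.
    ideal = [1.0 / r for r in ranks]
    # Synthetic steeper tail, for comparison: count proportional to rank**-1.5.
    steeper = [r ** -1.5 for r in ranks]
    print(loglog_slope(ranks, ideal))    # close to -1
    print(loglog_slope(ranks, steeper))  # close to -1.5
```

    With real word counts you would call it once on the ranks below the suspected bend and once on the ranks above it, and compare the two slopes.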
