
Thread: Can someone check my Stats?

  1. Top | #1
    Member Grendel's Avatar
    Join Date
    Feb 2018
    Location
    Bunya Mountains
    Posts
    110
    Rep Power
    9

    Can someone check my Stats?

    I've forgotten all my stat-math.

    Here's my problem:

    In a population of 20 million people I know that 53 people have the surname 'X'. (I need to know random values, ignoring family and tribal connections)

    If I reach into the people-bucket and pluck out one person, then I calculate the chances of that person being X are (20,000,000 / 53) = 370,000:1 against. To be statistically confident of plucking out X I would have to scoop out 370,000 people at once.

    But if I can only scoop out 20,000 per dip, then what are the odds of 'X' being in that 20K scoop? I work this out at

    A: (370,000 / 20,000) = 18.5

    Now what are the chances of a random 20K scoop containing two 'X's?

    B: (18.5 x 18.5) = 340:1 or 6.8 million people.

    Now what are the chances of them both having the same initial?

    C: (340 x 26 x 26) = 230,000:1 or 4.6 billion people.

    I would have to scoop the bucket 230,000 times (with 20K scoops) or I would need a population of 4.6 billion people to be confident that two 'X's have the same initial?


    Is this correct?



    Cheers Greg

  2. Top | #2
    New Member
    Join Date
    Dec 2017
    Location
    Land of Smiles
    Posts
    42
    Rep Power
    10
    As a general principle, remember to distinguish "A to 1 against" odds from "B for 1 against." B = A+1

    Another useful idea is to be precise: 20000000/53 is over 377,000, not 370,000. Approximations are fine for many purposes, but why impose unnecessary burdens when seeking help?

    Quote Originally Posted by Grendel View Post
    If I reach into the people-bucket and pluck out one person, then I calculate the chances of that person being X are (20,000,000 / 53) = 370,000:1 against.

    [X]: To be statistically confident of plucking out X I would have to scoop out 370,000 people at once.

    But if I can only scoop out 20,000 per dip, then what are the odds of 'X' being in that 20K scoop? I work this out at

    A: (370,000 / 20,000) = 18.5

    Now what are the chances of a random 20K scoop containing two 'X's?

    B: (18.5 x 18.5) = 340:1 or 6.8 million people.

    [C]: Now what are the chances of them both having the same initial?...
    X: Don't be too confident! You'll fail to find X in that sample 36.8% of the time. (That's the reciprocal of Euler's Number.)
    Intuition: The expected (average) number of X you'll find is 1.0 exactly. But sometimes you'll find more than 1; therefore sometimes you'll find less.
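
A quick numerical check of the 36.8% figure, as a minimal sketch. The variable names are mine, not from the thread, and the scoop size is rounded down to a whole number of people. The with-replacement model reproduces 1/e; if "at once" is read as a single draw without replacement (my assumption), the miss rate comes out slightly lower.

```python
import math

population = 20_000_000          # people in the bucket
carriers = 53                    # people with surname X
scoop = population // carriers   # 377,358 people, so about one X is expected

# With replacement: each of the `scoop` draws misses X independently.
miss_with = (1 - carriers / population) ** scoop

# Without replacement: every one of the 53 X's must fall outside the scoop.
miss_without = math.prod(
    (population - scoop - i) / (population - i) for i in range(carriers)
)

print(f"with replacement:    {miss_with:.4f}   (1/e = {math.exp(-1):.4f})")
print(f"without replacement: {miss_without:.4f}")
```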

    [A] Your calculation yields 18.87 for 1 (or 17.87 to 1), not 18.5 to 1. But the calculation would be incorrect anyway, for reasons related to [X].
    A better approximation(*) is that success occurs with probability 1 - exp(20000* ln(1 - 53/20000000)) = 5.16%
    By coincidence this is expressed as 18.4 to 1 — almost the figure you give. Sometimes two wrongs do make a right!
    (* - this formula works for sampling with replacement. It's good enough here because X's density, starting at 53 units, gets up only to 53.053 units after 19,999 non-X withdrawals.)
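
The formula in [A] can be evaluated directly. A minimal sketch, assuming the with-replacement model described above; the variable names are illustrative, and `log1p`/`expm1` are used only to keep the tiny per-draw probability accurate. It also shows the "for 1" versus "to 1" conversion.

```python
import math

population = 20_000_000
carriers = 53
scoop = 20_000

p_single = carriers / population            # chance one draw is an X
# P(at least one X) = 1 - (1 - p)^n, evaluated via log1p/expm1 for accuracy.
p_hit = -math.expm1(scoop * math.log1p(-p_single))

chances_for_1 = 1 / p_hit                   # "1 chance in N"
odds_to_1 = (1 - p_hit) / p_hit             # "N to 1 against" = chances_for_1 - 1

print(f"P(at least one X) = {p_hit:.4%}")
print(f"1 chance in {chances_for_1:.2f}, i.e. {odds_to_1:.2f} to 1 against")
```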

    [B] Repeating this calculation your way, but with the corrected numbers just shown, produces 1 chance in 375.3, written 374.3 : 1.
    But your approach is incorrect; more accurate odds would be 737 : 1.
    Why? Your multiplication assumes that the chance an unknown group will have an X is identical to the chance that a group known to have at least one X (though which element is X is unknown) will have a second X. But in fact these X-nesses are not independent.
    I don't think it's a coincidence that the correct odds are almost exactly half what your incorrect calculation gives. (The correct odds of THREE or more X's would be 1/6 of what your calculation produces. 6 = 3! I'll leave a demonstration of this as an exercise! ... An exercise for myself; I've forgotten much of what I once knew. )
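
To see the gap between squaring and the actual two-or-more probability, here is a small sketch under the same with-replacement (binomial) assumption. The names `naive` and `at_least_two` are mine.

```python
population = 20_000_000
carriers = 53
scoop = 20_000

p = carriers / population
p0 = (1 - p) ** scoop                          # no X in the scoop
p1 = scoop * p * (1 - p) ** (scoop - 1)        # exactly one X

naive = (1 - p0) ** 2          # squaring P(at least one), as in the question
at_least_two = 1 - p0 - p1     # complement of "zero or one"

print(f"naive square:  1 chance in {1 / naive:.1f}")
print(f"at least two:  1 chance in {1 / at_least_two:.1f}")
print(f"ratio:         {at_least_two / naive:.3f}")
```

The ratio printed on the last line is how much smaller the two-or-more probability is than the squared figure.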

    [C] Do you think the 26 letters are equally likely as middle initials? I don't ... and suggest you first refine your approaches for (A) and (B) anyway.

  3. Top | #3
    Veteran Member
    Join Date
    Sep 2006
    Location
    Connecticut
    Posts
    2,139
    Archived
    1,715
    Total Posts
    3,854
    Rep Power
    54
    Quote Originally Posted by Swammerdami View Post
    [A] Your calculation yields 18.87 for 1 (or 17.87 to 1), not 18.5 to 1. But the calculation would be incorrect anyway, for reasons related to (x).
    A better approximation(*) is that success occurs with probability 1 - exp(20000* ln(1 - 53/20000000)) = 5.16%
    By coincidence this is expressed as 18.4 to 1 — almost the figure you give. Sometimes two wrongs do make a right!
    (* - this formula works for sampling with replacement. It's good enough here because X's density, starting at 53 units, gets up only to 53.053 units after 19,999 non-X withdrawals.)
    The calculation for sampling without replacement is not too difficult, and it's good to see how much that changes the answer. I get 1 - C(19999947,20000)/C(20000000,20000) = 0.0516452 for the probability without replacement and 1 - (19999947/20000000)^20000 = 0.0516201 for the probability with replacement. That is the difference between odds of 18.36:1 and 18.37:1.
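
Both figures can be reproduced without touching any huge numbers. A sketch, not necessarily how the values above were obtained: the ratio C(N-K,n)/C(N,n) collapses to a product of K = 53 ordinary fractions, one per X that has to land outside the scoop. Variable names are illustrative.

```python
import math

population = 20_000_000
carriers = 53
scoop = 20_000

# Without replacement: C(N-K, n) / C(N, n) written as a product of K ratios.
miss_without = math.prod(
    (population - scoop - i) / (population - i) for i in range(carriers)
)
# With replacement: n independent draws, each missing X.
miss_with = ((population - carriers) / population) ** scoop

for label, miss in (("without", miss_without), ("with", miss_with)):
    hit = 1 - miss
    print(f"{label:>7} replacement: P = {hit:.7f}, odds {miss / hit:.2f}:1")
```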

    [B] Repeating this calculation your way, but with the corrected numbers just shown, produces 1 chance in 375.3, written 374.3 : 1.
    But your approach is incorrect; more accurate odds would be 737 : 1.

    Why? Your multiplication assumes that the chance an unknown group will have an X is identical to the chance that a group known to have at least one X (though which element is X is unknown) will have a second X. But in fact these X-nesses are not independent.
    For the sample without replacement, the probability is 1 - (C(19999947,20000) + C(53,1)C(19999947,19999))/C(20000000,20000) = 0.00133195, giving odds of around 750:1.
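
The same two-or-more probability can be sketched with a term-by-term recurrence instead of large binomial coefficients. The helper name `hypergeom_pmf` is hypothetical; it starts from P(no X) and multiplies by the ratio of consecutive terms of C(K,k) C(N-K,n-k) / C(N,n).

```python
def hypergeom_pmf(population, carriers, scoop, k_max):
    """Return [P(exactly k X's in the scoop) for k = 0..k_max], no replacement."""
    # k = 0: every one of the X's lies outside the scoop.
    p = 1.0
    for i in range(carriers):
        p *= (population - scoop - i) / (population - i)
    probs = [p]
    # Step from k to k + 1 using the ratio of consecutive terms.
    for k in range(k_max):
        p *= (carriers - k) * (scoop - k)
        p /= (k + 1) * (population - carriers - scoop + k + 1)
        probs.append(p)
    return probs

p0, p1 = hypergeom_pmf(20_000_000, 53, 20_000, 1)
at_least_two = 1 - p0 - p1
print(f"P(two or more X) = {at_least_two:.8f}")
print(f"about 1 chance in {1 / at_least_two:.0f}")
```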

    I don't think it's a coincidence that the correct odds are almost exactly half what your incorrect calculation gives. (The correct odds of THREE or more X's would be 1/6 of what your calculation produces. 6 = 3! I'll leave a demonstration of this as an exercise! ... An exercise for myself; I've forgotten much of what I once knew. )
    I think it might actually just be a coincidence. Try doing the calculations with different numbers.

  4. Top | #4
    New Member
    Join Date
    Dec 2017
    Location
    Land of Smiles
    Posts
    42
    Rep Power
    10
    Quote Originally Posted by beero1000 View Post

    The calculation for sampling without replacement is not too difficult ...
    I don't think it's a coincidence that the correct odds are almost exactly half what your incorrect calculation gives. (The correct odds of THREE or more X's would be 1/6 of what your calculation produces. 6 = 3! I'll leave a demonstration of this as an exercise! ... An exercise for myself; I've forgotten much of what I once knew. )
    I think it might actually just be a coincidence. Try doing the calculations with different numbers.
    The formula isn't difficult. I just didn't know a fast way to calculate large c(,) or factorial on my machine and was too lazy/groggy to try to apply Stirling's approximation.
    I see that Wolfram Alpha will calculate those large c(,), though not very quickly. How do you do it?
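
One way to get the large C(,) values exactly, assuming Python 3.8 or later where `math.comb` is available. This is only a sketch of one option, not a claim about how the figures earlier in the thread were computed. Python integers are arbitrary precision, so the counts stay exact until the final division.

```python
from math import comb

population = 20_000_000
carriers = 53
scoop = 20_000

# Exact integer counts of possible scoops (each has tens of thousands of digits).
all_scoops = comb(population, scoop)
no_x = comb(population - carriers, scoop)                    # zero X's
one_x = carriers * comb(population - carriers, scoop - 1)    # exactly one X

# Subtract as integers first, then divide once, to avoid float cancellation.
p_at_least_one = (all_scoops - no_x) / all_scoops
p_at_least_two = (all_scoops - no_x - one_x) / all_scoops

print(f"P(at least one X) = {p_at_least_one:.7f}")
print(f"P(at least two X) = {p_at_least_two:.8f}")
```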

    Yes, I could have divided all the large numbers by 10 — then my machine would handle them, though still slowly. But as I said, and you agreed, the with-replacement approximation was good enough here.

    As for p2 ~= .5 * (1-p0)^2 when 20,000,000 >> 20,000 >> 53, where ">>" denotes "MUCH greater than" and pk is probability of exactly k hits, I did, just now, succeed in proving this ... though with great effort(*). Effort so great that I won't attempt to prove the conjecture pk ~= (1/k!) * (1-p0)^k

    I suspect that if/when my brain unfogs I'll stumble on a familiar asymptotic formula with these approximations readily derived.

    (* - Not "effort" in the sense of a mathematical challenge. Just effort in doing routine but tedious algebraic manipulations. I used to be better than this ... really! :-) )

  5. Top | #5
    New Member
    Join Date
    Dec 2017
    Location
    Land of Smiles
    Posts
    42
    Rep Power
    10
    Quote Originally Posted by Swammerdami View Post
    ... when 20,000,000 >> 20,000 >> 53 >> 1, where ">>" denotes "MUCH greater than" and pk is probability of exactly k hits, ... I won't attempt to prove the conjecture pk ~= (1/k!) * p0 * (1-p0)^k
    [Note the corrections, originally marked in red: the added ">> 1" condition and the extra p0 factor]
    And as soon as I stepped away from keyboard, proof became trivial!

    Let's substitute s = 20 million; w = 20 thousand. We'll leave 53 alone.
    Given s >> w >> 53 >> 1,k we seek to prove that
    pk = C(53,k) C(s-53,w-k) / C(s,w)
    is approximated with
    pk ~= p0 (1 - p0)^k / k!
    Change the C(.) to factorials:
    pk = 53! (s-53)! w! (s-w)! / [ (w-k)! (s-53-w+k)! s! k! (53-k)! ]
    Change a! / (a-b)! to the approximation a^b whenever a >> b; and rearrange a bit to get
    k! pk ~= 53^k w^k (s-w)^53 / [ s^53 (s-w)^k ]
    Solve for p0 (set k = 0) to get p0 ~= (s-w)^53 / s^53 and recall that, when s >> w >> 53, p0 ~= 1 - 53w/s or
    1 - p0 ~= 53w/s
    Observing that (s/(s-w))^k ~= 1, substitutions now produce
    k! pk ~= p0 (1-p0)^k
    Q.E.D.
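
A numerical sanity check of the approximation for the thread's own numbers, as a sketch. `N`, `K` and `n` are my shorthand for s, the 53 carriers, and w; the exact p_k values come from the ratio of consecutive hypergeometric terms.

```python
import math

N, K, n = 20_000_000, 53, 20_000   # s, the 53 carriers, and w in the post's notation

# Exact probabilities p_k, starting from p_0 and stepping up one k at a time.
p = 1.0
for i in range(K):
    p *= (N - n - i) / (N - i)
exact = [p]
for k in range(4):
    p *= (K - k) * (n - k) / ((k + 1) * (N - K - n + k + 1))
    exact.append(p)

p0 = exact[0]
ratios = []
for k, pk in enumerate(exact):
    approx = p0 * (1 - p0) ** k / math.factorial(k)
    ratios.append(pk / approx)
    print(f"k={k}: exact {pk:.3e}  approx {approx:.3e}  ratio {pk / approx:.3f}")
```

For these values the two columns should agree to within a few percent for small k. The approximation has the same shape as a Poisson distribution with mean 53w/s, which may be the "familiar asymptotic formula" mentioned earlier in the thread.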

  6. Top | #6
    Member Grendel's Avatar
    Join Date
    Feb 2018
    Location
    Bunya Mountains
    Posts
    110
    Rep Power
    9
    ummmmm ....

    So, what are the odds of two people, surname X, initial A,
    occurring in a random 20k sample,
    when the surname X accounts for only 53 names in 20 million

    ?

    Greg

  7. Top | #7
    Veteran Member
    Join Date
    Dec 2010
    Location
    Riverside City
    Posts
    3,897
    Archived
    6,289
    Total Posts
    10,186
    Rep Power
    39
    Quote Originally Posted by Grendel View Post
    ummmmm ....

    So, what are the odds of two people, surname X, initial A,
    occurring in a random 20k sample,
    when the surname X accounts for only 53 names in 20 million

    ?

    Greg
    Without knowing the frequency of initial A.?

  8. Top | #8
    Member Grendel's Avatar
    Join Date
    Feb 2018
    Location
    Bunya Mountains
    Posts
    110
    Rep Power
    9
    1 in 26 letters of the alphabet?
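
If one does take the 1-in-26 figure at face value, the question can be sketched as below. Loud assumptions, none of them established in the thread: all 26 initials are equally likely, initials are independent of each other and of the surname, and the scoop is drawn without replacement. `LETTERS`, `any_shared` and `both_a` are illustrative names.

```python
import math

N, K, n = 20_000_000, 53, 20_000   # population, surname-X carriers, scoop size
LETTERS = 26                        # ASSUMPTION: all initials equally likely

# p_k = P(exactly k X's in the scoop), sampling without replacement.
p = 1.0
for i in range(K):
    p *= (N - n - i) / (N - i)
pk = [p]
for k in range(K):
    p *= (K - k) * (n - k) / ((k + 1) * (N - K - n + k + 1))
    pk.append(p)

q = 1 / LETTERS
any_shared = 0.0    # at least two X's in the scoop share some initial
both_a = 0.0        # at least two X's in the scoop have the initial 'A'
for k, prob in enumerate(pk):
    # P(all k initials distinct); math.perm returns 0 once k exceeds 26.
    all_distinct = math.perm(LETTERS, k) / LETTERS ** k
    any_shared += prob * (1 - all_distinct)
    # P(fewer than two of the k X's have initial 'A').
    fewer_than_two_a = (1 - q) ** k + k * q * (1 - q) ** (k - 1)
    both_a += prob * (1 - fewer_than_two_a)

print(f"two X's sharing any initial: about 1 chance in {1 / any_shared:,.0f}")
print(f"two X's both with initial A: about 1 chance in {1 / both_a:,.0f}")
```

Under those assumptions the first figure is on the order of 1 in 19,000 per scoop and the second on the order of 1 in 500,000. Real initial frequencies are far from uniform, so both should be read as rough guides only.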

  9. Top | #9
    Veteran Member Lion IRC's Avatar
    Join Date
    Feb 2016
    Location
    Australia
    Posts
    4,010
    Rep Power
    20
    Most popular boy/girl baby names in 1950:

    1. James / Linda
    2. Robert / Mary
    3. John / Patricia
    4. Michael / Barbara
    5. David / Susan
    6. William / Nancy
    7. Richard / Deborah
    8. Thomas / Sandra
    9. Charles / Carol
    10. Gary / Kathleen

    Here's the list from last year:

    1. Jacob / Emily
    2. Michael / Isabella
    3. Ethan / Emma
    4. Joshua / Ava
    5. Daniel / Madison
    6. Christopher / Sophia
    7. Anthony / Olivia
    8. William / Abigail
    9. Matthew / Hannah
    10. Andrew / Elizabeth


    Wait - I forgot Ahmed, Abdel, Ali, Ashraqat, Aya...

  10. Top | #10
    Member Grendel's Avatar
    Join Date
    Feb 2018
    Location
    Bunya Mountains
    Posts
    110
    Rep Power
    9
    Quote Originally Posted by Lion IRC View Post
    Most popular boy/girl baby names in 1950:


    Wait - I forgot Ahmed, Abdel, Ali, Ashraqat, Aya...
    Doesn't ass start with A?
