# Thread: Can someone check my Stats?

1. Originally Posted by Grendel O God, let me try again.
O God? Is there something wrong with the answers you've gotten? The chance of reaching into the bucket and picking out a red ball are:

(20,000,000 / 52) = 384,615

Meaning that the probability of drawing out a red ball is 1 in 384,615 chances.
Just to be pedantic, a "chance" is a number in the interval [0, 1]. "1 in 384,615" is a chance. "384,615" is not.
I'm not particularly a fan of pedantry, but let's do our utmost here to strive for clarity.

Meaning that I would have to reach into the bucket 384,615 times in order to be confident of drawing out a red ball.
What does "confident" mean to you? This drawing will fail 36.79% of the time. This, like everything else you ask, has been explained in the thread already.

Originally Posted by Grendel OK ... if we agree on the above, then the chances of drawing out a red ball in a scoop of 20K balls is:

(384,615 / 20,000) = 19.23
Wrong. Let's see if we can agree on WHY it's wrong.

First, humor us, and use probabilities that are between 0 and 1. IOW, write (52 * 20000) / 20000000 = 1/19.23. (Your introduction of "384,615" is just a distraction.) BTW, 1/19.23 is an approximation to the exact answer 5.2000000%. Do you see that?

But 0.052 is NOT the probability you will draw exactly one red, nor is it the probability you will draw at least one red. It is the number of reds you will draw on average.
The probabilities you will draw exactly 0, 1, 2, 3, 4 or 5 reds are
p0 = .9493041051
p1 = .0494133528
p2 = .0012612419 ~ .05^2 / 2!
p3 = .0000210397 ~ .05^3 / 3!
p4 = .0000002580 ~ .05^4 / 4!
p5 = .0000000025 ~ .05^5 / 5!
(I've shown the probabilities along with an approximation, mentioned upthread, that may be useful for quick estimates.)

p1 is 0.049..., not 0.052. To get 0.052 you'll need the sum p1 + 2*p2 + 3*p3 + 4*p4 + ...
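For anyone who wants to reproduce the figures above, here is a minimal sketch, assuming Python 3.8+ (for `math.comb`) and the numbers used in this post: 52 reds in a population of 20,000,000 and a scoop of 20,000 drawn without replacement. The names `hypergeom_pmf`, `N`, `K` and `n` are my own. The formula is the standard hypergeometric one, written in its symmetric form so that only small binomials over the 52 reds are needed.

```python
from fractions import Fraction
from math import comb

N = 20_000_000  # total balls (figure from the thread)
K = 52          # red balls
n = 20_000      # balls in one scoop, drawn without replacement


def hypergeom_pmf(k, N=N, K=K, n=n):
    """Exact probability of exactly k reds in the scoop.

    Uses the symmetric form C(n, k) * C(N - n, K - k) / C(N, K),
    which equals C(K, k) * C(N - K, n - k) / C(N, n).
    """
    return Fraction(comb(n, k) * comb(N - n, K - k), comb(N, K))


pmf = [hypergeom_pmf(k) for k in range(K + 1)]

for k in range(6):
    print(f"p{k} = {float(pmf[k]):.10f}")

# The average number of reds is sum(k * p_k), which is exactly n * K / N.
mean_reds = sum(k * p for k, p in enumerate(pmf))
print("mean reds per scoop =", float(mean_reds))
```

Working in `Fraction` keeps the sum exact, so the mean comes out as precisely 0.052 rather than something that merely rounds to it.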

Does any of this help?

If we can just settle that for the moment then we can look at shades of red later. OK?

cheers ... Greg
When we look at shades of red, please clarify the ambiguity identified above in #17 ( "(a) or (b)" ).

2. First of all, thanx for taking the time with me Swammerdami.

If I had a bucket with 384,614 blue balls and 1 red ball, then in order to be 100% confident that I will draw out the red ball, the minimum number of draws required that I can predict with certainty is 384,615 draws. It may be red on the first draw or the last draw.

I am not interested in that certainty.

I'm aware that in a pop of 20,000,000, (52 red) that a draw of 384,615 will not always produce a red ball, that sometimes it will produce 2, etc. The odds, or whatever you want to call them, of 384,615:1 against drawing a red ball is all I need to know. Just the probability.

If you're referring to the breakdown of that probability occurring on any particular draw (36% of the time) but over all draws redressing/averaging itself, then that's fine. No problem, doesn't matter to me.

But if there is something you're trying to get into my thick head beyond that, then keep trying.

Greg

3. Originally Posted by Grendel OK ... if we agree on the above, then the chances of drawing out a red ball in a scoop of 20K balls is:

(384,615 / 20,000) = 19.23

Meaning that if I reach into the bucket and in a single scoop remove 20,000 balls then there are 18.23 chances all the balls will be blue and 1 chance one ball will be red in the 20,000.

can we agree on that?

If not, then what is the chance of drawing a red ball in a random scoop of 20,000.

If we can just settle that for the moment then we can look at shades of red later. OK?

cheers ... Greg
This part of the question has been answered on page one in quite some detail.

4. Originally Posted by Jokodo This part of the question has been answered on page one in quite some detail.
Perhaps we crossposted? Is the post above correct?

5. Originally Posted by Grendel First of all, thanx for taking the time with me Swammerdami.

If I had a bucket with 384,614 blue balls and 1 red ball, then in order to be 100% confident that I will draw out the red ball, the minimum number of draws required that I can predict with certainty is 384,615 draws. It may be red on the first draw or the last draw.

I am not interested in that certainty.

I'm aware that in a pop of 20,000,000, (52 red) that a draw of 384,615 will not always produce a red ball, that sometimes it will produce 2, etc. The odds, or whatever you want to call them, of 384,615:1 against drawing a red ball is all I need to know. Just the probability.

If you're referring to the breakdown of that probability occurring on any particular draw (36% of the time) but over all draws redressing/averaging itself, then that's fine. No problem, doesn't matter to me.

But if there is something you're trying to get into my thick head beyond that, then keep trying.

Greg

Then why are you talking about being confident to draw a red ball? If you have 53 red balls out of 20 million with no replacement, there is a minuscule but non-zero chance that you'll draw 19,999,947 blue balls before hitting on a red one. Python on my Macbook says it's 4.7464e-318, but take that with a grain of salt: I suspect floating point errors add up quite a bit when looping over almost 20 million steps, always using the previous result as input. Whatever the exact chance is, we can probably agree it's hilariously low, a.k.a. negligible. Running with my result would mean that if every particle in the visible universe had conducted the experiment once per nanosecond since the universe was born, chances would still be way over 10^200 to 1 that it never happened.
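As a cross-check on that figure without the 20-million-step loop: the chance that all 19,999,947 blues come out before any red is the chance that the 53 reds occupy the last 53 positions of a random ordering, i.e. 1 / C(20,000,000, 53). Below is a minimal sketch, assuming Python 3.8+ for `math.comb`; the variable names are mine. It works on the exact integer and reports a base-10 logarithm, because the value sits below the smallest normal double (about 2.2e-308), where floats keep only a few significant digits.

```python
from math import comb, floor, log10

N = 20_000_000  # total population (figure from the post)
K = 53          # red balls (figure from the post)

# Number of equally likely ways to place the K reds among N draw positions;
# exactly one of them puts every red after all the blues.
arrangements = comb(N, K)

# log10 of the probability 1 / arrangements; math.log10 accepts big ints.
log10_prob = -log10(arrangements)
exponent = floor(log10_prob)              # power of ten
mantissa = 10 ** (log10_prob - exponent)  # leading digits, in [1, 10)
print(f"P(all blues first) ~ {mantissa:.4f}e{exponent}")
```

If this lands close to the looped result, the accumulated floating point error in the loop is small enough not to matter for the argument.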

As for two reds with the same initial, and assuming you're talking about Swammerdami's scenario (b), that is, we don't know how many individuals with each initial there are in your population, just that each initial has the same a priori chance of occurring: you can take Swammerdami's probabilities for 2, 3, 4, ... reds and multiply them by the probability of two random individuals having the same initial (which will depend on the actual distribution of initials!), appropriately correcting for the number of possible pairings in each situation. With what seems to be the actual distribution of initials in the US, two random people have the same initial about 6.8% of the time. You would then be summing: the probability of two people having the same initial, times the probability of drawing exactly two people with your surname X; plus the probability of any pairing of two having the same initial in a set of three, times the probability of drawing exactly three X; plus the probability of any pairing of two having the same initial in a set of four, times the probability of drawing exactly four X; etc.
p0_2x = .9493041051 * 0 (there are no possible pairings of two X when there are no X in your sample, thus the probability that any two X have the same initial is a priori 0)
p1_2x = .0494133528 * 0 (same here)
p2_2x = .0012612419 * 0.068 (with two X, there's one possible pairing)
p3_2x = .0000210397 * 0.190442432 (three X => 3 pairings of 2; the chance that none of those pairings have the same initial is (1-0.068)^3, therefore the chance that at least one does is 1-(1-0.068)^3, or about 19%)
p4_2x = .0000002580 * 0.3446165440939256 (~(1 - (1-0.068)^6), 6 pairings in a set of four)
p5_2x = .0000000025 * 0.5055081666228548 (~(1 - (1-0.068)^10), 10 pairings in a set of five)
...
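The bookkeeping above can be sketched in a few lines of Python. This only restates the recipe from this post: Swammerdami's probabilities for exactly k reds, the 6.8% same-initial figure, and the simplifying assumption that the k(k-1)/2 pairings are independent. The names `p_exactly`, `p_same_initial` and `p_any_match` are mine, and the sum is truncated at k = 5 like the list.

```python
from math import comb

# Swammerdami's probabilities of drawing exactly k reds (k = 0..5), as quoted above.
p_exactly = [0.9493041051, 0.0494133528, 0.0012612419,
             0.0000210397, 0.0000002580, 0.0000000025]

p_same_initial = 0.068  # chance two random people share a first initial (figure from the post)


def p_any_match(k, q=p_same_initial):
    """Chance that at least one of the C(k, 2) pairings shares an initial.

    Treats the pairings as independent, as the post does; that is an
    approximation, not an exact birthday-style calculation.
    """
    return 1.0 - (1.0 - q) ** comb(k, 2)


terms = [p * p_any_match(k) for k, p in enumerate(p_exactly)]
total = sum(terms)
print(f"P(at least two X sharing an initial) ~ {total:.3e}")
```

The k = 2 term dominates the sum; the later terms only nudge the result, which is why cutting off at k = 5 is harmless here.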

Code:
```python
#!/usr/bin/env python3
# Probability of drawing *no* red balls in n successive draws without
# replacement. Prone to accumulated floating point error, but should do
# as a first approximation.

total = 20000000  # total population
red = 53          # red balls


def blue():
    """Current number of blue balls, derived from total population and red balls."""
    return max(total - red, 0)


def prob_blue():
    """Probability that the next ball drawn is blue."""
    return max(0.0, blue() / total)


# Chance of having no red balls in the collection; initially 1 because before
# drawing any balls, you trivially won't have any *red* balls either.
chance_no_red = 1.0

for i in range(19999947):
    if i % 1000 == 0:
        print(i, chance_no_red)  # progress report to command line every 1000 steps
    chance_no_red *= prob_blue()
    total -= 1  # one (blue) ball fewer in the bucket

print(chance_no_red)
```

6. Dear Jokodo,

There is no doubt in my mind you're a very intelligent person, and thank you for all that work. But I am only a monkey with a very tiny brain. Could you just dumb it down, dumb it down, dumb it down....

Is the probability of drawing a red ball from the 20mill bucket 384,615:1?

Just a yes or a no. And if no, could you just give the correct probability in the form xxxxxxxx:1

That's all I need. I don't need, can't use, the brain surgery

cheers .... Greg

7. Originally Posted by Grendel Dear Jokodo,

There is no doubt in my mind you're a very intelligent person, and thank you for all that work. But I am only a monkey with a very tiny brain. Could you just dumb it down, dumb it down, dumb it down....

Is the probability of drawing a red ball from the 20mill bucket 384,615:1?
Which probability are we now talking about? The probability of drawing a red ball, the probability of drawing two red balls, or the probability of drawing two red balls of the same shade/two individuals with the same first initial, or even the probability that they'd both have the same given first initial? And if the latter - which one is it? Not all initials are equally frequent.

And is your frequency of 53 out of 20 million an actual empirical distribution in an actual population of 20 million, and you're looking for the probability to draw two of those 53, without replacement? Or is that just a way to represent the frequency in a potentially much larger population, or the priors that any individual would have that surname/any ball would be red, without knowing the actual empirical distribution? That also makes a difference. And finally, is your claim that the initials are all equally probable based on a count of those 53 individuals with surname X and their initials, or are those the priors you assume without knowing the actual distribution?

Just a yes or a no. And if no, could you just give the correct probability in the form xxxxxxxx:1
We can't be expected to answer your question when you don't seem to know what the question actually is...

The correct probability to draw at least one red ball when 53/20,000,000 is just a prior while the actual population is (for all practical purposes) infinite and/or the actual empirical distribution unknown is 1 - ((19,999,947/20,000,000)^20,000), which comes out as 0.05162. To explain: 19,999,947/20,000,000 is the probability to draw no red ball per draw, (19,999,947/20,000,000)^20,000 the probability to draw no single red ball in 20,000 consecutive draws, and one minus that number thus the probability to draw at least one red ball.

The correct probability to draw at least one red ball when 53/20,000,000 is an actual empirical distribution in a population that has exactly 20,000,000 members is 1 - 19,999,947/20,000,000 * 19,999,946/19,999,999 * 19,999,945/19,999,998 ... * 19,979,948/19,980,001. The factors 19,999,946/19,999,999 etc. are the probabilities of drawing a blue ball (i.e. no red) from a population where some (blue) individuals are already missing. The resulting probability is marginally larger at 0.0516452: as the number of blues goes down, your probability to fetch a red on each individual draw grows.
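Both numbers can be reproduced with a short sketch, assuming Python 3.8+ and the figures from this post (53 reds, 20,000,000 balls, 20,000 draws). The names are mine. The without-replacement case uses the identity that the product of the 20,000 per-draw factors equals C(N - n, K) / C(N, K), which keeps the arithmetic exact and cheap.

```python
from fractions import Fraction
from math import comb

N = 20_000_000  # population size
K = 53          # red balls
n = 20_000      # number of draws

# Case 1: 53/20,000,000 is only a prior, so every draw has the same chance
# of red (equivalent to drawing with replacement).
p_prior = 1 - (1 - K / N) ** n

# Case 2: exactly K reds among exactly N balls, drawn without replacement.
p_no_red = Fraction(comb(N - n, K), comb(N, K))
p_exact = float(1 - p_no_red)

print(f"at least one red, prior only:          {p_prior:.7f}")
print(f"at least one red, without replacement: {p_exact:.7f}")
```

The two cases differ only in the fifth decimal place, which is why the distinction matters far more for "exactly two reds" than for "at least one".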

For the probability to draw two, it matters even more whether those 53/20,000,000 are just your priors or an actual distribution. All of this has been explained to you, with numbers, on page one of this discussion.

If you want an exact number, you first need to clarify what it is you actually mean to ask. Then you multiply the resulting probability for getting exactly two surnames X with the probability that two random individuals have the same initial, you multiply the probability for getting exactly three surnames X with the probability that at least one pairing in a group of three has matching initials, the probability of exactly 4 same surnames with the probability of at least one pair of matching initials in a set of four, etc., and when the numbers have become small enough to ignore, you add up what you've got so far.

Swammerdami has given you the tools to calculate the first factor, I have given you the tools to calculate the second factor in my previous post. There's nothing more we can do at this point. The only thing left for you to do at this point is a simple addition.

8. Thanx Jokodo,

busy with visitors here, will get back to you. Will try to work it out.

Greg
