
Does someone mind posting the math on this? I'm getting something like 775% chance it would happen yearly, but that seems too likely, and I think I screwed mine up somewhere, and wouldn't mind seeing where I made some wrong assumption in how to compute the probabilities.


> I'm getting something like 775% chance it would happen yearly, but that seems too likely,

In probability, when you roam outside of the 0-1 (or 0-100%) range, you can guarantee with 775% certainty that something went wrong somewhere. ;)


That’s not necessarily accurate if you are already operating in occurrences per year, as in this case. It just means 7.75 occurrences per year.


You can have an expected value E[X] = 7.75

You cannot have a probability P[X ≥ 1] >1.

There is no 'probability per year' (if it were meaningful year would have to be <1 for a >1 result anyway) - that time frame is built into 'X', it's 'inside' the probability.


If they are independent, then there should be a log-probability of it not happening per year though, right? Like, if the chance for each year is the same and they are independent, then take the log of the probability that it doesn’t happen in a given year; if you multiply that by the number of years, that will give the log of the probability that it doesn’t happen in any of the years. This seems like it could serve many of the purposes that people are intuitively looking for when they think of “probability per year”. And if the different years aren’t independent, uh, maybe some terms could be added to adjust for that?
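A minimal sketch of that idea, assuming a made-up per-year probability (`pYear` and `years` are illustrative values, not figures from this thread). Under independence the logs simply add, and the quantity -log(1 - p) behaves like a rate per year:

```javascript
// Hypothetical per-year probability that the event happens (illustrative only).
const pYear = 0.15;
const years = 10;

// Log-probability that the event does NOT happen in a single year.
// Math.log1p(-p) computes log(1 - p) accurately even for small p.
const logNoEventPerYear = Math.log1p(-pYear);

// With independent, identical years the logs add, so multiply by the year count.
const logNoEventOverall = years * logNoEventPerYear;

// Convert back: probability of at least one occurrence across all the years.
const pAtLeastOnce = 1 - Math.exp(logNoEventOverall);

console.log(pAtLeastOnce.toFixed(4));
```

The result always stays between 0 and 1, no matter how many years you multiply by.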


But OP didn't use either of those terms, they used "775% chance per year", which you can interpret colloquially as a speaker of English.

It's been clear throughout this thread what was meant.


> "chance per year", which you can interpret

Probably something like reciprocal half-life; 775% / yr is about 1 / 1.5 month ie, happens about once every month and a half.


That would be the expected value E(X), not the probability P(X > 0). The two are not conflatable.


Nope. Bad maths. Average of 7.75 per year !== >100% of it happening at least once in one year.

There’s still some chance it doesn’t happen in one year.


Probability it happens is 1 in 50c5 (the number of ways to choose 5 balls out of 50). So probability it doesn't happen is 1-1/50c5. Probability it never happens in a year is that number raised to (365*1000). That gives 85%, so there's a 1-.85 = 15% chance that it does happen each year:

https://www.wolframalpha.com/input/?i=%281-1%2F%2850choose5%...
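For anyone who wants to reproduce this locally, here is a rough sketch of the same calculation. It assumes a single target combination in a 5-of-50 game and 1,000 draws per day worldwide; `choose` is a helper written for this example, not a built-in.

```javascript
// Number of ways to choose k items from n when order does not matter.
function choose(n, k) {
  let result = 1;
  for (let i = 1; i <= k; i++) {
    // Each intermediate value is itself a binomial coefficient, so it stays an integer.
    result = (result * (n - k + i)) / i;
  }
  return result;
}

const combos = choose(50, 5);      // ways to pick 5 balls out of 50
const drawsPerYear = 365 * 1000;   // assumption: 1,000 draws per day

const pSingle = 1 / combos;                          // one draw hits the target
const pNever = Math.pow(1 - pSingle, drawsPerYear);  // no hit in a whole year
const pAtLeastOnce = 1 - pNever;                     // one or more hits in a year

console.log(combos, pNever.toFixed(3), pAtLeastOnce.toFixed(3));
```

This lands at roughly 0.84 for "never" and 0.16 for "at least once", in line with the rounded figures above.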


This analysis is spot-on under the assumption of 1000 lotteries per day, although your asterisks are getting trashed by the markdown.

To the other folks ending up with some wild results, there is a basic checksum on probability: If you compute the probability of an event happening at greater than 100% you've borked something up.

Bug: They drew 5 balls from one pool of 50 with an independent draw from another smaller pool of 20, so you need ~~50 ncr 6~~ (50 ncr 5) * 20, not 50 ncr 5.

Nit: I would rephrase your answer to say that there is a 15% chance that it happens one or more times in any given year. There remains a (vanishingly small) chance that it happened on every single random draw during the year.


Wait. Why is it 50c5? If you want to know how many ways you can roll triples with 3 dice, it isn't 6c3, right?

There's 6 possible triple dice combinations and there are 6^3 possible dice combinations, so it's 6/(6^3).

If you wanted to know the odds of rolling 3 dice in order, you could roll: 1, 2, 3 OR 2, 3, 4 OR 3, 4, 5 OR 4, 5, 6 - which is 4/(6^3) - which is not 6c3.

Why is it different with the lottery? Or did I get the dice wrong?

Or are you calculating that the balls can be drawn in any order?


Balls drawn in any order, without replacement.

The probability of guessing all 6 balls in a single lottery is 1 in (50 ncr 6). So, the probability of losing is 1 - 1/(50 ncr 6). The probability of losing every time is (1 - 1/(50 ncr 6))^(n_games), where n_games = 365 * 1000. Therefore, the probability of winning at least one game is 1 - (1 - 1/(50 ncr 6))^(n_games).


Got it. 50c6 is the total combinations. But there's more than 1 combination of 6 ascending balls, right? Why is it 1 in (50c6) instead of 45 in (50c6)?


Oh, sure. There's a number of different suspect or convenient sequences out there. All of the evenly-spaced sequences could be considered suspicious if you go broadly enough. A detail I tried to add back in up-thread: The sixth ball is from a separate pool of 20 balls. So 50c5 * 20 is the total number of possible draws, and there are 14 directly in-order sequences.

But the main point was the methodology. (1 - (1 - chance_of_sequence)^(n_draws)).


Thanks! Why are there only 14 in-order sequences??

Couldn't there be 1 2 3 4 5 6 AND 2 3 4 5 6 7 AND 3 4 5... Doesn't this give you 45?


As the person above said, one of the balls is restricted to only 1-20. This ball is drawn last.

You get 14 possible draws if the order the balls are drawn in matters (14-20 being the highest, with 20 drawn last), and 20 permutations if it does not (since any straight above 20-25 is not possible). The math is much different for drawing if the order matters though. There also would probably be some funky stuff going on for the higher straights where order doesn’t matter, since for a 20-25 straight the 20 ball must be the special ball. For a 19-24 straight either 19 or 20 must be, etc. Really you’re looking at calculating “Chance that the first five balls can create a straight with a number between 1-20, and then a 1/20 chance that straight actually happens.”


Sure, sorry. I'd used:

  2*(1/49)*(1/48)*(1/47)*(1/46)*365*1000*(45/50) = 12.9%

The initial "2" term is for consecutive numbers going both directions, and the final (45/50) term is to account for the fact that if you start with 4 or less and decreasing, or 46 or greater and increasing, you'll run out of numbers.

Edit: but if the numbers don't have to be drawn in order (e.g. 8-5-9-6-7 is OK), then the odds are much higher still:

  2*(4/49)*(3/48)*(2/47)*(1/46)*365*1000 = 344.5%

(With the initial "2" term accounting for 4 consecutive numbers on either side of the initial pick -- though I'm not sure I've got that entirely right?) Then it would happen three to four times a year. Even with a 6th ball drawn separately out of 20, that's still a 17% chance happening somewhere in the world in a year, given 1,000 daily draws.


> Edit: but if the numbers don't have to be drawn in order (e.g. 8-5-9-6-7 is OK), then the odds are much higher still:

Most lotteries are order-insensitive, and typically present the results in ascending numeric order. The actual draw often happens in some order (e.g. numbered balls being drawn from a hopper), and it'd be even more unusual if the numbers were actually drawn in consecutive order, but drawing 5-2-3-1-4 would typically be presented as 1-2-3-4-5 and would still be remarked upon as unusual.


To calculate the probability of some event with probability P coming true at least once out of N total tries, you do 1 - ((1 - P) ^ N), not P * N as you've done.

Doing this, your final figures should be 12.1% and 96.8%.

For the expected number of times it would happen though, you are right, it would be three or four times per year on average for the second case.
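A quick sketch that puts the two quantities side by side, using the per-draw probabilities exactly as the parent comment set them up (the function names here are just for illustration):

```javascript
const draws = 365 * 1000; // assumption from the thread: 1,000 draws per day

// Per-draw probabilities as written in the parent comment.
const pInOrder = 2 * (1 / 49) * (1 / 48) * (1 / 47) * (1 / 46) * (45 / 50);
const pAnyOrder = 2 * (4 / 49) * (3 / 48) * (2 / 47) * (1 / 46);

// Expected number of occurrences: can be any value >= 0.
function expectedCount(p, n) {
  return p * n;
}

// Probability of at least one occurrence in n independent tries: always in [0, 1].
// Written with log1p/expm1 to stay accurate when p is tiny.
function atLeastOnce(p, n) {
  return -Math.expm1(n * Math.log1p(-p));
}

console.log(expectedCount(pInOrder, draws).toFixed(3), atLeastOnce(pInOrder, draws).toFixed(3));
console.log(expectedCount(pAnyOrder, draws).toFixed(3), atLeastOnce(pAnyOrder, draws).toFixed(3));
```

The first pair comes out near 0.129 and 0.121, the second near 3.445 and 0.968. The two columns agree when the expectation is small and diverge as it grows.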


Oh. I assumed consecutive numbers, but not necessarily consecutive on each pull. That is, 1,2,3,4,5 is no different than 2,3,4,5,1 in my calculations, which makes the possible ways to get consecutive numbers at the end quite a bit higher. Still not positive I got it right, but at least I know it shouldn't match your results, as they are for slightly different things. :)

Edit: What I did was take all the possible combinations of 5 balls (5!) by the number of different sets that could be drawn (50-5, based on lowest number), over the total possible draws (50!). I think perhaps what that does is not account for overlap between sets (1-5 and 2-6), inflating the number somewhat, which is why it's a bit more than twice the probability that you got for any possible sequence of 5.


Unless I'm doing something wrong here - just typing some javascript into the console gives me...

  // Random integer in 0..99 (repeats are possible between calls).
  function rn() {
      return Math.floor(Math.random() * 100);
  }

  // One simulated drawing: five numbers, sorted numerically.
  // A bare sort() compares as strings, which would put 10 before 9.
  function drawing() {
      const result = [rn(), rn(), rn(), rn(), rn()];
      return result.sort((a, b) => a - b);
  }

  // True if every element is exactly one more than the previous one.
  function sequential(arr) {
      for (let i = 0; i < arr.length - 1; i++) {
          if (arr[i] + 1 !== arr[i + 1]) {
              return false;
          }
      }
      return true;
  }

  let counter = 0;

  for (let i = 0; i < 10000000; i++) {
      if (sequential(drawing())) {
          counter++;
      }
  }

  console.log(counter);

  16

Mathematically, it seems like you'd need to draw one of five numbers from the range, then one of four numbers, then one of three... so the likelihood would be 5/100 * 4/100 * 3/100 * 2/100 * 1/100 = 0.000000012. Although those odds don't seem to line up with the javascript I posted.
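One possible reconciliation, as a rough sketch: the 0.000000012 figure is the chance of hitting one specific run of five, while the simulation counts any run. Assuming five independent integers in 0..99 (repeats allowed) that are compared numerically, the expected count for 10,000,000 trials can be worked out directly:

```javascript
const rangeSize = 100;   // values 0..99, as in the snippet above
const picks = 5;
const trials = 10000000;

// Runs of 5 consecutive values inside 0..99 can start at 0, 1, ..., 95.
const runs = rangeSize - picks + 1;

// Each run can come out of the generator in 5! = 120 different orders.
let orderings = 1;
for (let i = 2; i <= picks; i++) {
  orderings *= i;
}

// Chance that a single drawing is some run, and the expected number of hits.
const pHit = (runs * orderings) / Math.pow(rangeSize, picks);
const expectedHits = pHit * trials;

console.log(runs, orderings, expectedHits);
```

Under those assumptions the expectation is about 11.5 hits, so a simulated count in the low-to-mid teens is within normal random variation.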


FYI 0.129 = 1 / 7.75. I think you and person you're responding to are doing the same calculations, but inverted.


Lumping all world lotteries together introduces a lot of handwaving, so let's start with what we know and work forward.

According to the article, this event happened for South Africa Powerball, where 5 numbers are chosen out of 50, and 1 number chosen out of 20: https://en.wikipedia.org/wiki/South_African_National_Lottery...

That same Wikipedia page does some of the math for us: The chance of one combination being chosen is 1/42,375,200. So if we count all possible sequential combinations, N, we'll know that the chance of a single winning combination being sequential is N/42,375,200.

Say the powerball comes out as any number 6 <= M <= 20. There are 6 ways the numbers 1-50 could be picked such that M is part of the sequence. That's 90 ways total. If the powerball is 5, there are only 5 ways, same continuing down to a powerball of 1 where there's only 1 combination of numbers 1-50 where it could be part of the sequence. 90 + 5 + 4 + 3 + 2 + 1 brings us to 105 as our N.

So this single event had a probability of 105/42,375,200, or 1/403,573. This means that for similar lotteries one would expect to see a sequence after about 200,000 picks.

EDIT: If you only count events where the powerball is the high number, as happened this time, N goes down to 15, making the odds 1/2,825,013, so one would expect such a sequence after about 1.4 million picks.
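The counting can be double-checked by brute force. This is only a sketch: it assumes main balls numbered 1..`mainMax` and a powerball from 1..20, and the variable names are made up for the example. No qualifying run reaches above 25, so the counts are the same whether `mainMax` is 45 or 50.

```javascript
const mainMax = 50;       // assumption: main balls are numbered 1..50
const powerballMax = 20;  // powerball is numbered 1..20
const runLength = 6;      // five main balls plus the powerball

let anyPosition = 0;      // powerball may sit anywhere in the run
let powerballHigh = 0;    // powerball is the largest number in the run

// A run must contain a value <= 20 to hold the powerball, so it starts at 20 or below.
for (let start = 1; start <= powerballMax; start++) {
  const run = [];
  for (let i = 0; i < runLength; i++) {
    run.push(start + i);
  }
  // Try each member of the run as the powerball; the other five are main balls.
  for (const pb of run) {
    const mains = run.filter((v) => v !== pb);
    const valid = pb <= powerballMax && mains.every((v) => v >= 1 && v <= mainMax);
    if (!valid) continue;
    anyPosition++;
    if (pb === run[runLength - 1]) powerballHigh++;
  }
}

const totalDraws = 42375200; // total combinations, as quoted above
console.log(anyPosition, Math.round(totalDraws / anyPosition));
console.log(powerballHigh, Math.round(totalDraws / powerballHigh));
```

The enumeration gives 105 and 15, matching the odds of about 1 in 403,573 and 1 in 2,825,013 stated above.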


That seems right, 1 * 1/49 * 1/47 * 1/46 * 1/45 * 5! * 1000 * 365 gets about 8 occurrences per year.


There are two very different questions:

1. How often does this happen? This is a question about expected value, and the answer could be anything zero or above.

2. What are the chances that this will happen within a year? This is a question about probability, and the answer must lie between zero and one. There is no such thing as "a 775% chance it would happen yearly".


Those two questions are closely related here by a very simple transformation: if the expected number of occurrences is N over many independent tries, then the probability of at least one occurrence is approximately 1-e^(-N), or 99.96% if N=7.75.

Note that for N close to 0, N itself is also a good approximation to 1-e^(-N).

For large N, it's generally more convenient to talk about the expectation rather than the probability of 0 hits—I'm sure many readers implicitly converted 775% to the expectation in their heads.
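A small sketch comparing the exact "at least one" probability with the 1-e^(-N) approximation. It assumes 365 * 1000 independent tries per year, as elsewhere in the thread; `smallN` is an arbitrary illustrative value.

```javascript
const tries = 365 * 1000;   // independent tries per year (thread's assumption)
const N = 7.75;             // expected occurrences per year
const p = N / tries;        // implied per-try probability

// Exact probability of at least one occurrence in `tries` independent tries.
const exact = 1 - Math.pow(1 - p, tries);

// Approximation for many tries with a small per-try probability.
const approx = 1 - Math.exp(-N);

// For a small expectation, the probability is close to the expectation itself.
const smallN = 0.01;
const approxSmall = 1 - Math.exp(-smallN);

console.log(exact.toFixed(4), approx.toFixed(4), approxSmall.toFixed(5));
```

With N = 7.75 both values round to 0.9996, while for N = 0.01 the probability is about 0.00995, barely below N.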


> I'm sure many readers implicitly converted 775% to the expectation in their heads.

Most people cannot do this correctly; the most obvious interpretation of a "775% chance" is that it represents a 25% chance of seven occurrences and a 75% chance of eight occurrences, with no other possibilities.

The problem gets even worse when you have expectations less than one. If the expected number of occurrences is 80%, what are the odds of getting any occurrences at all? They're less than 80% as long as it's possible to have more than one occurrence.


you skipped 1/48


Probability is a 0-1 range



