(no subject)
Apr. 22nd, 2009 10:21 pm
More ramblings on a topic of interest only to me. In this entry, firming up our model of the downloading process.
Last time, I decided that the hypergeometric distribution most closely matched the model of what I was trying to learn about. In short, a hypergeometric distribution is what you get when you do random-sampling-without-replacement. In other words, you take samples from a population and DO NOT PUT THEM BACK IN THE POPULATION before taking the next sample.
A real-world example (which I used last time and which we will be revisiting in more detail this time around) of this kind of sampling is drawing cards from a deck to get two kings. Each card you draw, you put off to the side. Now, the first card you draw has a 4 in 52 chance of being a king. Makes sense, right? There are four kings in the deck and there are 52 cards total. However, your SECOND draw has different odds. If the first card you drew was a king, then the next card you draw has a 3 in 51 chance of being a king. If the first card you drew was NOT a king, then the next card you draw has a 4 in 51 chance of being king. With sampling-without-replacement, the more non-king draws you make, the better the odds get that your *next* draw will be kingly. (If you took the card you drew the first time, shoved it into the middle of the deck, gave the whole deck a good shuffle or two, and THEN drew for a second time, that would be sampling-with-replacement and your odds of getting a king on any given draw would be 4/52. But we're not doing that kind of thing.)
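If you'd rather watch the odds shift than take my word for it, here's a minimal Python sketch of the draws above, using exact fractions so nothing gets rounded. The helper name `p_king_next` is just something I made up for illustration.

```python
from fractions import Fraction

DECK = 52   # cards in a standard deck
KINGS = 4   # kings in that deck

def p_king_next(kings_left: int, cards_left: int) -> Fraction:
    """Chance the next card is a king when drawn cards are set aside (no replacement)."""
    return Fraction(kings_left, cards_left)

first = p_king_next(KINGS, DECK)                    # 4 in 52
second_after_king = p_king_next(KINGS - 1, DECK - 1)  # 3 in 51
second_after_other = p_king_next(KINGS, DECK - 1)     # 4 in 51
with_replacement = Fraction(KINGS, DECK)            # reshuffled: always 4 in 52

print(first, second_after_king, second_after_other, with_replacement)
# 1/13 1/17 4/51 1/13

# Every non-king set aside nudges the next draw's odds upward.
for non_kings_drawn in range(5):
    print(non_kings_drawn, p_king_next(KINGS, DECK - non_kings_drawn))
```

Note that `Fraction` reduces automatically, which is why 4/52 prints as 1/13.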
We know how decks of cards are constructed. There are four suits with 13 cards in each suit for a total of 52 cards. Prior to this little adventure, I didn't know much about how BitTorrent files were broken up for torrenting. However, I've since read a Wikipedia article on the protocol and now I'm an expert. (<-- sarcasm! I never used to think I had to label this stuff, but apparently I don't write for meaning as well as I thought I did.) Files for torrenting are broken down into little pieces, most commonly 256 KB, 512 KB, or 1 MB. I do not know the piece size for the file set I am downloading because I'm using a bare-bones client that gives me better throughput at the expense of detailed statistics or cool charts. Let's give it the benefit of the doubt and assume 1 MB pieces, the biggest of the three and therefore the fewest pieces per file. The average size of a given episode is 350 MB, so the revised model gives us 350 pieces in each file, not 100.
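The piece-count arithmetic for all three common piece sizes, as a quick Python sketch. It assumes 1 MB = 1024 KB and a 350 MB episode; `piece_count` is a hypothetical helper, not anything my client actually reports.

```python
EPISODE_MB = 350                # average episode size from above
PIECE_SIZES_KB = [256, 512, 1024]  # common piece sizes, assuming 1 MB = 1024 KB

def piece_count(file_mb: int, piece_kb: int) -> int:
    """Pieces needed to cover the file; the last piece may be partly empty."""
    total_kb = file_mb * 1024
    return -(-total_kb // piece_kb)  # integer ceiling division

for size in PIECE_SIZES_KB:
    print(f"{size} KB pieces -> {piece_count(EPISODE_MB, size)} pieces")
# 256 KB pieces -> 1400 pieces
# 512 KB pieces -> 700 pieces
# 1024 KB pieces -> 350 pieces
```

So 1 MB pieces really are the generous assumption: smaller pieces mean four times as many of them to collect.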
Also, previously, I assumed that I needed the entire file for it to be watchable. This is not exactly the case. Depending on the robustness of your player and your willingness to tolerate stalls, skips, and glitches in your viewing material, a less-than-complete file can be viewed with aplomb. I've had passable luck putting 99%-complete stuff into VLC and getting it to play at a watchable level. So, in reality, we need only 99% of 350 pieces (about 347 of them) for a functional viewing experience. Gentlemen, this changes everything! Or not. (It does improve our odds of getting something watchable in a reasonable timeframe, but it's late and I'm too tired to play with this question tonight.)
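For whenever I'm less tired: a rough Python sketch of both halves of that paragraph. `pieces_needed` and `p_at_least` are my own hypothetical helper names. Since I haven't pinned down what the "deck" is on the torrent side, the odds part sticks to the card analogy and compares needing all 13 clubs in 44 draws against needing only 12 of them.

```python
from math import comb

def pieces_needed(total: int, percent: int) -> int:
    """Smallest whole number of pieces that reaches `percent` of `total`."""
    return -(-total * percent // 100)  # integer ceiling division

def p_at_least(population: int, wanted: int, draws: int, k: int) -> float:
    """Hypergeometric tail: chance that `draws` cards taken without replacement
    from `population` include at least `k` of the `wanted` cards."""
    hits = sum(
        comb(wanted, i) * comb(population - wanted, draws - i)
        for i in range(k, min(wanted, draws) + 1)
    )
    return hits / comb(population, draws)

print(pieces_needed(350, 99))         # 347
print(p_at_least(52, 13, 44, 13))     # all 13 clubs
print(p_at_least(52, 13, 44, 12))     # settle for 12 of 13
```

Letting yourself miss just one club takes the odds from roughly 0.08 to roughly 0.35, which is the card-deck version of "it does improve our odds."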
I'd like to know how changing from 100 pieces per episode to 350 pieces per episode affects the odds. I suspect it's going to make them a lot worse.
However, given the bloody alarming size of the numbers that we got in the first go-round, I think we're going to solve this by analogy. We are going to construct a smaller, easier problem that *resembles* the problem at hand, kind of like doing Similar Triangles shit in geometry.
I've already worked this out in the comments on the previous statistics post: changing the number of wanted cards from 4 (say, all the kings) to 13 (say, all the clubs) drops the probability of getting every one of them in 44 draws from .5 (fifty-fifty odds) to .082 (shitty odds). By analogy, changing the number of file pieces from 100 to a more-realistic 350 is going to make the odds, well, suck more harder. They're not going to improve. *sigh*
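Here's a Python sketch that reproduces those two numbers from the comments. `p_all_wanted` is a hypothetical helper name; it counts the ways all K wanted cards can land among the n cards you actually draw.

```python
from math import comb

def p_all_wanted(population: int, wanted: int, draws: int) -> float:
    """Chance that drawing `draws` cards without replacement from `population`
    catches every one of the `wanted` cards."""
    # Equivalent to C(N-K, n-K) / C(N, n): all K wanted cards must sit among
    # the n drawn positions. comb() returns 0 when wanted > draws.
    return comb(draws, wanted) / comb(population, wanted)

DECK, DRAWS = 52, 44

for wanted in (4, 13):
    print(wanted, round(p_all_wanted(DECK, wanted, DRAWS), 3))
# 4 0.501
# 13 0.082

# Everything in between: each extra wanted card makes things strictly worse.
for wanted in range(1, 14):
    print(wanted, round(p_all_wanted(DECK, wanted, DRAWS), 3))
```

The steady decline isn't a fluke: going from K to K+1 wanted cards multiplies the odds by (n - K)/(N - K), which is always less than 1 when you draw fewer cards than the whole deck. That's the whole "more pieces, worse odds" argument in one ratio.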
On the plus side, the torrent client says "about 25 days" now. Things are looking up.