Guy <bugzilla@xxxxxxxxxxxxxxxx> wrote:
> A birthday candle lasts about 2 minutes (as a guess). I think they would
> light 1000 candles at the same time. Then monitor them until the first one
> fails, say at 2 minutes. I think the MTBF would then be computed as 2000
> minutes MTBF!

If the distribution is Poisson (i.e. the probability of dying per moment
of time is constant over time, which strictly speaking gives exponentially
distributed lifetimes) then that is correct. I don't know offhand if that
is an unbiased estimator. As far as I can tell it is unbiased when the
failure rate is constant, but it is a very noisy estimate, and it comes
out on the short side more often than not.

> But we can be sure that by 2.5 minutes, at least 90% of them
> would have failed.

Then you would be sure that the distribution was not Poisson.

What is the problem here, exactly? Many different distributions can have
the same mean. For example, this one:

  deaths per unit time
  |
  |   /\
  |  /  \
  | /    \
  |/      \
  ---------->t

and this one

  deaths per unit time
  |
  |\      /
  | \    /
  |  \  /
  |   \/
  ---------->t

have the same mean. The same mtbf. Is this a surprise?

The mean on its own is only one parameter of a distribution - for a
Poisson distribution, it is the only parameter, but that is a particular
case. For the normal distribution you require both the mean and the
standard deviation in order to specify the distribution. You can get
very different normal distributions with the same mean!

I can't draw the lifetime distribution for the Poisson case nicely in
ascii, but it starts at its peak at time zero and then declines slowly,
with a long tail out to infinity. If you were to imagine that half the
machines had died by the time the mtbf were reached, you would be very
wrong! Many more have died than half. But that long tail of those very
few machines that live a LOT longer than the mtbf balances it out.

I already did this once for you, but I'll do it again: if the mtbf is
ten years, then 10% of the machines still alive die every year. Or 90%
survive every year. This means that by the time 10 years have passed
only 35% have survived (90%^10). So about 2/3 of the machines have died
by the time the mtbf is reached!
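For anyone who wants to check the arithmetic above rather than take it on trust, here is a minimal sketch in Python. It assumes a constant failure rate, as in the Poisson case discussed here; the function name surviving_fraction and its parameters are made up for illustration, not taken from any library.

```python
import math


def surviving_fraction(mtbf_years, years, step_years=1.0):
    """Fraction of machines still alive after `years`.

    Assumes that in every step of length `step_years` a fixed fraction
    step_years / mtbf_years of the machines still alive dies
    (constant failure rate).
    """
    death_fraction_per_step = step_years / mtbf_years
    steps = round(years / step_years)
    return (1.0 - death_fraction_per_step) ** steps


mtbf = 10.0

# Year-by-year version of the argument: 90% survive each year, for 10 years.
yearly = surviving_fraction(mtbf, years=10.0)

# Limit of ever smaller time steps: survival after time t is exp(-t / mtbf).
continuous = math.exp(-10.0 / mtbf)

# Time by which half the machines have died under the continuous model.
median_life = mtbf * math.log(2.0)

print(f"alive at the mtbf, yearly steps:   {yearly:.3f}")
print(f"alive at the mtbf, continuous:     {continuous:.3f}")
print(f"half have died after {median_life:.2f} years")
```

Either way you count it, roughly a third of the machines are still alive at the mtbf, and half of them are already dead well before it.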

If you want to know where the peak of the death rate occurs, well, it
looks to me as though it is right at the start: the same fraction of the
survivors dies every year, so the most deaths happen while the population
is largest (but I am calculating mentally, not on paper, so do your own
checks). After that deaths become steadily less frequent in the
population as a whole.

To estimate the mtbf, I would imagine that one averages the proportion
of the surviving population that dies per month, over several months,
and takes the reciprocal. But I guess serious applied statisticians have
evolved far more sophisticated and more efficient estimators.

And then there is the problem that the distribution is bimodal, not pure
Poisson. There will be a subpopulation of faulty disks that die off
earlier. So they need to discount early measurements in favour of the
later ones (bad luck if you get one of the subpopulation of defectives :)
- but that's what their return policy is for).

Peter