On Wed, Apr 10, 2019 at 11:12 AM Christian Balzer <chibi@xxxxxxx> wrote:
>
>
> Hello,
>
> Another thing that crossed my mind aside from failure probabilities caused
> by actual HDDs dying is of course the little detail that most Ceph
> installations will have WAL/DB (journal) on SSDs, the most typical
> ratio being 1:4.

Unfortunately the ratios seen "in the wild" seem to be a lot higher. I've
seen 1:100 and 1:60, which is obviously a really bad idea. But 1:24 is also
quite common, and so is 1:12 (2 NVMe disks in a 24-bay chassis); I think
that one is perfectly reasonable.

Paul
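
For what it's worth, here is a minimal sketch of the back-of-the-envelope
arithmetic behind the risk question Christian raises further down. The 1500
OSDs and the 2%/year HDD rate are taken from the thread below; the SSD AFR
is purely an assumed placeholder, and the sketch deliberately ignores
recovery windows and correlated failures -- it only shows expected event
rates.

#!/usr/bin/env python3
# Rough failure-rate arithmetic for a cluster with shared WAL/DB SSDs.
# All AFR inputs are illustrative assumptions -- substitute your own.

def failures_per_year(n_devices, afr):
    """Expected device failures per year at a given annualized failure rate."""
    return n_devices * afr

def days_between_failures(n_devices, afr):
    """Average interval between failures across the whole fleet, in days."""
    return 365.0 / failures_per_year(n_devices, afr)

hdd_osds  = 1500   # HDD-backed OSDs (Hector's example below)
hdd_afr   = 0.02   # ~2%/year, the pessimistic Backblaze figure cited below
wal_ratio = 4      # OSDs sharing one WAL/DB SSD (the 1:4 case)
ssd_afr   = 0.01   # assumed SSD AFR -- placeholder only
ssds      = hdd_osds // wal_ratio

print("HDD failures/year: %.1f (one every %.1f days)"
      % (failures_per_year(hdd_osds, hdd_afr),
         days_between_failures(hdd_osds, hdd_afr)))
# Every SSD failure takes down wal_ratio OSDs at the same time.
print("SSD failures/year: %.1f, each killing %d OSDs at once"
      % (failures_per_year(ssds, ssd_afr), wal_ratio))
print("OSDs lost/year via shared WAL/DB alone: %.1f"
      % (failures_per_year(ssds, ssd_afr) * wal_ratio))

With those numbers you get roughly one HDD failure every ~12 days (matching
Christian's figure below) plus a few multi-OSD events per year from the
SSDs, which is why the WAL/DB ratio matters so much for any risk model.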
> And given the current thread about compaction killing pure HDD OSDs,
> something you may _have_ to do.
>
> So if you get unlucky and an SSD dies, 4 OSDs are irrecoverably lost, unlike
> a dead node that can be recovered.
> Combine that with the background noise of HDDs failing and things just got
> quite a bit scarier.
>
> And if you have a "crap firmware of the week" situation like the ones
> several people here have experienced, you're even more likely to wind up in
> trouble very fast.
>
> This is of course all something people know (or should know); I'm more
> wondering how to model it to correctly assess risks.
>
> Christian
>
> On Wed, 3 Apr 2019 10:28:09 +0900 Christian Balzer wrote:
>
> > On Tue, 2 Apr 2019 19:04:28 +0900 Hector Martin wrote:
> >
> > > On 02/04/2019 18.27, Christian Balzer wrote:
> > > > I did a quick peek at my test cluster (20 OSDs, 5 hosts) and a replica 2
> > > > pool with 1024 PGs.
> > >
> > > (20 choose 2) is 190, so you're never going to have more than that many
> > > unique sets of OSDs.
> > >
> > And this is why one shouldn't send mails when in a rush, w/o fully
> > grokking the math one was just given.
> > Thanks for setting me straight.
> >
> > > I just looked at the OSD distribution for a replica 3 pool across 48
> > > OSDs with 4096 PGs that I have and the result is reasonable. There are
> > > 3782 unique OSD tuples, out of (48 choose 3) = 17296 options. Since this
> > > is a random process, due to the birthday paradox, some duplicates are
> > > expected after only the order of 17296^0.5 = ~131 PGs; at 4096 PGs
> > > having 3782 unique choices seems to pass the gut feeling test. Too lazy
> > > to do the math in closed form, but here's a quick simulation:
> > >
> > > >>> len(set(random.randrange(17296) for i in range(4096)))
> > > 3671
> > >
> > > So I'm actually slightly ahead.
> > >
> > > At the numbers in my previous example (1500 OSDs, 50k pool PGs),
> > > statistically you should get something like ~3 collisions on average, so
> > > negligible.
> > >
> > Sounds promising.
> >
> > > > Another thing to look at here is of course critical period and disk
> > > > failure probabilities, these guys explain the logic behind their
> > > > calculator, would be delighted if you could have a peek and comment.
> > > >
> > > > https://www.memset.com/support/resources/raid-calculator/
> > >
> > > I'll take a look tonight :)
> > >
> > Thanks, a look at the Backblaze disk failure rates (picking the worst
> > ones) gives a good insight into real-life probabilities, too.
> > https://www.backblaze.com/blog/hard-drive-stats-for-2018/
> > If we go with 2%/year, that's an average failure every 12 days.
> >
> > Aside from how likely the actual failure rate is, another concern of
> > course is extended periods of the cluster being unhealthy; with certain
> > versions there was that "mon map will grow indefinitely" issue, and other
> > more subtle ones might lurk still.
> >
> > Christian
> > > --
> > > Hector Martin (hector@xxxxxxxxxxxxxx)
> > > Public Key: https://mrcn.st/pub
> >
> > --
> > Christian Balzer        Network/Systems Engineer
> > chibi@xxxxxxx           Rakuten Communications
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Rakuten Communications

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com