On 10/04/2019 18.11, Christian Balzer wrote:
> Another thing that crossed my mind aside from failure probabilities
> caused by actual HDDs dying is of course the little detail that most
> Ceph installations will have WAL/DB (journal) on SSDs, the most typical
> ratio being 1:4.
> And given the current thread about compaction killing pure HDD OSDs,
> something you may _have_ to do.
>
> So if you get unlucky and an SSD dies, 4 OSDs are irrecoverably lost,
> unlike a dead node that can be recovered.
> Combine that with the background noise of HDDs failing, and things just
> got quite a bit scarier.

Certainly, your failure domain should be at least host, and that changes
the math (even without considering whole-host failure).

Let's say you have 375 hosts and 4 OSDs per host, with the failure domain
correctly set to host. Same 50000 pool PGs as before.

Now if 3 hosts die:

50000 / (375 choose 3) =~ 0.57% chance of data loss

This is equivalent to having 3 shared SSDs die.

If 3 random OSDs die, each in a different host, the chance of data loss
would be:

0.57% / (4^3) =~ 0.00896%

(There is a 1 in 4 chance per host that you hit the OSD a PG actually
lives on, and you need to hit all 3.)

This is marginally higher than the ~0.00891% with uniformly distributed
PGs, because you've eliminated all sets of 3 OSDs which share a host.

(A quick Python sketch that reproduces these numbers is below.)

--
Hector Martin (hector@xxxxxxxxxxxxxx)
Public Key: https://mrcn.st/pub
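For reference, a minimal sketch of the arithmetic above, in Python. It
assumes the same setup as before (replication 3, 50000 PGs, 375 hosts,
4 OSDs per host) and uses pgs / C(n, 3) as a union-bound approximation,
which slightly overestimates the true probability but is fine at these
magnitudes:

from math import comb

pgs = 50000
hosts = 375
osds_per_host = 4
osds = hosts * osds_per_host  # 1500

# 3 whole hosts (or 3 shared WAL/DB SSDs) die: data is lost if some PG
# has all 3 of its replicas on those hosts.
p_hosts = pgs / comb(hosts, 3)
print(f"3 hosts die:              ~{p_hosts:.2%}")    # ~0.57%

# 3 random OSDs die, each in a different host: per affected host there is
# a 1 in 4 chance the dead OSD is the one holding the PG's replica.
p_osds = p_hosts / osds_per_host**3
print(f"3 OSDs in distinct hosts: ~{p_osds:.5%}")     # ~0.00896%

# For comparison: PGs distributed uniformly over all 1500 OSDs, with no
# host awareness (the earlier figure in the thread).
p_uniform = pgs / comb(osds, 3)
print(f"uniform over 1500 OSDs:   ~{p_uniform:.5%}")  # ~0.00891%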