On 2017-06-22T00:51:38, Blair Bethwaite <blair.bethwaite@xxxxxxxxx> wrote:

> I'm doing some work to evaluate the risks involved in running 2r storage
> pools. On the face of it my naive disk failure calculations give me 4-5
> nines for a 2r pool of 100 OSDs (no copyset awareness, i.e., secondary disk
> failure based purely on chance of any 1 of the remaining 99 OSDs failing
> within recovery time). 5 nines is just fine for our purposes, but of course
> multiple disk failures are only part of the story.

You are also conflating availability with data durability.

"Traditional" multi-node replicated storage solutions can get away with
mirroring the data between only two nodes because they typically have an
additional RAID5/6 at the local node level. (Which also helps with the
recovery impact of a single device failure.) Ceph typically doesn't.
(That's why rbd-mirror between two Ceph clusters can be OK too.)

A disk failing while a node is down, or being rebooted, already leaves
the affected PGs with no remaining copy at all.

> thereof, e.g., something that would enable the functional equivalent of:
> "this OSD/node is going to go offline so please create a 3rd replica in
> every PG it is participating in before we shutdown that/those OSD/s"...?

You can evacuate the node by setting its weight to 0. It's a very
expensive operation, just like your proposal would be.

Regards,
    Lars

-- 
Architect SDS
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
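
P.S. For anyone who wants to redo Blair's arithmetic, a minimal
back-of-the-envelope sketch of that naive model follows. The AFR and
recovery-time figures are made-up assumptions for illustration; the real
numbers depend entirely on your drives and how fast your cluster backfills.

    import math

    # Naive two-replica model: data is lost if a second OSD dies before
    # the first failure has been re-replicated. Illustrative numbers only.
    num_osds = 100          # OSDs in the 2r pool
    afr = 0.02              # assumed annualized failure rate per disk
    recovery_hours = 4.0    # assumed time to re-replicate a failed OSD
    hours_per_year = 8766.0

    # Expected first failures per year across the pool.
    first_failures_per_year = num_osds * afr

    # Chance that any one of the remaining OSDs also fails inside the
    # recovery window (no copyset awareness, per Blair's description).
    p_one_disk_in_window = afr * recovery_hours / hours_per_year
    p_second_failure = 1.0 - (1.0 - p_one_disk_in_window) ** (num_osds - 1)

    # Expected double-failure (data loss) events per year; for small
    # values this approximates the annual loss probability.
    loss_events_per_year = first_failures_per_year * p_second_failure
    nines = -math.log10(loss_events_per_year)

    print("expected loss events/year: %.2e (~%.1f nines)"
          % (loss_events_per_year, nines))

With these particular assumptions it lands closer to three nines than five,
and the result is quadratically sensitive to the AFR and linearly sensitive
to the recovery time -- which is rather the point: the multi-disk part of
the story is the easy part to model, the node-down and reboot cases above
are not. (For the evacuation, setting the CRUSH weight to 0, e.g. with
something like "ceph osd crush reweight-subtree <host> 0", is the usual
route.)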