risk mitigation in 2 replica clusters

Hi all,

I'm doing some work to evaluate the risks involved in running 2r storage pools. On the face of it, my naive disk-failure calculations give me 4-5 nines for a 2r pool of 100 OSDs (no copyset awareness, i.e., a second failure is counted purely on the chance that any one of the remaining 99 OSDs fails within the recovery window). Five nines is just fine for our purposes, but of course multiple disk failures are only part of the story.
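
For reference, the snippet below is the kind of back-of-the-envelope model I mean. The AFR and recovery-time figures are illustrative placeholders rather than measurements from our hardware, and it only estimates the probability of at least one double failure per year; counting the expected fraction of objects actually lost in such an event would give more nines still.

    import math

    HOURS_PER_YEAR = 24 * 365

    n_osds = 100          # OSDs in the 2r pool
    afr = 0.03            # assumed annualised disk failure rate (illustrative)
    recovery_hours = 4.0  # assumed time to re-replicate a failed OSD (illustrative)

    # Expected number of "first" disk failures per year across the pool.
    first_failures_per_year = n_osds * afr

    # Chance that at least one of the remaining 99 OSDs also fails within
    # the recovery window (no copyset awareness, pure chance).
    p_peer_fails_in_window = afr * recovery_hours / HOURS_PER_YEAR
    p_second_failure = 1 - (1 - p_peer_fails_in_window) ** (n_osds - 1)

    # Probability of at least one double failure (i.e. some data loss) per year.
    p_loss_per_year = 1 - (1 - p_second_failure) ** first_failures_per_year

    print(f"annual double-failure probability: {p_loss_per_year:.2e} "
          f"(~{-math.log10(p_loss_per_year):.1f} nines)")
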

The more problematic issue with 2r clusters is that any time you do planned maintenance (our clusters spend much more time degraded because of regular upkeep than because of real failures), you suddenly and drastically increase the risk of data loss. So I find myself wondering whether there is a way to tell Ceph to create an extra replica for a particular PG or set of PGs, i.e., the functional equivalent of: "this OSD/node is going offline, so please create a third replica in every PG it participates in before we shut it down"...?
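
The only knob I can see today is pool-wide rather than per-PG: temporarily bump the pool's size to 3 before the maintenance window and drop it back afterwards, roughly along the lines of the sketch below (the pool name and the HEALTH_OK polling are purely illustrative, not a recommendation). The obvious downside is that it re-replicates the whole pool rather than just the PGs on the node going down, which is exactly why something per-OSD/per-PG would be nicer.

    import subprocess
    import time

    POOL = "volumes"   # hypothetical pool name, substitute your own

    def ceph(*args):
        """Run a ceph CLI command and return its stdout."""
        return subprocess.run(["ceph", *args], check=True,
                              capture_output=True, text=True).stdout

    # 1. Add a third replica pool-wide and wait for the cluster to settle.
    #    (In practice you would probably check recovery state rather than
    #    overall health, which can be noisy for unrelated reasons.)
    ceph("osd", "pool", "set", POOL, "size", "3")
    while "HEALTH_OK" not in ceph("health"):
        time.sleep(30)

    # 2. ...take the OSD/node down and do the maintenance here...

    # 3. Drop back to two replicas once the node is back and healthy.
    ceph("osd", "pool", "set", POOL, "size", "2")
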

--
Cheers,
~Blairo
