On 2017-06-22T00:51:38, Blair Bethwaite <blair.bethwaite@xxxxxxxxx> wrote:

> I'm doing some work to evaluate the risks involved in running 2r storage
> pools. On the face of it my naive disk failure calculations give me 4-5
> nines for a 2r pool of 100 OSDs (no copyset awareness, i.e., secondary disk
> failure based purely on chance of any 1 of the remaining 99 OSDs failing
> within recovery time). 5 nines is just fine for our purposes, but of course
> multiple disk failures are only part of the story.

You are also conflating availability with data durability.

"Traditional" multi-node replicated storage solutions can get away with
mirroring the data between only two nodes because they typically have an
additional RAID5/6 at the local node level. (Which also helps with the
recovery impact of a single device failure.) Ceph typically doesn't.
(That's why rbd-mirror between two Ceph clusters can be OK too.)

A disk failing while a node is down, or being rebooted, already leaves
the affected PGs with no remaining copy at all.

> thereof, e.g., something that would enable the functional equivalent of:
> "this OSD/node is going to go offline so please create a 3rd replica in
> every PG it is participating in before we shutdown that/those OSD/s"...?

You can evacuate the node by setting its weight to 0. It's a very
expensive operation, just like your proposal would be.

Regards,
    Lars

-- 
Architect SDS
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
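
P.S. For anyone who wants to redo Blair's arithmetic, a minimal
back-of-the-envelope sketch of that naive model follows. The AFR and
recovery-time figures are made-up assumptions for illustration; the real
numbers depend entirely on your drives and how fast your cluster backfills.

    import math

    # Naive two-replica model: data is lost if a second OSD dies before
    # the first failure has been re-replicated. Illustrative numbers only.
    num_osds = 100          # OSDs in the 2r pool
    afr = 0.02              # assumed annualized failure rate per disk
    recovery_hours = 4.0    # assumed time to re-replicate a failed OSD
    hours_per_year = 8766.0

    # Expected first failures per year across the pool.
    first_failures_per_year = num_osds * afr

    # Chance that any one of the remaining OSDs also fails inside the
    # recovery window (no copyset awareness, per Blair's description).
    p_one_disk_in_window = afr * recovery_hours / hours_per_year
    p_second_failure = 1.0 - (1.0 - p_one_disk_in_window) ** (num_osds - 1)

    # Expected double-failure (data loss) events per year; for small
    # values this approximates the annual loss probability.
    loss_events_per_year = first_failures_per_year * p_second_failure
    nines = -math.log10(loss_events_per_year)

    print("expected loss events/year: %.2e (~%.1f nines)"
          % (loss_events_per_year, nines))

With these particular assumptions it lands closer to three nines than five,
and the result is quadratically sensitive to the AFR and linearly sensitive
to the recovery time -- which is rather the point: the multi-disk part of
the story is the easy part to model, the node-down and reboot cases above
are not. (For the evacuation, setting the CRUSH weight to 0, e.g. with
something like "ceph osd crush reweight-subtree <host> 0", is the usual
route.)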