Hello,

On Thu, 9 Jun 2016 09:59:04 +0200 Gandalf Corvotempesta wrote:

> 2016-06-09 9:16 GMT+02:00 Christian Balzer <chibi@xxxxxxx>:
> > Neither, a journal failure is lethal for the OSD involved and unless
> > you have LOTS of money RAID1 SSDs are a waste.
>
> Ok, so if a journal failure is lethal, Ceph automatically removes the
> affected OSD and starts rebalancing, right?
>
Correct.

> > Additionally your cluster should (NEEDS to) be designed to handle the
> > loss of a journal SSD and its associated OSDs, since that is less than
> > a whole node, or a whole rack (whatever your failure domain may be).
>
> What do you suggest about this? In the (small) cluster I'm trying to
> plan, I would like to be protected against the failure of every
> component, up to a whole rack. I have 2 different racks for the
> storage, so data should be spread across both while still keeping a
> single OSD/journal failure as the failure domain.
>
Define "small" cluster.

Your smallest failure domain, both in Ceph (CRUSH rules) and when
calculating how much over-provisioning you need, should always be the
node/host. This is the default CRUSH rule for a reason.

It's trivial to then create a CRUSH rule to spread things between racks
(first pick a rack, then a node); see the sketch below the message.

Of course, if you have 2 switches per rack, dual links, and dual power
supplies fed by independent PDUs on all your gear, a "rack" failure
domain becomes pretty cosmetic and superfluous.

Christian

> Yes, reading the docs should answer many questions (and I'm reading),
> but having a mailing list where expert people reply is much better.
>

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
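
For illustration, a minimal sketch of the kind of "rack first, then node"
CRUSH rule Christian describes, assuming a root bucket named "default",
rack buckets already defined in the CRUSH map, 2 racks, and a replicated
pool of size 3; the rule name and ruleset number are hypothetical:

    rule replicated_racks {
            ruleset 1
            type replicated
            min_size 1
            max_size 10
            # start at the default root
            step take default
            # pick 2 racks...
            step choose firstn 2 type rack
            # ...then up to 2 distinct hosts (one OSD each) in each rack
            step chooseleaf firstn 2 type host
            step emit
    }

This yields up to 4 candidate OSDs, of which the first "size" (e.g. 3)
are used, so no two replicas land on the same host and both racks hold
data. The edited map can be compiled with "crushtool -c" and injected
with "ceph osd setcrushmap -i", and checked beforehand with
"crushtool --test".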