Hello,

On Thu, 9 Jun 2016 09:59:04 +0200 Gandalf Corvotempesta wrote:

> 2016-06-09 9:16 GMT+02:00 Christian Balzer <chibi@xxxxxxx>:
> > Neither, a journal failure is lethal for the OSD involved and unless
> > you have LOTS of money RAID1 SSDs are a waste.
>
> Ok, so if a journal failure is lethal, Ceph automatically removes the
> affected OSD and starts rebalancing, right?
>
Correct.

> > Additionally your cluster should (NEEDS to) be designed to handle the
> > loss of a journal SSD and its associated OSDs, since that is less than
> > a whole node, or a whole rack (whatever your failure domain may be).
>
> What do you suggest about this? In the (small) cluster I'm trying to
> plan, I would like to be protected against the failure of every
> component, up to a whole rack. I have 2 different racks for the
> storage, so data should be spread across both while still keeping a
> single OSD/journal failure as the failure domain.
>
Define "small" cluster.

Your smallest failure domain, both in Ceph (CRUSH rules) and when
calculating how much over-provisioning you need, should always be the
node/host. This is the default CRUSH rule for a reason.

It's trivial to then create a CRUSH rule to spread things between racks
(first pick a rack, then a node); see the sketch below the message.

Of course, if you have 2 switches per rack, dual links, and dual power
supplies fed by independent PDUs on all your gear, a "rack" failure
domain becomes pretty cosmetic and superfluous.

Christian

> Yes, reading the docs should answer many questions (and I'm reading),
> but having a mailing list where expert people reply is much better.
>

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
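
For illustration, a minimal sketch of the kind of "rack first, then node"
CRUSH rule Christian describes, assuming a root bucket named "default",
rack buckets already defined in the CRUSH map, 2 racks, and a replicated
pool of size 3; the rule name and ruleset number are hypothetical:

    rule replicated_racks {
            ruleset 1
            type replicated
            min_size 1
            max_size 10
            # start at the default root
            step take default
            # pick 2 racks...
            step choose firstn 2 type rack
            # ...then up to 2 distinct hosts (one OSD each) in each rack
            step chooseleaf firstn 2 type host
            step emit
    }

This yields up to 4 candidate OSDs, of which the first "size" (e.g. 3)
are used, so no two replicas land on the same host and both racks hold
data. The edited map can be compiled with "crushtool -c" and injected
with "ceph osd setcrushmap -i", and checked beforehand with
"crushtool --test".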