Hi,

I am trying to wrap my head around large RBD-on-RADOS clusters and their reliability, and would love some community feedback.

First, the RADOS-only case. Looking only at node failures, and assuming an MTTR of 1 day and a node MTBF of 20,000h (~2.3 years), the reliability for a single object should work out as follows: an MTBF of 20,000h corresponds to an annualized failure rate (AFR) of ~32%. Broken down to a daily rate, that means a single node has a ~0.09% chance of failing on any given day (assuming, simplistically, that daily failure rate (DFR) = AFR/365).

For the single-object case, my chance of losing all object-holding nodes at the same time is DFR^(number of replicas), so:

# replicas    prob. of total system failure
1             0.089033220%
2             0.000079269%
3             0.000000071%
4             0.00000000006%

(Though I think I need to take the number of nodes into account as well: the more nodes there are, the less likely it becomes that the peer nodes of a single object crash simultaneously.)

That means that even on hardware with a high chance of failure, my single objects should be fine when using 3 replicas. That is unsurprising, seeing as this is one of the design goals of RADOS.

Now, let's bring RBD into play. Using sufficiently large disks (assume a 10TB RBD disk size) and the default block size of 4MB, a disk that is 10% full (1TB written) ends up as 1TB/4MB = 250,000 objects. That means that every Ceph OSD node participating in that disk's RBD pool holds parts of this disk, so every OSD node failure puts this disk at risk of losing blocks (and actually all RBD disks, since pretty much every RBD disk will have objects on every node). My gut tells me there is a much higher risk of data loss for the RBD case than for the single-object case, but maybe I am mistaken?

Can one of you enlighten me with some probability calculation magic? It is probably best to start with plain RADOS, then move into RBD territory.
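To make the arithmetic above concrete, here is a minimal Python sketch. It reproduces the per-object figures from the table and adds a naive upper bound for the RBD case. The function names and the union bound are my own assumptions for illustration, not anything Ceph computes. The bound takes the worst case in which each of the 250,000 objects sits on its own distinct set of replica nodes. In a real cluster objects are grouped into placement groups, so the number of distinct replica sets is at most the number of PGs in the pool, and the bound would be evaluated with that smaller count instead.

```python
# Sketch under the post's simplified model: a replica set is lost when all of
# its nodes fail within the same one-day repair window.

# Daily node failure rate as a fraction, taken from the 1-replica table row.
DFR = 0.00089033220

# 1TB written / 4MB per object (decimal units, as in the post).
OBJECTS = 1_000_000_000_000 // 4_000_000


def all_replicas_fail(dfr: float, replicas: int) -> float:
    """Chance that every node holding one object fails on the same day."""
    return dfr ** replicas


def any_set_lost_bound(dfr: float, replicas: int, replica_sets: int) -> float:
    """Union bound: P(at least one replica set is wiped out) <= sets * p.

    This is an upper bound and needs no independence assumption between sets.
    """
    return min(1.0, replica_sets * all_replicas_fail(dfr, replicas))


if __name__ == "__main__":
    for replicas in range(1, 5):
        single = all_replicas_fail(DFR, replicas)
        bound = any_set_lost_bound(DFR, replicas, OBJECTS)
        print(f"{replicas} replicas: single object {single:.3e}, "
              f"bound for {OBJECTS} replica sets {bound:.3e}")
```

With 3 replicas this gives roughly 7.1e-10 per object per day, and a bound of about 1.8e-4 (0.018%) per day if all 250,000 objects really had distinct replica sets. Passing the pool's PG count as `replica_sets` would give the corresponding bound for a placement-group layout.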

My fear is that really large (3000+ nodes) RBD clusters will become too risky to run, and I would love for someone to dispel my fear with math ;)

Kind regards,

Felix

--
Felix Schüren
Senior Infrastructure Architect
Host Europe Group - http://www.hosteuropegroup.com/

Mail: felix.schueren@xxxxxxxxxxxxxxxxxxx
Tel: +49 2203 1045 7350
Mobile: +49 162 2323 988