During the CRUSH CDS session yesterday I talked a bit about the desire to constrain the number of possible disk combinations so that we reduce the probability that a concurrent failure causes data loss. Sheldon just pointed out a talk from ATC that discusses the basic problem:

https://www.usenix.org/conference/atc13/technical-sessions/presentation/cidon

The situation with CRUSH is slightly better, I think, because the number of peers for a given OSD in a large cluster is bounded (pg_num / num_osds), but I think we may still be able to improve things. Last night it occurred to me that this is almost just having pgp_num < pg_num, but I think that's not quite right either.

If anyone has some clear intuition here, I would love to hear it. If there is anything we can do to improve things, we definitely want to do it!

sage
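
To make the intuition above a little more concrete, here is a rough Python sketch of a toy model. It is not CRUSH: it assumes each PG is mapped to `size` distinct OSDs chosen uniformly at random, and it models pgp_num < pg_num by simply reusing placement `i % pgp_num` for PG `i`, which is a simplification of how Ceph actually folds PGs onto placement seeds. All function names are hypothetical. It counts the distinct replica sets, the number of peers per OSD, and the chance that `size` simultaneously failed OSDs (picked uniformly) cover some PG's entire replica set.

```python
import random
from math import comb


def simulate_placement(num_osds, pg_num, size, pgp_num=None, seed=0):
    """Toy stand-in for CRUSH (assumption): uniform random replica sets.

    Only `pgp_num` placements are drawn; PG i reuses placement i % pgp_num.
    """
    if pgp_num is None:
        pgp_num = pg_num
    rng = random.Random(seed)
    seeds = [tuple(sorted(rng.sample(range(num_osds), size)))
             for _ in range(pgp_num)]
    return [seeds[i % pgp_num] for i in range(pg_num)]


def peers_per_osd(placement, num_osds):
    """Number of distinct OSDs that share at least one PG with each OSD."""
    peers = [set() for _ in range(num_osds)]
    for replica_set in placement:
        for osd in replica_set:
            peers[osd].update(o for o in replica_set if o != osd)
    return [len(p) for p in peers]


def loss_probability(placement, num_osds, size):
    """Chance that `size` failed OSDs, chosen uniformly, form a whole replica set."""
    distinct_sets = set(placement)
    return len(distinct_sets) / comb(num_osds, size)


if __name__ == "__main__":
    num_osds, pg_num, size = 1000, 50_000, 3
    for pgp_num in (pg_num, pg_num // 10):
        placement = simulate_placement(num_osds, pg_num, size, pgp_num=pgp_num)
        peers = peers_per_osd(placement, num_osds)
        print("pgp_num:", pgp_num)
        print("  distinct replica sets:", len(set(placement)))
        print("  max peers per OSD:", max(peers))
        print("  P(loss | size concurrent failures):",
              loss_probability(placement, num_osds, size))
```

In this toy model, fewer distinct replica sets means a lower chance that a given concurrent failure loses data, but it says nothing about how much data is lost when it does happen or about recovery parallelism, which may be part of why pgp_num < pg_num is not quite the same thing.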