On Fri, 7 Mar 2014, Dan van der Ster wrote:
> On Thu, Mar 6, 2014 at 9:30 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> > Sheldon just pointed out a talk from ATC that discusses the basic
> > problem:
> >
> >   https://www.usenix.org/conference/atc13/technical-sessions/presentation/cidon
> >
> > The situation with CRUSH is slightly better, I think, because the number
> > of peers for a given OSD in a large cluster is bounded (pg_num /
> > num_osds), but I think we may still be able to improve things.
>
> I'm surprised they didn't cite Ceph -- aren't copysets ~= placement groups?

I think so (I didn't listen to the whole talk :).  My ears did perk up when
Carlos (who was part of the original team at UCSC) asked the question about
the CRUSH paper at the end, though. :)

Anyway, now I'm thinking that this *is* really just all about tuning
pg_num/pgp_num, and of course about managing failure domains in the CRUSH
map as best we can, so that placement lines up with the expected sources of
correlated failure.  But again, I would appreciate confirmation from
others' intuitions or (better yet) a proper mathematical model.  This bit
of my brain is full of cobwebs, and wasn't particularly strong here to
begin with.

sage
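
For anyone who wants to poke at the peer-count bound numerically, here is a
rough sketch. It is not a model of data-loss probability, and it is not
CRUSH: it assumes each PG lands on `size` distinct OSDs chosen uniformly at
random, with no failure domains. The function names and the `size`
parameter (replica count) are made up for this illustration. Under those
assumptions an OSD holds about pg_num * size / num_osds PGs, each of which
contributes at most size - 1 peers, so the sketch compares the simulated
number of distinct peers against that figure (capped at num_osds - 1).

```python
import random
from collections import defaultdict


def simulate_peers(num_osds, pg_num, size=3, seed=0):
    """Place each PG on `size` distinct OSDs chosen uniformly at random.

    This is a stand-in for CRUSH, not CRUSH itself (no hierarchy, no
    weights). Returns (pgs_on, peers): the PG count per OSD and the set
    of distinct peer OSDs per OSD.
    """
    rng = random.Random(seed)
    pgs_on = defaultdict(int)
    peers = defaultdict(set)
    for _ in range(pg_num):
        acting = rng.sample(range(num_osds), size)
        for osd in acting:
            pgs_on[osd] += 1
            # Every other OSD in this PG's set is a peer of `osd`.
            peers[osd].update(o for o in acting if o != osd)
    return pgs_on, peers


def summarize(num_osds, pg_num, size=3, seed=0):
    """Return (avg PGs per OSD, upper bound on avg peers, simulated avg peers)."""
    pgs_on, peers = simulate_peers(num_osds, pg_num, size, seed)
    avg_pgs = pg_num * size / num_osds
    # Each PG on an OSD adds at most size - 1 peers; an OSD can never
    # have more than num_osds - 1 peers in total.
    bound = min(avg_pgs * (size - 1), num_osds - 1)
    avg_peers = sum(len(peers[o]) for o in range(num_osds)) / num_osds
    return avg_pgs, bound, avg_peers


if __name__ == "__main__":
    for pg_num in (256, 1024, 4096):
        avg_pgs, bound, avg_peers = summarize(num_osds=100, pg_num=pg_num)
        print(f"pg_num={pg_num}: avg PGs/OSD={avg_pgs:.1f}, "
              f"peer bound={bound:.1f}, simulated avg peers={avg_peers:.1f}")
```

Raising pg_num pushes the peer count toward num_osds - 1, which is the
tuning trade-off mentioned above; a real CRUSH map with failure domains
would restrict which OSDs can be peers at all.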