> It's very important, several kinds of blocking are done at object
> granularity. Off the top of my head, large objects would cause deep
> scrub and recovery to stall requests for longer. Elephant objects
> would also be able to skew data distribution.

There are some definite parallels here to discussions we've had in
Gluster-land, which we might as well go through because people from
either "parent" won't have heard the other.

The data distribution issue has turned out to be a practical non-issue
for GlusterFS users. Sure, if you have very few "elephant objects" on
very few small-ish bricks (our equivalent of OSDs) then you can get
skewed distribution. On the other hand, that problem *very* quickly
solves itself for even moderate object and brick counts, to the point
that almost no users have found it useful to enable striping. Has your
experience been different, or do you not know because striping is
mandatory instead of optional?

The "deep scrub and recovery" point brings up a whole different set of
memories. We used to have a problem in GlusterFS where self-heal would
lock an entire file while it ran, so other access to that file would be
blocked for a long time. This would cause VMs to hang, for example. In
either 3.3 or 3.4 (can't remember) we added "granular self-heal" which
would only lock the portion of the file that was currently under
repair, in a sort of rolling fashion. From your comment, it sounds like
RADOS still locks the entire object. Is that correct?

If so, I posit that it's something we wouldn't need to solve in a
prototype. If/when that starts turning into something real, then we'd
have two options. One is to do striping as you suggest, which means
solving all of the associated coordination problems. Another would be
to do something like what GlusterFS did, with locking at the sub-object
level.
That does make repair less atomic, which some would consider a
consistency problem, but we do have some evidence that it's a violation
users don't seem to care about.
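
To make the rolling-lock idea concrete, here is a minimal sketch in
Python. It is not GlusterFS or RADOS code: `RangeLock`, `heal_granular`
and the chunk size are hypothetical names chosen for illustration, and
the sketch assumes the good and stale copies have the same length. The
only point is that the healer holds a lock on one chunk at a time, so
access to the rest of the file is never blocked by the repair.

```python
import threading


class RangeLock:
    """Hypothetical byte-range lock: only overlapping ranges conflict."""

    def __init__(self):
        self._cond = threading.Condition()
        self._held = []  # half-open (start, end) ranges currently locked

    def _overlaps(self, start, end):
        return any(s < end and start < e for s, e in self._held)

    def acquire(self, start, end, blocking=True):
        """Lock [start, end); return False if non-blocking and busy."""
        with self._cond:
            while self._overlaps(start, end):
                if not blocking:
                    return False
                self._cond.wait()
            self._held.append((start, end))
            return True

    def release(self, start, end):
        with self._cond:
            self._held.remove((start, end))
            self._cond.notify_all()


def heal_granular(lock, good, stale, chunk=4):
    """Repair `stale` (a bytearray) from `good`, locking one chunk at a time.

    Assumes len(good) == len(stale); the chunk size is illustrative only.
    """
    for off in range(0, len(good), chunk):
        end = min(off + chunk, len(good))
        lock.acquire(off, end)  # blocks only I/O that overlaps this chunk
        try:
            stale[off:end] = good[off:end]
        finally:
            lock.release(off, end)
```

While the healer holds (0, 4), a client write to (8, 12) goes straight
through, whereas with a whole-file lock it would wait for the entire
repair. The cost is the one already mentioned: another reader can
observe a file that is only partly repaired.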