We do essentially lock entire objects for many purposes. This isn't
generally a problem (and greatly simplifies many bits of the
implementation) because all existing rados users employ some form of
chunking/striping. That said, it's probably a good thing to punt on
for a prototype.
-Sam

On Mon, May 5, 2014 at 11:07 AM, Jeff Darcy <jdarcy@xxxxxxxxxx> wrote:
>> It's very important, several kinds of blocking are done at object
>> granularity. Off the top of my head, large objects would cause deep
>> scrub and recovery to stall requests for longer. Elephant objects
>> would also be able to skew data distribution.
>
> There are some definite parallels here to discussions we've had in
> Gluster-land, which we might as well go through because people from
> either "parent" won't have heard the other. The data distribution
> issue has turned out to be a practical non-issue for GlusterFS
> users. Sure, if you have very few "elephant objects" on very few
> small-ish bricks (our equivalent of OSDs) then you can get skewed
> distribution. On the other hand, that problem *very* quickly
> solves itself for even moderate object and brick counts, to the
> point that almost no users have found it useful to enable striping.
> Has your experience been different, or do you not know because
> striping is mandatory instead of optional?
>
> The "deep scrub and recovery" point brings up a whole different
> set of memories. We used to have a problem in GlusterFS where
> self-heal would lock an entire file while it ran, so other access
> to that file would be blocked for a long time. This would cause
> VMs to hang, for example. In either 3.3 or 3.4 (can't remember)
> we added "granular self-heal" which would only lock the portion
> of the file that was currently under repair, in a sort of rolling
> fashion. From your comment, it sounds like RADOS still locks the
> entire object. Is that correct? If so, I posit that it's
> something we wouldn't need to solve in a prototype.
> If/when that starts turning into something real, then we'd have
> two options. One is to do striping as you suggest, which means
> solving all of the associated coordination problems. Another would
> be to do something like what GlusterFS did, with locking at the
> sub-object level. That does make repair less atomic, which some
> would consider a consistency problem, but we do have some evidence
> that it's a violation users don't seem to care about.
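For readers unfamiliar with the "granular self-heal" idea discussed above, here is a minimal Python sketch of rolling, sub-object locking during repair. It only models the concept in a single process: GranularlyLockedObject, its chunk_size parameter, and repair_from() are hypothetical names, not GlusterFS or RADOS APIs, and the sketch says nothing about how either system actually implements its locking. Each fixed-size chunk gets its own lock, the repair pass locks, rewrites and unlocks one chunk at a time, and a read that touches a different chunk proceeds while repair holds another.

```python
import threading
from contextlib import contextmanager


class GranularlyLockedObject:
    """Hypothetical in-memory object guarded by one lock per fixed-size chunk.

    A whole-object design would use a single lock; here a repair pass only
    blocks I/O that overlaps the chunk it is currently rewriting.
    """

    def __init__(self, data: bytes, chunk_size: int):
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.data = bytearray(data)
        self.chunk_size = chunk_size
        n_chunks = max(1, -(-len(data) // chunk_size))  # ceiling division
        self.locks = [threading.Lock() for _ in range(n_chunks)]

    def _chunks(self, offset: int, length: int) -> range:
        """Indices of every chunk overlapped by [offset, offset + length)."""
        if offset < 0 or length < 0 or offset + length > len(self.data):
            raise ValueError("range outside object")
        if length == 0:
            return range(0)
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        return range(first, last + 1)

    @contextmanager
    def _locked(self, offset: int, length: int):
        chunks = self._chunks(offset, length)
        # Always acquire in ascending chunk order so two multi-chunk
        # reads/writes cannot deadlock against each other.
        for i in chunks:
            self.locks[i].acquire()
        try:
            yield
        finally:
            for i in reversed(chunks):
                self.locks[i].release()

    def read(self, offset: int, length: int) -> bytes:
        with self._locked(offset, length):
            return bytes(self.data[offset:offset + length])

    def write(self, offset: int, payload: bytes) -> None:
        with self._locked(offset, len(payload)):
            self.data[offset:offset + len(payload)] = payload

    def repair_from(self, good_copy: bytes) -> None:
        """Rolling repair: lock, rewrite and unlock one chunk at a time."""
        if len(good_copy) != len(self.data):
            raise ValueError("replica sizes differ")
        for i, lock in enumerate(self.locks):
            start = i * self.chunk_size
            end = min(start + self.chunk_size, len(self.data))
            with lock:  # only this chunk is unavailable to readers/writers
                self.data[start:end] = good_copy[start:end]


if __name__ == "__main__":
    obj = GranularlyLockedObject(b"AAAABBBBxxxx", chunk_size=4)
    result = []
    with obj.locks[0]:  # simulate a repair pass busy on chunk 0
        reader = threading.Thread(target=lambda: result.append(obj.read(8, 4)))
        reader.start()
        reader.join(timeout=1.0)
    print(result)  # [b'xxxx']: the read of chunk 2 was not blocked
    obj.repair_from(b"AAAABBBBCCCC")
    print(obj.read(0, 12))  # b'AAAABBBBCCCC'
```

The trade-off raised in the thread shows up directly in this model: partway through repair_from(), a reader can see chunk 0 already repaired while chunk 2 still holds stale data, so the repair is no longer atomic from the reader's point of view. Holding every chunk lock for the whole pass would restore atomicity, but it would block all I/O to the object, which is the whole-object behavior Sam describes.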