Re: RADOS translator for GlusterFS

We do essentially lock entire objects for many purposes.  This isn't
generally a problem (and it greatly simplifies many bits of the
implementation) because all existing RADOS users employ some form of
chunking/striping.  That said, it's probably a good thing to punt on
for a prototype.
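
For concreteness, the arithmetic behind that chunking/striping looks
roughly like this (a minimal Python sketch with a made-up 4 MiB chunk
size, not the actual striper code); a whole-object lock then never
covers more than one chunk of a logical file:

    OBJECT_SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB chunk size

    def chunk_for_offset(file_offset, object_size=OBJECT_SIZE):
        # Map a logical file offset to (object index, offset within it).
        return file_offset // object_size, file_offset % object_size

    # A write at offset 6 MiB lands in object 1 at offset 2 MiB:
    print(chunk_for_offset(6 * 1024 * 1024))  # -> (1, 2097152)
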
-Sam

On Mon, May 5, 2014 at 11:07 AM, Jeff Darcy <jdarcy@xxxxxxxxxx> wrote:
>> It's very important; several kinds of blocking are done at object
>> granularity.  Off the top of my head, large objects would cause deep
>> scrub and recovery to stall requests for longer.  Elephant objects
>> would also be able to skew data distribution.
>
> There are some definite parallels here to discussions we've had in
> Gluster-land, which we might as well go through because people from
> either "parent" project won't have heard the other side.  The data
> distribution issue has turned out to be a practical non-issue for
> GlusterFS users.  Sure, if you have very few "elephant objects" on
> very few small-ish bricks (our equivalent of OSDs), then you can get
> a skewed distribution.  On the other hand, that problem *very*
> quickly solves itself for even moderate object and brick counts, to
> the point that almost no users have found it useful to enable
> striping.  Has your experience been different, or do you not know
> because striping is mandatory instead of optional?
>
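
A concrete illustration of why the skew washes out (a toy Python
simulation with made-up hashing and sizes, not GlusterFS's actual DHT):

    import hashlib

    def max_over_mean(num_objects, num_bricks, num_elephants, elephant_size):
        # Place objects on bricks by hashing their names; "elephants"
        # are elephant_size times larger than ordinary objects.
        fill = [0] * num_bricks
        for i in range(num_objects):
            size = elephant_size if i < num_elephants else 1
            brick = int(hashlib.md5(b"obj-%d" % i).hexdigest(), 16) % num_bricks
            fill[brick] += size
        return max(fill) / (sum(fill) / num_bricks)

    print(max_over_mean(20, 4, 5, 100))       # few objects/bricks: badly skewed
    print(max_over_mean(200000, 40, 5, 100))  # moderate counts: close to 1.0
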
> The "deep scrub and recovery" point brings up a whole different
> set of memories.  We used to have a problem in GlusterFS where
> self-heal would lock an entire file while it ran, so other access
> to that file would be blocked for a long time.  This would cause
> VMs to hang, for example.  In either 3.3 or 3.4 (can't remember)
> we added "granular self-heal" which would only lock the portion
> of the file that was currently under repair, in a sort of rolling
> fashion.  From your comment, it sounds like RADOS still locks the
> entire object.  Is that correct?  If so, I posit that it's
> something we wouldn't need to solve in a prototype.  If/when that
> starts turning into something real, then we'd have two options.
> One is to do striping as you suggest, which means solving all of
> the associated coordination problems.  Another would be to do
> something like what GlusterFS did, with locking at the sub-object
> level.  That does make repair less atomic, which some would
> consider a consistency problem, but we do have some evidence that
> it's a violation users don't seem to care about.
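
The rolling repair loop described above, in sketch form (hypothetical
Python with made-up lock primitives; not the actual GlusterFS
self-heal code):

    CHUNK = 128 * 1024  # hypothetical repair window size

    def granular_heal(src, dst, size, lock_range, unlock_range):
        # Repair dst from src one window at a time, holding a range
        # lock only on the window under repair so other I/O proceeds.
        offset = 0
        while offset < size:
            length = min(CHUNK, size - offset)
            lock_range(offset, length)        # block writers to this window only
            try:
                src.seek(offset)
                dst.seek(offset)
                dst.write(src.read(length))   # copy from the healthy copy
            finally:
                unlock_range(offset, length)  # writers to this window resume
            offset += length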