Re: RADOS translator for GlusterFS

> It's very important, several kinds of blocking are done at object
> granularity.  Off the top of my head, large objects would cause deep
> scrub and recovery to stall requests for longer.  Elephant objects
> would also be able to skew data distribution.

There are some definite parallels here to discussions we've had in
Gluster-land, which we might as well go through because people from
either "parent" project won't have heard the other's.  The data
distribution issue has turned out to be a practical non-issue for GlusterFS
users.  Sure, if you have very few "elephant objects" on very few
small-ish bricks (our equivalent of OSDs) then you can get skewed
distribution.  On the other hand, that problem *very* quickly
solves itself for even moderate object and brick counts, to the
point that almost no users have found it useful to enable striping.
Has your experience been different, or do you not know because
striping is mandatory instead of optional?
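
To make the "solves itself" claim concrete, here is a rough sketch
of the effect.  It is a toy model, not GlusterFS's DHT or CRUSH:
the hash-mod-N placement, the object names, and the 1..100 size
range are all assumptions made for illustration.  It reports the
fullest brick's load relative to the mean.

```python
import hashlib
import random


def place(name, num_bricks):
    """Pick a brick by hashing the object name (toy placement, an assumption)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_bricks


def imbalance(num_objects, num_bricks, seed=0):
    """Return max brick load divided by mean brick load (1.0 = perfectly even)."""
    rng = random.Random(seed)
    load = [0] * num_bricks
    for i in range(num_objects):
        size = rng.randint(1, 100)  # hypothetical object sizes, arbitrary units
        load[place(f"obj-{i}", num_bricks)] += size
    mean = sum(load) / num_bricks
    return max(load) / mean


if __name__ == "__main__":
    for count in (1, 10, 100, 10000):
        print(count, round(imbalance(count, 4), 3))
```

With a single object on four bricks the ratio is exactly 4.0,
since one brick holds everything; it should drift toward 1.0 as
the object count grows.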

The "deep scrub and recovery" point brings up a whole different
set of memories.  We used to have a problem in GlusterFS where
self-heal would lock an entire file while it ran, so other access
to that file would be blocked for a long time.  This would cause
VMs to hang, for example.  In either 3.3 or 3.4 (can't remember)
we added "granular self-heal" which would only lock the portion
of the file that was currently under repair, in a sort of rolling
fashion.  From your comment, it sounds like RADOS still locks the
entire object.  Is that correct?  If so, I posit that it's
something we wouldn't need to solve in a prototype.  If/when that
starts turning into something real, then we'd have two options.
One is to do striping as you suggest, which means solving all of
the associated coordination problems.  Another would be to do
something like what GlusterFS did, with locking at the sub-object
level.  That does make repair less atomic, which some would
consider a consistency problem, but we do have some evidence that
it's a violation users don't seem to care about.
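
The difference between the two locking styles can be sketched
roughly as follows.  This is only a model of the idea, not the
GlusterFS self-heal code or anything in RADOS; the function names,
the fixed chunk size, and the byte-range representation are
hypothetical.  It counts how many repair steps a given client I/O
range would have to wait for.

```python
def heal_windows(file_size, chunk):
    """Yield the (start, end) byte ranges a rolling repair locks, one at a time."""
    for start in range(0, file_size, chunk):
        yield start, min(start + chunk, file_size)


def overlaps(a, b):
    """True if half-open ranges a and b share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]


def blocked_steps(file_size, chunk, io_range, granular=True):
    """Count repair steps during which io_range would be blocked."""
    windows = list(heal_windows(file_size, chunk))
    if not granular:
        # Whole-file (or whole-object) lock: blocked for the entire repair.
        return len(windows)
    # Rolling lock: blocked only while the locked window overlaps the I/O.
    return sum(1 for w in windows if overlaps(w, io_range))


if __name__ == "__main__":
    print(blocked_steps(1000, 100, (250, 260), granular=False))
    print(blocked_steps(1000, 100, (250, 260), granular=True))
```

In this model a small write in the middle of the file waits for
every step under the whole-file lock, but only for the one window
that covers it under the rolling lock.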

