On 03/15/2012 03:22 AM, Brian Candler wrote:
> Unfortunately I don't have any experience with replicated volumes, but the
> raw glusterfs protocol is very fast: a single brick which is a 12-disk raid0
> stripe can give 500MB/sec easily over 10G ethernet without any tuning.

BTW, there are socket-level changes in 3.3 that should improve this
quite a bit.

> Striped volumes are unfortunately broken on top of XFS at the moment:
> http://oss.sgi.com/archives/xfs/2012-03/msg00161.html

There is work in progress to address this. To be clear, it affects
sparse files in general; striping is just more likely than anything
else to hit it. Also, to address a reply further down, XFS is
*reporting* the size correctly; it's just *allocating* it incorrectly
IMO, in what seems to be a misguided attempt to improve some benchmark
numbers. The new preallocation heuristic needs to be dialed back not
only for GlusterFS but for many other cases as well. (There's a small
demo of the reporting-vs-allocating distinction at the end of this
message.)

> Replicated volumes, from what I've read, need to touch both servers even for
> read operations (for the self-healing functionality), and that could be a
> major bottleneck.

Replication touches both servers during lookup (part of
open/create/mkdir/etc.), but the actual reads go to only one subvolume.
I even have a patch in the queue to make it distribute (not duplicate)
reads across subvolumes more reliably, bringing us closer to N times
single-server read performance for N-way replication. The sketch below
shows the basic idea.
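To make that read path concrete, here's a minimal Python sketch of the
idea. This is not GlusterFS code, and the Replica/ReplicatedVolume
names are made up for illustration: writes fan out to every replica to
keep them consistent, while each read is served by exactly one
subvolume, chosen round-robin, so N replicas can approach N times the
single-server read throughput.

    import itertools

    class Replica:
        """Stands in for one replica subvolume (brick)."""
        def __init__(self, name):
            self.name = name
            self.data = {}

        def write(self, key, value):
            self.data[key] = value

        def read(self, key):
            print("read of %r served by %s" % (key, self.name))
            return self.data[key]

    class ReplicatedVolume:
        def __init__(self, replicas):
            self.replicas = replicas
            self._next = itertools.cycle(replicas)  # round-robin read scheduler

        def write(self, key, value):
            # Writes (and lookups) must touch every replica to keep
            # them consistent -- that's the part Brian was asking about.
            for r in self.replicas:
                r.write(key, value)

        def read(self, key):
            # Reads go to a single replica; successive reads rotate
            # across subvolumes instead of all landing on the first one.
            return next(self._next).read(key)

    vol = ReplicatedVolume([Replica("server-a"), Replica("server-b")])
    vol.write("file1", b"hello")
    vol.read("file1")  # served by server-a
    vol.read("file1")  # served by server-b

The point of distributing rather than duplicating is that a duplicated
read burns bandwidth on every server for no benefit, while a
distributed one leaves the other subvolumes free to serve other
readers.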
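And going back to the XFS point: a quick way to see the
reported-vs-allocated distinction for yourself is to create a sparse
file and compare st_size against st_blocks. A minimal sketch, assuming
Python on a Unix system; the path is arbitrary, and you'd need to run
it on an XFS with the aggressive preallocation heuristic to see the
allocated figure actually balloon.

    import os

    path = "/tmp/sparse-demo"  # example path; use a file on the mount you care about
    with open(path, "wb") as f:
        f.seek(100 * 1024 * 1024)  # seek 100MB out, writing nothing in between
        f.write(b"x")              # one byte at the far end -> a sparse file

    st = os.stat(path)
    print("reported size (st_size):", st.st_size)
    print("allocated bytes (st_blocks * 512):", st.st_blocks * 512)
    os.unlink(path)

The reported size is exactly what the application asked for, roughly
100MB, on any filesystem. The allocated bytes should be a block or two;
if they come back wildly larger, that's the overeager preallocation at
work.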