On Wed, Mar 14, 2012 at 11:09:28PM -0500, D. Dante Lorenso wrote:
> get 50-60 MB/s transfer speeds tops when sending large files (> 2GB)
> to gluster. When copying a directory of small files, we get <= 1
> MB/s performance!
>
> My question is ... is this right? Is this what I should expect from
> Gluster, or is there something we did wrong? We aren't using super
> expensive equipment, granted, but I was really hoping for better
> performance than this given that raw drive speeds using dd show that
> we can write at 125+ MB/s to each "brick" 2TB disk.

Unfortunately I don't have any experience with replicated volumes, but
the raw glusterfs protocol is very fast: a single brick which is a
12-disk raid0 stripe can give 500MB/sec easily over 10G ethernet without
any tuning. I would expect a distributed volume to work fine too, as it
just sends each request to one of N nodes.

Striped volumes are unfortunately broken on top of XFS at the moment:
http://oss.sgi.com/archives/xfs/2012-03/msg00161.html

Replicated volumes, from what I've read, need to touch both servers even
for read operations (for the self-healing functionality), and that could
be a major bottleneck.

But there are a few basic things to check:

(1) Are you using XFS for the underlying filesystems? If so, did you
mount them with the "inode64" mount option? Without this, XFS
performance sucks really badly for filesystems >1TB (an example fstab
entry is in the P.S. below).

Without inode64, even untarring files into a single directory will make
XFS distribute them between AGs, rather than allocating contiguous space
for them. This is a major trip-up and there is currently talk of
changing the default to be inode64.

(2) I have this in /etc/rc.local, which bumps the per-device readahead
and the maximum I/O request size to 1MB:

for i in /sys/block/sd*/bdi/read_ahead_kb; do echo 1024 >"$i"; done
for i in /sys/block/sd*/queue/max_sectors_kb; do echo 1024 >"$i"; done

> If I can't get gluster to work, our fail-back plan is to convert
> these 8 servers into iSCSI targets and mount the storage onto a
> Win2008 head and continue sharing to the network as before.
> Personally, I would rather us continue moving toward CentOS 6.2 with
> Samba and Gluster, but I can't justify the change unless I can
> deliver the performance.

Optimising replicated volumes I can't help with. However if you make a
simple RAID10 array on each server, and then join the servers into a
distributed gluster volume, I think it will rock (a rough sketch is in
the P.P.S. below).

What you lose is the high availability, i.e. if one server fails a
proportion of your data becomes unavailable until you fix it - but
that's no worse than your iSCSI proposal (unless you are doing something
complex, like drbd replication between pairs of nodes and HA failover of
the iSCSI target).

BTW, Linux md RAID10 with the 'far' layout is really cool; for reads it
performs like a RAID0 stripe, and it reduces head seeking for random
access.

Regards,

Brian.
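P.S. For concreteness, here's a minimal example of the inode64 mount;
the device name and mount point (/dev/sdb1, /export/brick1) are only
placeholders for whatever your bricks actually are:

# /etc/fstab - mount each brick filesystem with inode64 so XFS can put
# new inodes (and keep their data nearby) in any allocation group,
# instead of confining all inodes to the low part of the disk.
# /dev/sdb1 and /export/brick1 are example names only.
/dev/sdb1   /export/brick1   xfs   defaults,inode64   0 0

# Or mount it by hand. Note that inode64 generally can't be switched on
# with a simple "mount -o remount", so unmount and mount fresh:
umount /export/brick1
mount -o inode64 /dev/sdb1 /export/brick1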
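P.P.S. A rough sketch of the RAID10 + distribute idea. I'm assuming,
say, 8 data disks per server (sdb..sdi), a brick mounted at
/export/brick1 on each box, and servers named storage1..storage8 - all
of those names are invented, so substitute your own:

# On each server: md RAID10 across the data disks, using the 'far 2'
# layout (f2), then an XFS filesystem mounted with inode64.
mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=8 /dev/sd[b-i]
mkfs.xfs /dev/md0
mkdir -p /export/brick1
mount -o inode64 /dev/md0 /export/brick1

# On one server (assuming the others have already been added with
# "gluster peer probe"): create a plain distributed volume - no
# 'replica' or 'stripe' keyword - with one brick per server, then
# start it.
gluster volume create dist-vol transport tcp \
    storage1:/export/brick1 storage2:/export/brick1 \
    storage3:/export/brick1 storage4:/export/brick1 \
    storage5:/export/brick1 storage6:/export/brick1 \
    storage7:/export/brick1 storage8:/export/brick1
gluster volume start dist-vol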