Is there a FAQ/document somewhere with optimal mkfs and mount options for
ext4 and xfs? Is xfs still the 'desired' filesystem for gluster bricks?

On 3/15/12 3:22 AM, Brian Candler wrote:
> On Wed, Mar 14, 2012 at 11:09:28PM -0500, D. Dante Lorenso wrote:
>> get 50-60 MB/s transfer speeds tops when sending large files (> 2GB)
>> to gluster. When copying a directory of small files, we get <= 1
>> MB/s performance!
>>
>> My question is ... is this right? Is this what I should expect from
>> Gluster, or is there something we did wrong? We aren't using super
>> expensive equipment, granted, but I was really hoping for better
>> performance than this given that raw drive speeds using dd show that
>> we can write at 125+ MB/s to each "brick" 2TB disk.
> Unfortunately I don't have any experience with replicated volumes, but the
> raw glusterfs protocol is very fast: a single brick which is a 12-disk raid0
> stripe can give 500MB/sec easily over 10G ethernet without any tuning.
>
> I would expect a distributed volume to work fine too, as it just sends each
> request to one of N nodes.
>
> Striped volumes are unfortunately broken on top of XFS at the moment:
> http://oss.sgi.com/archives/xfs/2012-03/msg00161.html
>
> Replicated volumes, from what I've read, need to touch both servers even for
> read operations (for the self-healing functionality), and that could be a
> major bottleneck.
>
> But there are a few basic things to check:
>
> (1) Are you using XFS for the underlying filesystems? If so, did you mount
> them with the "inode64" mount option? Without this, XFS performance sucks
> really badly for filesystems >1TB.
>
> Without inode64, even untarring files into a single directory will make XFS
> distribute them between AGs, rather than allocating contiguous space for
> them.
>
> This is a major trip-up and there is currently talk of changing the default
> to be inode64.
>
> (2) I have this in /etc/rc.local:
>
>     for i in /sys/block/sd*/bdi/read_ahead_kb; do echo 1024 > "$i"; done
>     for i in /sys/block/sd*/queue/max_sectors_kb; do echo 1024 > "$i"; done
>
>> If I can't get gluster to work, our fail-back plan is to convert
>> these 8 servers into iSCSI targets and mount the storage onto a
>> Win2008 head and continue sharing to the network as before.
>> Personally, I would rather us continue moving toward CentOS 6.2 with
>> Samba and Gluster, but I can't justify the change unless I can
>> deliver the performance.
> Optimising replicated volumes I can't help with.
>
> However if you make a simple RAID10 array on each server, and then join the
> servers into a distributed gluster volume, I think it will rock. What you
> lose is the high availability, i.e. if one server fails, a proportion of
> your data becomes unavailable until you fix it - but that's no worse than
> your iSCSI proposal (unless you are doing something complex, like drbd
> replication between pairs of nodes and HA failover of the iSCSI target).
>
> BTW, Linux md RAID10 with 'far' layout is really cool; for reads it performs
> like a RAID0 stripe, and it reduces head seeking for random access.
>
> Regards,
>
> Brian.
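
To make the question concrete, here is roughly the recipe I have in mind
for a single XFS brick. This is only a sketch pieced together from the
advice above and things I've seen suggested elsewhere - the device
(/dev/sdb), the mount point, and the "-i size=512" inode-size option (which
I understand is often suggested so Gluster's extended attributes fit in the
inode) are assumptions on my part, not verified optimal settings:

    # hypothetical brick device and mount point
    mkfs.xfs -i size=512 /dev/sdb
    mkdir -p /export/brick1
    mount -o inode64,noatime /dev/sdb /export/brick1

    # /etc/fstab entry so inode64 survives a reboot
    /dev/sdb  /export/brick1  xfs  inode64,noatime  0 0

Happy to be corrected if there are better defaults for ext4 or xfs here.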
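
For completeness, a sketch of the md RAID10 'far' layout Brian describes,
assuming four data disks sdb-sde per server (device names are placeholders
and I haven't tested this layout myself):

    # 4-disk RAID10 with the far-2 layout; reads behave like a RAID0 stripe
    mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=4 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde
    mkfs.xfs -i size=512 /dev/md0
    mount -o inode64,noatime /dev/md0 /export/brick1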
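
And joining the eight servers into a plain distributed (non-replicated)
volume would presumably look something like this - hostnames
server1-server8 and the brick path are made up, and the syntax is from the
current 3.2-era CLI, so please check it against the docs:

    # run on server1; repeat the probe for server3..server8
    gluster peer probe server2
    gluster volume create dist-vol transport tcp \
        server1:/export/brick1 server2:/export/brick1 \
        server3:/export/brick1 server4:/export/brick1 \
        server5:/export/brick1 server6:/export/brick1 \
        server7:/export/brick1 server8:/export/brick1
    gluster volume start dist-vol

    # on a client
    mount -t glusterfs server1:/dist-vol /mnt/gluster

As Brian notes, this trades away high availability for performance, which
is the trade-off I'm trying to weigh against the iSCSI fallback.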