On 03/15/2012 03:22 AM, Brian Candler wrote:
> Unfortunately I don't have any experience with replicated volumes, but the
> raw glusterfs protocol is very fast: a single brick which is a 12-disk raid0
> stripe can give 500MB/sec easily over 10G ethernet without any tuning.

BTW, there are socket-level changes in 3.3 that should improve this
quite a bit.

> Striped volumes are unfortunately broken on top of XFS at the moment:
> http://oss.sgi.com/archives/xfs/2012-03/msg00161.html

There is work in progress to address this. To be clear, it affects
sparse files in general; striping is just more likely than anything
else to hit it. Also, to address a reply further down, XFS is
*reporting* the size correctly; it's just *allocating* it incorrectly
IMO, in what seems to be a misguided attempt to improve some benchmark
numbers. The new preallocation heuristic needs to be dialed back not
only for GlusterFS but for many other cases as well. (There's a small
demo of the reporting-vs-allocating distinction at the end of this
message.)

> Replicated volumes, from what I've read, need to touch both servers even for
> read operations (for the self-healing functionality), and that could be a
> major bottleneck.

Replication touches both servers during lookup (part of
open/create/mkdir/etc.), but the actual reads go to only one subvolume.
I even have a patch in the queue to make it distribute (not duplicate)
reads across subvolumes more reliably, bringing us closer to N times
single-server read performance for N-way replication. The sketch below
shows the basic idea.
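To make that read path concrete, here's a minimal Python sketch of the
idea. This is not GlusterFS code, and the Replica/ReplicatedVolume
names are made up for illustration: writes fan out to every replica to
keep them consistent, while each read is served by exactly one
subvolume, chosen round-robin, so N replicas can approach N times the
single-server read throughput.

    import itertools

    class Replica:
        """Stands in for one replica subvolume (brick)."""
        def __init__(self, name):
            self.name = name
            self.data = {}

        def write(self, key, value):
            self.data[key] = value

        def read(self, key):
            print("read of %r served by %s" % (key, self.name))
            return self.data[key]

    class ReplicatedVolume:
        def __init__(self, replicas):
            self.replicas = replicas
            self._next = itertools.cycle(replicas)  # round-robin read scheduler

        def write(self, key, value):
            # Writes (and lookups) must touch every replica to keep
            # them consistent -- that's the part Brian was asking about.
            for r in self.replicas:
                r.write(key, value)

        def read(self, key):
            # Reads go to a single replica; successive reads rotate
            # across subvolumes instead of all landing on the first one.
            return next(self._next).read(key)

    vol = ReplicatedVolume([Replica("server-a"), Replica("server-b")])
    vol.write("file1", b"hello")
    vol.read("file1")  # served by server-a
    vol.read("file1")  # served by server-b

The point of distributing rather than duplicating is that a duplicated
read burns bandwidth on every server for no benefit, while a
distributed one leaves the other subvolumes free to serve other
readers.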
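And going back to the XFS point: a quick way to see the
reported-vs-allocated distinction for yourself is to create a sparse
file and compare st_size against st_blocks. A minimal sketch, assuming
Python on a Unix system; the path is arbitrary, and you'd need to run
it on an XFS with the aggressive preallocation heuristic to see the
allocated figure actually balloon.

    import os

    path = "/tmp/sparse-demo"  # example path; use a file on the mount you care about
    with open(path, "wb") as f:
        f.seek(100 * 1024 * 1024)  # seek 100MB out, writing nothing in between
        f.write(b"x")              # one byte at the far end -> a sparse file

    st = os.stat(path)
    print("reported size (st_size):", st.st_size)
    print("allocated bytes (st_blocks * 512):", st.st_blocks * 512)
    os.unlink(path)

The reported size is exactly what the application asked for, roughly
100MB, on any filesystem. The allocated bytes should be a block or two;
if they come back wildly larger, that's the overeager preallocation at
work.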