Vincent Régnard wrote:
Hi all,
We are presently trying to tune our non-gluster configuration to
improve glusterfs performance. My config is gluster
1.3.7/fuse2.7.0-glfs5, linux 2.6.16.55. We have 3 clients and 3
servers on a 100Mb network with 5ms round trip between clients and
servers. The 3 clients replicate with AFR on the client side across the 3
servers.
Our read/write throughput benchmark (dbench) shows between 2 and 5 MB/s.
I imagine your clients and servers are the same systems? Otherwise,
5 MB/s shouldn't be possible on a 100 Mbit network. If one of the three
AFR locations being written to is local, that leaves 100 Mbit for the
two other copies, or about 11.11 MB/s total at line saturation (what I
usually see, at least). Since that's two copies, it works out to about
5.5 MB/s max. If all three AFR subvolumes are remote, that's 11.11 MB/s
split three ways.
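Back-of-the-envelope, the arithmetic looks like this (the 11.11 MB/s line
rate is an assumption based on what I typically see, not a measurement of
your link):

# Rough ceiling on client-side AFR write throughput over one 100 Mbit NIC.
LINE_RATE=11.11     # MB/s usable on a saturated 100 Mbit link (assumed)
REMOTE_COPIES=2     # copies that must cross the wire (third copy is local)
echo "scale=2; $LINE_RATE / $REMOTE_COPIES" | bc    # ~5.55 MB/s effective write speed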
The AFR synchronisation using the "find -mtime -1 -type f -exec head -c1"
trick takes approximately 30 minutes for a 20GB filesystem with
300,000 files, which seems too long to be acceptable for us. I'd like
to tune some parameters to increase performance.
30 minutes when the other AFR subvolumes don't have any data and it all
needs to be synced, or 30 minutes when they are all already in sync? This
time is going to depend heavily on how many files you have, not just the
size (the command will probably take a second or less on twenty 1GB
files that are already in sync on all servers).
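For reference, the full form of that self-heal trigger is usually something
along these lines (the mount point below is just a placeholder):

# Read the first byte of every file modified in the last day, forcing AFR
# to self-heal each one; /mnt/glusterfs is an example client-side mount point.
find /mnt/glusterfs -mtime -1 -type f -exec head -c1 {} \; > /dev/null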
I can imagine that reducing the round trip between servers might help?
But I cannot actually do anything about that. The only thing I might be
able to do is configure some QoS. Have you any suggestions about how
we should do that? Would giving priority to tcp/6996 between clients
and servers really help?
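If you do go the QoS route, a rough sketch of prioritising the glusterfs
port with tc might look like the following; the interface name, the band
layout and the assumption that tcp/6996 carries all your gluster traffic
are mine, and this is untested:

# Put a 3-band prio qdisc on the client's NIC and steer traffic destined
# for the glusterfs port (tcp/6996) into the highest-priority band.
tc qdisc add dev eth0 root handle 1: prio bands 3
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip protocol 6 0xff match ip dport 6996 0xffff flowid 1:1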
Separate network connections to each AFR subvolume. VLAN your switches
and implement separate logical networks for each connection to the AFR
subvolumes, using secondary (or even tertiary) NICs in each client. You
can effectively double or triple your throughput while increasing
redundancy.
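As a sketch of what that could look like on a client with a second NIC
(the VLAN IDs and addresses are made up, and vconfig/ifconfig are just
the tools of that kernel era):

# Example only: put each AFR subvolume's traffic on its own VLAN/subnet.
vconfig add eth1 101                                    # VLAN 101 towards server1
ifconfig eth1.101 192.168.101.10 netmask 255.255.255.0 up
vconfig add eth1 102                                    # VLAN 102 towards server2
ifconfig eth1.102 192.168.102.10 netmask 255.255.255.0 up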
At the (Linux) kernel level, could acting on the preemption model
(CONFIG_PREEMPT) and CONFIG_HZ produce an improvement?
Our present config is as follows:
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
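For comparison, the throughput-oriented (server-style) choice of those same
options would look roughly like this; whether it would make any visible
difference here is doubtful:

CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100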
Is it better to prefer SMP to non-SMP kernel builds? (We presently
have SMP enabled for our dual-cores.) What impact would deactivating
SMP have on glusterfs performance?
We use LinuxThreads (glibc 2.3) and have no NPTL support; can this
influence performance as well?
We naturally already have the gluster performance translators in the
configuration (io-threads, io-cache, read-ahead and write-behind).
Thanks in advance for your comments or suggestions.
Vincent.
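For anyone comparing notes, a typical 1.3.x client-side stacking of those
performance translators over the AFR volume looks roughly like this; the
volume names and exact order are illustrative, and I've left the options out:

volume afr
  type cluster/afr
  subvolumes client1 client2 client3
end-volume

volume iothreads
  type performance/io-threads
  subvolumes afr
end-volume

volume writebehind
  type performance/write-behind
  subvolumes iothreads
end-volume

volume readahead
  type performance/read-ahead
  subvolumes writebehind
end-volume

volume iocache
  type performance/io-cache
  subvolumes readahead
end-volume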
I think your problem is more architecture limitations than kernel
scheduling. There's a cost for redundancy, and it's performance. It's
just much easier to scale glusterfs performance by adding hardware.
--
-Kevan Benson
-A-1 Networks