Rafiq,

We have identified a bug that will be fixed in 3.1.1, which should be out very soon and should help with this.

-Jacob

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Rafiq Maniar
Sent: Thursday, November 18, 2010 1:39 PM
To: gluster-users at gluster.org
Subject: GlusterFS 3.1 - Bad Write Performance Unzipping Many Small Files

Hi,

I'm using GlusterFS 3.1 on Ubuntu 10.04 in a two-server replicated setup on Amazon EC2. Unzipping an 8MB zip file full of small files and directories onto a Gluster mount takes 40-50 seconds, in contrast to 0.8 seconds onto local disk.

The volume was created with:

    gluster volume create volname replica 2 transport tcp server1:/shared server2:/shared

I am mounting on the client via NFS with:

    mount -t nfs -o async,noatime,nodiratime server1:/shared /mnt/shared

and have also tried the Gluster native client with:

    mount -t glusterfs server1:/shared /mnt/shared

I found a post where the author describes a similarly slow unzip of the Linux kernel source:

    http://northernmost.org/blog/improving-glusterfs-performance/

I believe the 'nodelay' option was implemented in response to that post. I have tried setting it in the three configuration files on the servers, but saw no improvement, and I've also tried some other performance-tuning tricks found on the web.

On another server running Gluster 3.0 with an NFS share but no replication, the same unzip completes in 3 seconds. I see similarly bad performance with a plain copy of the same files and directories from /tmp into the Gluster mount, so the problem is not specific to zip.
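For reference, a minimal way to reproduce the comparison described above (a sketch only; small-files.zip is a placeholder name, and the paths are taken from the setup above):

    # time the unzip onto local disk (completes in under a second here)
    time unzip -q small-files.zip -d /tmp/unzip-test

    # time the same unzip onto the replicated Gluster mount (40-50 seconds)
    time unzip -q small-files.zip -d /mnt/shared/unzip-test

    # a plain recursive copy shows the same slowdown, so zip itself is not at fault
    time cp -r /tmp/unzip-test /mnt/shared/copy-test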
Here is my /etc/glusterd/vols/shared/shared-fuse.vol. Bear in mind that this is 'tuned', but the out-of-the-box version gives the same performance. I also tried removing all the performance translators, as per someone's suggestion in IRC.

volume shared-client-0
    type protocol/client
    option remote-host server1
    option remote-subvolume /mnt/shared
    option transport-type tcp
    option transport.socket.nodelay on
end-volume

volume shared-client-1
    type protocol/client
    option remote-host server2
    option remote-subvolume /mnt/shared
    option transport-type tcp
    option transport.socket.nodelay on
end-volume

volume shared-replicate-0
    type cluster/replicate
    subvolumes shared-client-0 shared-client-1
end-volume

volume shared-write-behind
    type performance/write-behind
    option cache-size 100MB
    option flush-behind off
    subvolumes shared-replicate-0
end-volume

volume shared-read-ahead
    type performance/read-ahead
    subvolumes shared-write-behind
end-volume

volume shared-io-cache
    type performance/io-cache
    option cache-size 100MB
    option cache-timeout 1
    subvolumes shared-read-ahead
end-volume

volume shared-quick-read
    type performance/quick-read
    option cache-timeout 1        # default 1 second
    option max-file-size 256KB    # default 64Kb
    subvolumes shared-io-cache
end-volume

volume shared
    type debug/io-stats
    subvolumes shared-quick-read
end-volume

And my /etc/glusterd/vols/shared/shared.server1.mnt-shared.vol:

volume shared-posix
    type storage/posix
    option directory /mnt/shared
end-volume

volume shared-access-control
    type features/access-control
    subvolumes shared-posix
end-volume

volume shared-locks
    type features/locks
    subvolumes shared-access-control
end-volume

volume shared-io-threads
    type performance/io-threads
    option thread-count 16
    subvolumes shared-locks
end-volume

volume /mnt/shared
    type debug/io-stats
    subvolumes shared-io-threads
end-volume

volume shared-server
    type protocol/server
    option transport-type tcp
    option auth.addr./mnt/shared.allow *
    option transport.socket.nodelay on
    subvolumes /mnt/shared
end-volume

Here's the output of nfsstat on the client:

Client rpc stats:
calls      retrans    authrefrsh
1652499    231        124

Client nfs v3:
null       getattr    setattr    lookup     access     readlink
0       0% 744498 45% 32762   1% 490843 29% 235276 14% 37      0%
read       write      create     mkdir      symlink    mknod
52085   3% 21940   1% 14452   0% 948     0% 1       0% 0       0%
remove     rmdir      rename     link       readdir    readdirplus
10961   0% 562     0% 19      0% 0       0% 135     0% 32623   1%
fsstat     fsinfo     pathconf   commit
140     0% 46      0% 23      0% 15126   0%

Anyone got any ideas on improving the performance of this?

Thanks,
Rafiq
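A side note on the nfsstat figures above: getattr, lookup, and access dominate the call mix. A quick back-of-the-envelope check, using the numbers taken directly from that output:

    # (getattr + lookup + access) as a percentage share of all NFS calls
    echo $(( (744498 + 490843 + 235276) * 100 / 1652499 ))    # prints 88, i.e. ~89%

With roughly nine out of ten calls being metadata round trips, per-operation network latency, rather than raw throughput, is the likely bottleneck for a many-small-files unzip on a replicated volume.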