GlusterFS3.1 - Bad Write Performance Unzipping Many Small Files

craig at gluster.com (Craig Carl) · Fri, 19 Nov 2010 17:00:00 -0800



Rafiq -
    I have added an unzip of the kernel to the AWS testing I am running 
this weekend. I'll get back to the list on Tuesday; hopefully I'll have 
a better handle on AWS performance by then.

Craig


On 11/18/2010 10:32 PM, Rafiq Maniar wrote:
> Hi Craig,
>
> Thanks for your reply.
>
> I have confirmed that it is not disk IO-bound by creating a second
> volume sharing /tmp, and that has the same issue. Also, I am in fact
> using multiple EBS volumes in an LVM array (although not 10).
>
> I tried the latest 3.1.1 build on a new set of instances and that has 
> the same performance.
>
> If I take one server offline the time reduces to about 23 seconds
> (from 40-50). Replication doubles the time taken, which makes sense.
> However, even without replication, 23 seconds is still far too long
> for an unzip that takes less than one second on local disk.
>
> I stopped Gluster and created a normal NFS export of the directory,
> and the unzip completed in 4.6 seconds, which is around the
> performance level I need to get.
>
> Any ideas?
>
> Thanks,
> Rafiq
>
>
> On Fri, Nov 19, 2010 at 5:30 AM, Craig Carl <craig at gluster.com> wrote:
>
>     Rafiq -
>       Gluster 3.1.1 will ship shortly; in our testing, performance has
>     been significantly improved. If you are not in production, the QA
>     builds of Gluster 3.1.1 are here -
>     http://download.gluster.com/pub/gluster/glusterfs/qa-releases/.
>     The QA releases should NOT be run in production.
>       We have also found that the performance of a single EBS device
>     can be a limiting factor. I have found that creating lots of
>     smaller EBS devices and then using mdadm to build a RAID 0 array
>     can improve performance without raising costs. I have tested with
>     10 EBS devices; performance increased 4-5x for disk-bound
>     applications.
>
>
>     Thanks,
>
>     Craig
>
>     --
>     Craig Carl
>     Senior Systems Engineer
>     Gluster
>
>     On 11/18/2010 09:14 PM, Rafiq Maniar wrote:
>
>         Hi,
>
>         I'm using GlusterFS 3.1 on Ubuntu 10.04 in a dual replication
>         setup, on Amazon EC2.
>
>         It takes 40-50 seconds to unzip an 8MB zip file full of small
>         files and directories to a gluster mount, in contrast to 0.8
>         seconds to local disk.
>
>         The volume configuration was created with:
>         gluster volume create volname replica 1 transport tcp server1:/shared server2:/shared
>
>         I am mounting on the client via NFS with:
>         mount -t nfs -o async,noatime,nodiratime server1:/shared /mnt/shared
>
>         And also tried via Gluster native client with:
>         mount -t glusterfs server1:/shared /mnt/shared
>
>         I found a post here where the author talks about a similar slow
>         unzip of the Linux kernel:
>         http://northernmost.org/blog/improving-glusterfs-performance/
>
>         I believe the 'nodelay' option was implemented in response to
>         this, and I have tried using that in the 3 configuration files
>         on the servers but with no improvement. I've also tried some
>         other performance tuning tricks I found on the web.
>
>         I tried it on another server that has Gluster 3.0 with an NFS
>         share but no replication and it completes in 3 seconds.
>
>         I have similar bad performance with a simple copy of the same
>         files+directories from /tmp into the gluster mount, so it's
>         not limited to zip.
>
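The local-versus-mount comparison described above can be reproduced without the zip file. The following is a minimal Python sketch, not part of the original report: the function name `time_small_file_writes`, the file count, the file size and the directory layout are all illustrative assumptions. It writes many small files under a directory and returns the elapsed time, so the same run can be pointed at local disk, the NFS mount and the native client mount.

```python
import os
import tempfile
import time


def time_small_file_writes(target_dir, n_files=200, size_bytes=4096, files_per_dir=50):
    """Create n_files small files under target_dir; return elapsed seconds.

    All parameters are hypothetical defaults chosen for illustration.
    """
    payload = b"x" * size_bytes
    start = time.perf_counter()
    for i in range(n_files):
        # Spread files over subdirectories, roughly like an unpacked archive.
        subdir = os.path.join(target_dir, "d%04d" % (i // files_per_dir))
        os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, "f%06d" % i), "wb") as fh:
            fh.write(payload)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Local baseline in a throwaway directory. To compare, call the same
    # function with an empty directory under the mount (e.g. /mnt/shared).
    with tempfile.TemporaryDirectory() as local_dir:
        elapsed = time_small_file_writes(local_dir)
        print("local: %.3f s" % elapsed)
```

Running it once per target keeps the workload identical, so any difference in elapsed time comes from the filesystem path rather than from unzip itself.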
>         Here is my /etc/glusterd/vols/shared/shared-fuse.vol.
>         Bear in mind that this is 'tuned' but the out-of-the-box
>         version has the same performance.
>         I also tried removing all the performance translators as per
>         someone's suggestion in IRC.
>
>         volume shared-client-0
>             type protocol/client
>             option remote-host server1
>             option remote-subvolume /mnt/shared
>             option transport-type tcp
>             option transport.socket.nodelay on
>         end-volume
>
>         volume shared-client-1
>             type protocol/client
>             option remote-host server2
>             option remote-subvolume /mnt/shared
>             option transport-type tcp
>             option transport.socket.nodelay on
>         end-volume
>
>         volume shared-replicate-0
>             type cluster/replicate
>             subvolumes shared-client-0 shared-client-1
>         end-volume
>
>         volume shared-write-behind
>             type performance/write-behind
>             option cache-size 100MB
>             option flush-behind off
>             subvolumes shared-replicate-0
>         end-volume
>
>         volume shared-read-ahead
>             type performance/read-ahead
>             subvolumes shared-write-behind
>         end-volume
>
>         volume shared-io-cache
>             type performance/io-cache
>             option cache-size 100MB
>             option cache-timeout 1
>             subvolumes shared-read-ahead
>         end-volume
>
>         volume shared-quick-read
>             type performance/quick-read
>             option cache-timeout 1         # default 1 second
>             option max-file-size 256KB     # default 64Kb
>             subvolumes shared-io-cache
>         end-volume
>
>         volume shared
>             type debug/io-stats
>             subvolumes shared-quick-read
>         end-volume
>
>         And my /etc/glusterd/vols/shared/shared.server1.mnt-shared.vol:
>
>         volume shared-posix
>             type storage/posix
>             option directory /mnt/shared
>         end-volume
>
>         volume shared-access-control
>             type features/access-control
>             subvolumes shared-posix
>         end-volume
>
>         volume shared-locks
>             type features/locks
>             subvolumes shared-access-control
>         end-volume
>
>         volume shared-io-threads
>             type performance/io-threads
>             option thread-count 16
>             subvolumes shared-locks
>         end-volume
>
>         volume /mnt/shared
>             type debug/io-stats
>             subvolumes shared-io-threads
>         end-volume
>
>         volume shared-server
>             type protocol/server
>             option transport-type tcp
>             option auth.addr./mnt/shared.allow *
>             option transport.socket.nodelay on
>             subvolumes /mnt/shared
>         end-volume
>
>
>         Here's the output of nfsstat on the client:
>         Client rpc stats:
>         calls      retrans    authrefrsh
>         1652499    231        124
>
>         Client nfs v3:
>         null         getattr      setattr      lookup       access       readlink
>         0         0% 744498   45% 32762     1% 490843   29% 235276   14% 37        0%
>         read         write        create       mkdir        symlink      mknod
>         52085     3% 21940     1% 14452     0% 948       0% 1         0% 0         0%
>         remove       rmdir        rename       link         readdir      readdirplus
>         10961     0% 562       0% 19        0% 0         0% 135       0% 32623     1%
>         fsstat       fsinfo       pathconf     commit
>         140       0% 46        0% 23        0% 15126     0%
>
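The nfsstat counters above can be summarized by operation type to see where the RPC calls go. This is a rough Python sketch rather than anything from the original thread: the dictionary simply copies the "Client nfs v3" counts, and the helper name `op_share` and the grouping of getattr, lookup and access as "metadata" calls are assumptions made for illustration. Note that nfsstat counters are cumulative for the client, so they cover more than a single unzip run.

```python
# Counts copied from the "Client nfs v3" section of the nfsstat output.
NFS_OPS = {
    "null": 0, "getattr": 744498, "setattr": 32762, "lookup": 490843,
    "access": 235276, "readlink": 37, "read": 52085, "write": 21940,
    "create": 14452, "mkdir": 948, "symlink": 1, "mknod": 0,
    "remove": 10961, "rmdir": 562, "rename": 19, "link": 0,
    "readdir": 135, "readdirplus": 32623, "fsstat": 140, "fsinfo": 46,
    "pathconf": 23, "commit": 15126,
}

# Assumed grouping: calls that only query attributes or names.
METADATA_OPS = ("getattr", "lookup", "access")
DATA_OPS = ("read", "write")


def op_share(ops, names):
    """Return the fraction of all counted operations made up by `names`."""
    total = sum(ops.values())
    return sum(ops[name] for name in names) / total


if __name__ == "__main__":
    print("metadata share: %.1f%%" % (100 * op_share(NFS_OPS, METADATA_OPS)))
    print("data share:     %.1f%%" % (100 * op_share(NFS_OPS, DATA_OPS)))
```

With these counters the three metadata calls come to roughly 89% of all operations, while read and write together are under 5%, which is consistent with a workload dominated by per-file round trips rather than by data transfer.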
>
>         Anyone got any ideas on improving the performance of this?
>
>         Thanks,
>         Rafiq
>
>
>
>         _______________________________________________
>         Gluster-users mailing list
>         Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>         http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
>
>