GlusterFS 3.1 - Bad Write Performance Unzipping Many Small Files

Hi Craig,

Thanks for your reply.

I have confirmed that it is not disk I/O-bound: I created a second volume
backed by /tmp, and it shows the same issue. Also, I am in fact using
multiple EBS volumes in an LVM array (although not 10).
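
The throwaway test volume was created along these lines (the volume name and
brick paths here are illustrative, not the exact ones I used):

gluster volume create tmptest replica 2 transport tcp server1:/tmp/brick server2:/tmp/brick
gluster volume start tmptest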

I tried the latest 3.1.1 build on a new set of instances, and it performs
the same.

If I take one server offline the time drops to about 23 seconds (from 40-50).
Replication doubling the time taken makes sense - however, even without
replication, 23 seconds is still far too long for an archive that unzips in
under a second on local disk.

I stopped Gluster and created a normal NFS export of the same directory, and
the unzip completed in 4.6 seconds, which is around the performance level I
need.
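
For reference, the comparison export was a plain kernel NFS setup, roughly
the following (paths and options are from memory):

# on server1, in /etc/exports:
/mnt/shared *(rw,async,no_subtree_check)

exportfs -ra

# on the client:
mount -t nfs -o async,noatime server1:/mnt/shared /mnt/nfs-test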

Any ideas?

Thanks,
Rafiq


On Fri, Nov 19, 2010 at 5:30 AM, Craig Carl <craig at gluster.com> wrote:

> Rafiq -
>   Gluster 3.1.1 will ship shortly; in our testing, performance has been
> significantly improved. If you are not in production the QA builds of
> Gluster 3.1.1 are here -
> http://download.gluster.com/pub/gluster/glusterfs/qa-releases/. The QA
> releases should NOT be run in production.
>   We have also found that the performance of a single EBS device can be a
> limiting factor. Creating lots of smaller EBS devices and then striping them
> into a RAID 0 array with mdadm can improve performance without raising
> costs; in my tests with 10 EBS devices, performance increased 4-5x for
> disk-bound applications.
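>
>   A minimal sketch of that layout, assuming the 10 devices appear as
> /dev/xvdf through /dev/xvdo (device names and the brick path will vary by
> instance):
>
> mdadm --create /dev/md0 --level=0 --raid-devices=10 /dev/xvd[f-o]
> mkfs.ext4 /dev/md0
> mount /dev/md0 /mnt/shared    # the directory used as the Gluster brick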
>
>
> Thanks,
>
> Craig
>
> --
> Craig Carl
> Senior Systems Engineer
> Gluster
>
> On 11/18/2010 09:14 PM, Rafiq Maniar wrote:
>
>> Hi,
>>
>> I'm using GlusterFS 3.1 on Ubuntu 10.04 in a dual replication setup, on
>> Amazon EC2.
>>
>> It takes 40-50 seconds to unzip an 8MB zip file full of small files and
>> directories to a gluster mount, in contrast to 0.8 seconds to local disk.
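>>
>> The measurement is simply unzip run under time, along these lines (the
>> archive name is a placeholder):
>>
>> time unzip -q archive.zip -d /mnt/shared/test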
>>
>> The volume configuration was created with:
>> gluster volume create volname replica 2 transport tcp server1:/shared server2:/shared
>>
>> I am mounting on the client via NFS with:
>> mount -t nfs -o async,noatime,nodiratime server1:/shared /mnt/shared
>>
>> And also tried via Gluster native client with:
>> mount -t glusterfs server1:/shared /mnt/shared
>>
>> I found a post where the author describes a similarly slow unzip of the
>> Linux kernel:
>> http://northernmost.org/blog/improving-glusterfs-performance/
>>
>> I believe the 'nodelay' option was implemented in response to this, and I
>> have tried using it in the 3 configuration files on the servers, but with
>> no improvement. I've also tried some other performance tuning tricks I
>> found on the web.
>>
>> I tried it on another server that runs Gluster 3.0 with an NFS share but
>> no replication, and it completes in 3 seconds.
>>
>> I see similarly bad performance with a simple copy of the same files and
>> directories from /tmp into the gluster mount, so it's not specific to
>> unzip.
>>
>> Here is my /etc/glusterd/vols/shared/shared-fuse.vol. Bear in mind that
>> this is 'tuned', but the out-of-the-box version gives the same
>> performance. I also tried removing all the performance translators, as per
>> someone's suggestion on IRC.
>>
>> volume shared-client-0
>>     type protocol/client
>>     option remote-host server1
>>     option remote-subvolume /mnt/shared
>>     option transport-type tcp
>>     option transport.socket.nodelay on
>> end-volume
>>
>> volume shared-client-1
>>     type protocol/client
>>     option remote-host server2
>>     option remote-subvolume /mnt/shared
>>     option transport-type tcp
>>     option transport.socket.nodelay on
>> end-volume
>>
>> volume shared-replicate-0
>>     type cluster/replicate
>>     subvolumes shared-client-0 shared-client-1
>> end-volume
>>
>> volume shared-write-behind
>>     type performance/write-behind
>>     option cache-size 100MB
>>     option flush-behind off
>>     subvolumes shared-replicate-0
>> end-volume
>>
>> volume shared-read-ahead
>>     type performance/read-ahead
>>     subvolumes shared-write-behind
>> end-volume
>>
>> volume shared-io-cache
>>     type performance/io-cache
>>     option cache-size 100MB
>>     option cache-timeout 1
>>     subvolumes shared-read-ahead
>> end-volume
>>
>> volume shared-quick-read
>>     type performance/quick-read
>>     option cache-timeout 1        # default 1 second
>>     option max-file-size 256KB    # default 64KB
>>     subvolumes shared-io-cache
>> end-volume
>>
>> volume shared
>>     type debug/io-stats
>>     subvolumes shared-quick-read
>> end-volume
>>
>> And my /etc/glusterd/vols/shared/shared.server1.mnt-shared.vol:
>>
>> volume shared-posix
>>     type storage/posix
>>     option directory /mnt/shared
>> end-volume
>>
>> volume shared-access-control
>>     type features/access-control
>>     subvolumes shared-posix
>> end-volume
>>
>> volume shared-locks
>>     type features/locks
>>     subvolumes shared-access-control
>> end-volume
>>
>> volume shared-io-threads
>>     type performance/io-threads
>>     option thread-count 16
>>     subvolumes shared-locks
>> end-volume
>>
>> volume /mnt/shared
>>     type debug/io-stats
>>     subvolumes shared-io-threads
>> end-volume
>>
>> volume shared-server
>>     type protocol/server
>>     option transport-type tcp
>>     option auth.addr./mnt/shared.allow *
>>     option transport.socket.nodelay on
>>     subvolumes /mnt/shared
>> end-volume
>>
>>
>> Here's the output of nfsstat on the client:
>>
>> Client rpc stats:
>> calls      retrans    authrefrsh
>> 1652499    231        124
>>
>> Client nfs v3:
>> null         getattr      setattr      lookup       access       readlink
>> 0         0% 744498   45% 32762     1% 490843   29% 235276   14% 37        0%
>> read         write        create       mkdir        symlink      mknod
>> 52085     3% 21940     1% 14452     0% 948       0% 1         0% 0         0%
>> remove       rmdir        rename       link         readdir      readdirplus
>> 10961     0% 562       0% 19        0% 0         0% 135       0% 32623     1%
>> fsstat       fsinfo       pathconf     commit
>> 140       0% 46        0% 23        0% 15126     0%
>>
>>
>> Anyone got any ideas on improving the performance of this?
>>
>> Thanks,
>> Rafiq
>>
>>
>>

