very low file creation rate with glusterfs -- result updates

wdong.pku at gmail.com (Wei Dong) · Fri, 11 Sep 2009 09:44:42 -0400

I think it is FUSE that causes the slowness.  I ran all experiments with 
booster enabled and here's the new figure:  
http://www.cs.princeton.edu/~wdong/gluster/summary-booster.gif .  The 
numbers are MUCH better than NFS in most cases, except for the local 
setting, which is not practically interesting.  One surprise is that 
the deletion rate suddenly drops by a factor of 4-10 -- though I 
don't really care about file deletion.

I must say that I'm totally satisfied with the results.

- Wei

Wei Dong wrote:
> Hi All,
>
> I complained about the low file creation rate with GlusterFS on my 
> cluster weeks ago, and Avati suggested I start with a small number of 
> nodes.  I finally got some time to seriously benchmark GlusterFS with 
> bonnie++ today, and the results confirm that GlusterFS is indeed slow 
> at file creation.  My application stores a large number of ~200KB 
> image files.  I use the following bonnie++ command for evaluation 
> (create 10K files of 200,000 bytes each, scattered under 100 
> directories):
>
> bonnie++ -d . -s 0 -n 10:200000:200000:100
>
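
As a quick sanity check on that command, here is a rough sketch of the workload it implies.  It assumes the convention described in the bonnie++ man page (file count given in multiples of 1024, sizes in bytes); the function name describe_workload is made up for illustration.

```python
def describe_workload(spec: str) -> dict:
    """Summarize a bonnie++ -n spec of the form files:max:min:dirs."""
    count, max_size, min_size, dirs = (int(x) for x in spec.split(":"))
    files = count * 1024                   # assumption: count is in units of 1024 files
    avg_size = (max_size + min_size) / 2   # assumption: sizes are in bytes
    return {
        "files": files,
        "dirs": dirs,
        "files_per_dir": files / dirs,
        "total_bytes": int(files * avg_size),
    }


if __name__ == "__main__":
    print(describe_workload("10:200000:200000:100"))
```

Under those assumptions the run creates 10240 files, about 2 GB in total, or roughly 100 files per directory.
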
> Since sequential I/O is not that interesting to me, I only keep the 
> random I/O results.
>
> My hardware configuration is 2 x quad-core Xeon E5430 2.66GHz, 16GB 
> memory, and 4 x Seagate 1500GiB 7200RPM hard drives.  The machines are 
> connected with gigabit ethernet.
>
> I ran several GlusterFS configurations, each named N-R-T, where N 
> is the number of replicated volumes aggregated, R is the number of 
> replicas and T is the number of server-side I/O threads.  I use one 
> machine to serve one volume, so there are NxR servers and one separate 
> client running for each experiment.  On the client side, the server 
> volumes are first replicated and then aggregated -- even with the 1-1-2 
> configuration, the single volume is wrapped by a replicate and a 
> distribute translator.  To show the overhead of those translators, I 
> also ran a "simple" configuration, which is 1-1-2 without the extra 
> replicate & distribute translators, and a "local" configuration, which 
> is "simple" with client & server running on the same machine.  These 
> configurations are compared to "nfs" and "nfs-local", the latter being 
> NFS with server and client on the same machine.  The GlusterFS volume 
> file templates are attached to the email.
>
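
To make the N-R-T naming concrete, a minimal sketch that decodes a configuration name and derives the server count (N x R).  The Layout class and parse_layout helper are hypothetical names, and "4-2-2" is only an illustrative input, not necessarily one of the configurations that were run.

```python
from typing import NamedTuple


class Layout(NamedTuple):
    replica_sets: int   # N: replicated volumes aggregated by distribute
    replicas: int       # R: copies per replicated volume
    io_threads: int     # T: server-side I/O threads

    @property
    def servers(self) -> int:
        # One machine serves one volume, so N x R servers are needed.
        return self.replica_sets * self.replicas


def parse_layout(name: str) -> Layout:
    """Decode a configuration name such as '1-1-2' into its three parts."""
    n, r, t = (int(x) for x in name.split("-"))
    return Layout(n, r, t)


if __name__ == "__main__":
    for name in ("1-1-2", "4-2-2"):
        layout = parse_layout(name)
        print(name, "->", layout.servers, "server(s)")
```
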
> The result is at 
> http://www.cs.princeton.edu/~wdong/gluster/summary.gif .  The 
> bars/numbers shown are operations/second, so the larger the better.
>
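
For anyone who wants a rough feel for "operations/second" on their own mount without installing bonnie++, here is a simplified stand-in.  It is not what bonnie++ does internally (no stat or delete phases, no fsync, no randomized order); it only times sequential file creation, and the name create_rate and its defaults are my own choices.

```python
import os
import tempfile
import time


def create_rate(root: str, n_files: int = 1000, size: int = 200_000,
                n_dirs: int = 10) -> float:
    """Create n_files files of `size` bytes under root; return files/second."""
    payload = b"\0" * size
    # Directory creation happens before the timer starts.
    for d in range(n_dirs):
        os.makedirs(os.path.join(root, f"d{d:03d}"), exist_ok=True)
    start = time.perf_counter()
    for i in range(n_files):
        path = os.path.join(root, f"d{i % n_dirs:03d}", f"f{i:06d}")
        with open(path, "wb") as f:
            f.write(payload)
    elapsed = time.perf_counter() - start
    return n_files / elapsed


if __name__ == "__main__":
    # Small demo in a temporary directory; point root at a mount to test it.
    with tempfile.TemporaryDirectory() as root:
        print(f"{create_rate(root, n_files=100):.1f} creates/second")
```

Numbers from a script like this are only comparable with each other, not with the bonnie++ figures.
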
> The figure shows the following:
> 1.  GlusterFS does an exceptionally good job of deleting files, but 
> creates and reads files much more slowly than both NFS configurations.
> 2.  At least for the single-server configuration, the network doesn't 
> affect the file creation rate but does affect the file read rate.
> 3.  The extra dummy replicate & distribute translators lower the file 
> creation rate by almost half.
> 4.  Replication doesn't hurt performance a lot.
> 5.  I'm running only a single-threaded benchmark, so it's hard to say 
> much about scalability, but adding more servers does help a little 
> even in the single-threaded setting.
>
> Note that my results are not really that different from 
> http://gluster.com/community/documentation/index.php/GlusterFS_2.0_I/O_Benchmark_Results, 
> where the single-node configuration's file creation rate is about 30/second.
>
> I see no reason why GlusterFS has to be that much slower than NFS at 
> file creation in a single-node configuration.  I'm wondering if someone 
> here can help me figure out what's wrong with my configuration or 
> with the GlusterFS implementation.
>
> - Wei
>
> Server volume:
>
> volume posix
>  type storage/posix
>  option directory /state/partition1/wdong/gluster
> end-volume
>
> volume lock
>  type features/locks
>  subvolumes posix
> end-volume
>
> volume brick
>  type performance/io-threads
>  option thread-count 2
>  subvolumes lock
> end-volume
>
> volume server
>  type protocol/server
>  option transport-type tcp
>  option auth.addr.brick.allow 192.168.99.*
>  option transport.socket.listen-port 6999
>  subvolumes brick
> end-volume
>
>
> Client volume:
>
> volume brick-0-0
>  type protocol/client
>  option transport-type tcp
>  option remote-host c8-0-0
>  option remote-port 6999
>  option remote-subvolume brick
> end-volume
>
> volume brick-0-1 ...
>
> volume rep-0
>  type cluster/replicate
>  subvolumes brick-0-0 brick-0-1 ...
>
> ...
> volume union
>  type cluster/distribute
>  subvolumes rep-0 rep-1 rep-2 rep-3 rep-4 rep-5 rep-6 rep-7
> end-volume
>
> volume client
>  type performance/write-behind
>  option cache-size 32MB
>  option flush-behind on
>  subvolumes union
> end-volume
>
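
Since the client template above is abbreviated, here is a hedged sketch of how such a file could be generated for a given host list and replica count.  The function client_volfile is hypothetical; the host names are supplied by the caller (only c8-0-0 appears in the template, c8-0-1 in the demo is made up), and I am assuming every volume block, including the replicate ones, is closed with end-volume as in the other blocks.

```python
def client_volfile(hosts, replicas, port=6999, cache_size="32MB"):
    """Render a client volume file: replicate first, then distribute."""
    if not hosts or replicas < 1 or len(hosts) % replicas != 0:
        raise ValueError("len(hosts) must be a positive multiple of replicas")
    blocks, rep_names = [], []
    for i in range(len(hosts) // replicas):
        brick_names = []
        for j in range(replicas):
            name = f"brick-{i}-{j}"
            brick_names.append(name)
            blocks.append("\n".join([
                f"volume {name}",
                " type protocol/client",
                " option transport-type tcp",
                f" option remote-host {hosts[i * replicas + j]}",
                f" option remote-port {port}",
                " option remote-subvolume brick",
                "end-volume",
            ]))
        rep = f"rep-{i}"
        rep_names.append(rep)
        # Each group of `replicas` bricks is wrapped in one replicate volume.
        blocks.append("\n".join([
            f"volume {rep}",
            " type cluster/replicate",
            " subvolumes " + " ".join(brick_names),
            "end-volume",
        ]))
    # All replicate volumes are aggregated by a single distribute volume.
    blocks.append("\n".join([
        "volume union",
        " type cluster/distribute",
        " subvolumes " + " ".join(rep_names),
        "end-volume",
    ]))
    blocks.append("\n".join([
        "volume client",
        " type performance/write-behind",
        f" option cache-size {cache_size}",
        " option flush-behind on",
        " subvolumes union",
        "end-volume",
    ]))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # 1 replicated volume with 2 replicas; the second host name is hypothetical.
    print(client_volfile(["c8-0-0", "c8-0-1"], replicas=2))
```

Keeping the replicate and distribute wrappers even when there is a single brick mirrors the 1-1-2 setup described earlier.
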
>
> For those interested in the real configuration files, I have uploaded 
> all of them, along with the server/client logs, to 
> http://www.cs.princeton.edu/~wdong/gluster/run.tar.gz .
>