OK, so the previous good results are indeed too good to be true. Here's
a more reasonable evaluation:
http://www.cs.princeton.edu/~wdong/gluster/large.gif , where I enlarged
the number of images created by 10x so that everything no longer fits in
main memory. It still looks good to me.

Wei Dong wrote:
> I think it is fuse that causes the slowness. I ran all experiments
> with booster enabled and here's the new figure:
> http://www.cs.princeton.edu/~wdong/gluster/summary-booster.gif . The
> numbers are MUCH better than NFS in most cases except for the local
> setting, which is not practically interesting. The interesting thing
> is that all of a sudden, the deletion rate drops by a factor of 4-10 --
> though I don't really care about file deletion.
>
> I must say that I'm totally satisfied with the results.
>
> - Wei
>
>
> Wei Dong wrote:
>> Hi All,
>>
>> I complained about the low file creation rate with GlusterFS on my
>> cluster weeks ago, and Avati suggested I start with a small number of
>> nodes. I finally got some time to seriously benchmark GlusterFS with
>> Bonnie++ today, and the results confirm that GlusterFS is indeed slow
>> in terms of file creation. My application is to store a large number
>> of ~200KB image files. I use the following bonnie++ command for
>> evaluation (create 10K files of 200KB each, scattered under 100
>> directories):
>>
>> bonnie++ -d . -s 0 -n 10:200000:200000:100
>>
>> Since sequential I/O is not that interesting to me, I only keep the
>> random I/O results.
>>
>> My hardware configuration is 2x quad-core Xeon E5430 2.66GHz, 16GB of
>> memory, and 4x Seagate 1.5TB 7200RPM hard drives. The machines are
>> connected with gigabit ethernet.
>>
>> I ran several GlusterFS configurations, each named N-R-T, where N is
>> the number of replicated volumes aggregated, R is the number of
>> replicas, and T is the number of server-side I/O threads. I use one
>> machine to serve one volume, so there are NxR servers and one
>> separate client running for each experiment. On the client side, the
>> server volumes are first replicated and then aggregated -- even with
>> the 1-1-2 configuration, the single volume is wrapped by a replicate
>> and a distribute translator. To show the overhead of those
>> translators, I also ran a "simple" configuration, which is 1-1-2
>> without the extra replicate & distribute translators, and a "local"
>> configuration, which is "simple" with client & server running on the
>> same machine. These configurations are compared against "nfs" and
>> "nfs-local", the latter being NFS with server and client on the same
>> machine. The GlusterFS volume file templates are attached to this
>> email.
>>
>> The result is at
>> http://www.cs.princeton.edu/~wdong/gluster/summary.gif . The
>> bars/numbers shown are operations/second, so larger is better.
>>
>> The figure shows the following:
>> 1. GlusterFS does an exceptionally good job of deleting files, but
>> creates and reads files much more slowly than NFS.
>> 2. At least for the one-node server configuration, the network
>> doesn't affect the file creation rate but does affect the file read
>> rate.
>> 3. The extra dummy replicate & distribute translators lower the file
>> creation rate by almost half.
>> 4. Replication doesn't hurt performance a lot.
>> 5. I'm running only a single-threaded benchmark, so it's hard to say
>> much about scalability, but adding more servers does help a little
>> bit even in the single-threaded setting.
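>>
>> (To make the "simple" configuration concrete: on the client side it
>> is just the single protocol/client brick with write-behind stacked
>> directly on top, with no replicate or distribute translator in
>> between -- roughly the sketch below, modulo naming; the exact volume
>> files are in the run.tar.gz linked at the end.)
>>
>> # single remote brick, no replicate/distribute wrapped around it
>> volume brick-0-0
>>   type protocol/client
>>   option transport-type tcp
>>   option remote-host c8-0-0
>>   option remote-port 6999
>>   option remote-subvolume brick
>> end-volume
>>
>> # write-behind sits directly on top of the brick
>> volume client
>>   type performance/write-behind
>>   option cache-size 32MB
>>   option flush-behind on
>>   subvolumes brick-0-0
>> end-volume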
>>
>> Note that my results are not really that different from
>> http://gluster.com/community/documentation/index.php/GlusterFS_2.0_I/O_Benchmark_Results,
>> where the file creation rate for the single-node configuration is
>> about 30/second.
>>
>> I see no reason why GlusterFS has to be so much slower than NFS at
>> file creation in the single-node configuration. I'm wondering if
>> someone here can help me figure out what's wrong in my configuration
>> or what's wrong in the GlusterFS implementation.
>>
>> - Wei
>>
>> Server volume:
>>
>> volume posix
>>   type storage/posix
>>   option directory /state/partition1/wdong/gluster
>> end-volume
>>
>> volume lock
>>   type features/locks
>>   subvolumes posix
>> end-volume
>>
>> volume brick
>>   type performance/io-threads
>>   option thread-count 2
>>   subvolumes lock
>> end-volume
>>
>> volume server
>>   type protocol/server
>>   option transport-type tcp
>>   option auth.addr.brick.allow 192.168.99.*
>>   option transport.socket.listen-port 6999
>>   subvolumes brick
>> end-volume
>>
>>
>> Client volume:
>>
>> volume brick-0-0
>>   type protocol/client
>>   option transport-type tcp
>>   option remote-host c8-0-0
>>   option remote-port 6999
>>   option remote-subvolume brick
>> end-volume
>>
>> volume brick-0-1 ...
>>
>> volume rep-0
>>   type cluster/replicate
>>   subvolumes brick-0-0 brick-0-1 ...
>>
>> ...
>> volume union
>>   type cluster/distribute
>>   subvolumes rep-0 rep-1 rep-2 rep-3 rep-4 rep-5 rep-6 rep-7
>> end-volume
>>
>> volume client
>>   type performance/write-behind
>>   option cache-size 32MB
>>   option flush-behind on
>>   subvolumes union
>> end-volume
>>
>>
>> For those who are interested enough to see the real configuration
>> files, I have all the configuration files and server/client logs
>> uploaded to http://www.cs.princeton.edu/~wdong/gluster/run.tar.gz .
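
For anyone trying to reproduce this: the servers and the client are
started in the standard GlusterFS 2.0 way, roughly as below (the volume
file names and the mount point are placeholders; the real configuration
files and logs are in run.tar.gz above), and bonnie++ is run from inside
the mount:

# on each server node: start glusterfsd with the server volume file
glusterfsd -f server.vol

# on the client node: mount the client volume over FUSE
glusterfs -f client.vol /mnt/gluster

# run the benchmark inside the mount point
cd /mnt/gluster
bonnie++ -d . -s 0 -n 10:200000:200000:100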