The glusterfs version I'm using is 2.0.6.

- Wei

On Thu, Sep 10, 2009 at 2:05 PM, Wei Dong <wdong.pku at gmail.com> wrote:
> Hi All,
>
> I complained about the low file creation rate of glusterfs on my cluster a
> few weeks ago, and Avati suggested I start with a small number of nodes. I
> finally got some time to seriously benchmark glusterfs with Bonnie++ today,
> and the results confirm that glusterfs is indeed slow at creating files. My
> application stores a large number of ~200KB image files. I use the
> following bonnie++ command for evaluation (create 10K files of ~200KB each,
> scattered under 100 directories):
>
> bonnie++ -d . -s 0 -n 10:200000:200000:100
>
> Since sequential I/O is not that interesting to me, I only keep the random
> I/O results.
>
> My hardware configuration is 2x quad-core Xeon E5430 2.66GHz, 16GB memory,
> and 4x Seagate 1.5TB 7200RPM hard drives. The machines are connected with
> gigabit ethernet.
>
> I ran several GlusterFS configurations, each named N-R-T, where N is the
> number of replicated volumes aggregated, R is the number of replicas, and T
> is the number of server-side I/O threads. I use one machine to serve each
> volume, so there are NxR servers and one separate client running for each
> experiment. On the client side, the server volumes are first replicated and
> then aggregated -- even with the 1-1-2 configuration, the single volume is
> wrapped by a replicate and a distribute translator. To show the overhead of
> those translators, I also ran a "simple" configuration, which is 1-1-2
> without the extra replicate & distribute translators, and a "local"
> configuration, which is "simple" with the client & server running on the
> same machine. These configurations are compared against "nfs" and
> "nfs-local" (NFS with server and client on the same machine). The GlusterFS
> volume file templates are attached at the end of this email.
>
> The results are at http://www.cs.princeton.edu/~wdong/gluster/summary.gif.
> The bars/numbers shown are operations per second, so larger is better.
>
> The main points shown by the figure are:
>
> 1. GlusterFS does an exceptionally good job of deleting files, but creates
> and reads files much more slowly than both NFS configurations.
> 2. At least for the single-server configuration, the network doesn't affect
> the file creation rate but does affect the file read rate.
> 3. The extra dummy replicate & distribute translators lower the file
> creation rate by almost half.
> 4. Replication doesn't hurt performance much.
> 5. I'm running only a single-threaded benchmark, so it's hard to say much
> about scalability, but adding more servers does help a little even in this
> single-threaded setting.
>
> Note that my results are not really that different from
> http://gluster.com/community/documentation/index.php/GlusterFS_2.0_I/O_Benchmark_Results,
> where the file creation rate for the single-node configuration is about
> 30/second.
>
> I see no reason why GlusterFS has to be that much slower than NFS at file
> creation in a single-node configuration. I'm wondering if someone here can
> help me figure out what's wrong in my configuration or what's wrong in the
> GlusterFS implementation.
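To clarify the "simple" configuration mentioned above: it is the 1-1-2 setup
with write-behind stacked directly on the single protocol/client brick, with
no replicate or distribute translator in between. Roughly, the client volume
reduces to something like this (the exact files are in the run.tar.gz linked
at the end):

volume brick-0-0
  # single remote brick, same as in the template below
  type protocol/client
  option transport-type tcp
  option remote-host c8-0-0
  option remote-port 6999
  option remote-subvolume brick
end-volume

volume client
  # write-behind sits directly on the brick; no replicate/distribute in between
  type performance/write-behind
  option cache-size 32MB
  option flush-behind on
  subvolumes brick-0-0
end-volume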
>
> - Wei
>
> Server volume:
>
> volume posix
>   type storage/posix
>   option directory /state/partition1/wdong/gluster
> end-volume
>
> volume lock
>   type features/locks
>   subvolumes posix
> end-volume
>
> volume brick
>   type performance/io-threads
>   option thread-count 2
>   subvolumes lock
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type tcp
>   option auth.addr.brick.allow 192.168.99.*
>   option transport.socket.listen-port 6999
>   subvolumes brick
> end-volume
>
>
> Client volume:
>
> volume brick-0-0
>   type protocol/client
>   option transport-type tcp
>   option remote-host c8-0-0
>   option remote-port 6999
>   option remote-subvolume brick
> end-volume
>
> volume brick-0-1 ...
>
> volume rep-0
>   type cluster/replicate
>   subvolumes brick-0-0 brick-0-1 ...
>
> ...
>
> volume union
>   type cluster/distribute
>   subvolumes rep-0 rep-1 rep-2 rep-3 rep-4 rep-5 rep-6 rep-7
> end-volume
>
> volume client
>   type performance/write-behind
>   option cache-size 32MB
>   option flush-behind on
>   subvolumes union
> end-volume
>
>
> For those who are interested enough to see the real configuration files, I
> have all the configuration files and server/client logs uploaded to
> http://www.cs.princeton.edu/~wdong/gluster/run.tar.gz
>
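For anyone who wants to reproduce this, the volume files above are used
roughly as follows (hostnames, file names and the mount point here are
placeholders; the real configuration files and logs are in the run.tar.gz
above):

# on each server machine: start glusterfsd with the server volume file
glusterfsd -f server.vol

# on the client machine: mount the client volume
glusterfs -f client.vol /mnt/gluster

# run the benchmark against the mount
cd /mnt/gluster
bonnie++ -d . -s 0 -n 10:200000:200000:100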