I think it is FUSE that causes the slowness. I ran all the experiments
again with booster enabled, and here is the new figure:
http://www.cs.princeton.edu/~wdong/gluster/summary-booster.gif .
The numbers are now much better than NFS in most cases, except for the
local setting, which is not practically interesting. One interesting
side effect is that the file deletion rate suddenly dropped by a factor
of 4-10 -- though I don't really care about file deletion. I must say
that I'm totally satisfied with the results.

- Wei

Wei Dong wrote:
> Hi All,
>
> I complained about the low file creation rate with GlusterFS on my
> cluster weeks ago, and Avati suggested I start with a small number of
> nodes. I finally got some time to seriously benchmark GlusterFS with
> Bonnie++ today, and the results confirm that GlusterFS is indeed slow
> in terms of file creation. My application is to store a large number
> of ~200KB image files. I use the following bonnie++ command for
> evaluation (create 10K files of 200KB each scattered under 100
> directories):
>
>     bonnie++ -d . -s 0 -n 10:200000:200000:100
>
> Since sequential I/O is not that interesting to me, I only keep the
> random I/O results.
>
> My hardware configuration is 2 x quad-core Xeon E5430 2.66GHz, 16GB
> memory, and 4 x Seagate 1500GiB 7200RPM hard drives. The machines are
> connected with gigabit ethernet.
>
> I ran several GlusterFS configurations, each named N-R-T, where N
> is the number of replicated volumes aggregated, R is the number of
> replicas and T is the number of server-side I/O threads. I use one
> machine to serve one volume, so there are NxR servers and one separate
> client running for each experiment. On the client side, the server
> volumes are first replicated and then aggregated -- even with the 1-1-2
> configuration, the single volume is wrapped by a replicate and a
> distribute translator.
> To show the overhead of those translators, I also ran a "simple"
> configuration, which is 1-1-2 without the extra replicate &
> distribute translators, and a "local" configuration, which is
> "simple" with client & server running on the same machine. These
> configurations are compared to "nfs" and "nfs-local", the latter being
> NFS with server and client on the same machine. The GlusterFS volume
> file templates are attached to the email.
>
> The result is at
> http://www.cs.princeton.edu/~wdong/gluster/summary.gif . The
> bars/numbers shown are operations/second, so larger is better.
>
> The figure shows the following:
> 1. GlusterFS does an exceptionally good job of deleting files, but
>    creates and reads files much more slowly than both NFS
>    configurations.
> 2. At least for the single-node server configuration, the network
>    doesn't affect the file creation rate but does affect the file
>    read rate.
> 3. The extra dummy replicate & distribute translators lower the file
>    creation rate by almost half.
> 4. Replication doesn't hurt performance a lot.
> 5. I'm running only a single-threaded benchmark, so it's hard to say
>    much about scalability, but adding more servers does help a little
>    even in the single-threaded setting.
>
> Note that my results are not really that different from
> http://gluster.com/community/documentation/index.php/GlusterFS_2.0_I/O_Benchmark_Results,
> where the single-node configuration file creation rate is about
> 30/second.
>
> I see no reason why GlusterFS has to be that much slower than NFS at
> file creation in the single-node configuration. I'm wondering if
> someone here can help me figure out what's wrong in my configuration
> or what's wrong in the GlusterFS implementation.
>
> - Wei
>
> Server volume:
>
> volume posix
>   type storage/posix
>   option directory /state/partition1/wdong/gluster
> end-volume
>
> volume lock
>   type features/locks
>   subvolumes posix
> end-volume
>
> volume brick
>   type performance/io-threads
>   option thread-count 2
>   subvolumes lock
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type tcp
>   option auth.addr.brick.allow 192.168.99.*
>   option transport.socket.listen-port 6999
>   subvolumes brick
> end-volume
>
>
> Client volume:
>
> volume brick-0-0
>   type protocol/client
>   option transport-type tcp
>   option remote-host c8-0-0
>   option remote-port 6999
>   option remote-subvolume brick
> end-volume
>
> volume brick-0-1 ...
>
> volume rep-0
>   type cluster/replicate
>   subvolumes brick-0-0 brick-0-1 ...
>
> ...
> volume union
>   type cluster/distribute
>   subvolumes rep-0 rep-1 rep-2 rep-3 rep-4 rep-5 rep-6 rep-7
> end-volume
>
> volume client
>   type performance/write-behind
>   option cache-size 32MB
>   option flush-behind on
>   subvolumes union
> end-volume
>
>
> For those interested in the real configuration files, I have uploaded
> all the configuration files and server/client logs to
> http://www.cs.princeton.edu/~wdong/gluster/run.tar.gz .
>
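The client template quoted above follows a regular pattern: R protocol/client bricks per replicate volume, N replicate volumes under one distribute volume, and write-behind on top. The following is a minimal Python sketch of how such a client volume file could be generated for an arbitrary N-R layout. It is not the script used for these experiments. The function name client_volfile, the hosts list and the node-K host names are hypothetical. The only host name in the template is c8-0-0 for brick-0-0, so the mapping of bricks to hosts here is an assumption.

```python
def client_volfile(n, r, hosts, port=6999):
    """Return client volume file text for n replicate sets of r bricks each.

    hosts is a flat list of n * r server host names (hypothetical input);
    brick-i-j is assumed to live on hosts[i * r + j].
    """
    if len(hosts) != n * r:
        raise ValueError("expected n * r host names, one server per volume")

    lines = []
    rep_names = []
    for i in range(n):
        brick_names = []
        for j in range(r):
            name = f"brick-{i}-{j}"
            brick_names.append(name)
            lines += [
                f"volume {name}",
                "  type protocol/client",
                "  option transport-type tcp",
                f"  option remote-host {hosts[i * r + j]}",
                f"  option remote-port {port}",
                "  option remote-subvolume brick",
                "end-volume",
                "",
            ]
        # Each group of r bricks is wrapped by one replicate translator.
        rep = f"rep-{i}"
        rep_names.append(rep)
        lines += [
            f"volume {rep}",
            "  type cluster/replicate",
            f"  subvolumes {' '.join(brick_names)}",
            "end-volume",
            "",
        ]

    # All replicate volumes are aggregated by one distribute translator,
    # with write-behind on top, as in the quoted template.
    lines += [
        "volume union",
        "  type cluster/distribute",
        f"  subvolumes {' '.join(rep_names)}",
        "end-volume",
        "",
        "volume client",
        "  type performance/write-behind",
        "  option cache-size 32MB",
        "  option flush-behind on",
        "  subvolumes union",
        "end-volume",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical host names for a 2-2 layout (4 servers).
    print(client_volfile(2, 2, [f"node-{k}" for k in range(4)]))
```

With n=1 and r=1 the sketch still emits the single-subvolume replicate and distribute wrappers, which matches the 1-1-2 case described in the message. The "simple" configuration would need those two translators left out.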