Thank you Julien, that's very useful for comparison. I should have tried
that myself first. So, it would appear that with AFR enabled, the
transaction is not complete until the entire request (replicating the
file to the mirror site) is completed. This makes sense from a
consistency perspective; however, has there been any talk of adding an
option of "lazy" AFR, where writes to the mirror site would not block
the transaction? We are not overly concerned with this level of
consistency and can periodically rsync the data or, once the f*
utilities come out, run some sort of repair.
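
To make the "periodically rsync" fallback concrete, here is a minimal sketch of what such a job could look like if it were run from cron on the master. It is only an illustration: the mirror host name is hypothetical, the export directory is taken from the server spec quoted below, and the helper name sync_cmd is made up. By default it only prints the command it would run.

```shell
#!/bin/sh
# Sketch of an out-of-band mirror sync (an assumption, not a GlusterFS feature).
# SRC is the export directory from the server spec; DEST uses a hypothetical host.
SRC=${SRC:-/home/glusterfs/}
DEST=${DEST:-mirror.example.com:/home/glusterfs/}

# Print the rsync command line that would be executed.
sync_cmd() {
    printf 'rsync -a %s %s\n' "$SRC" "$DEST"
}

if [ "${RUN:-0}" = 1 ]; then
    # -a preserves permissions, times and symlinks while copying recursively.
    rsync -a "$SRC" "$DEST"
else
    # Dry mode: show the command instead of touching the network.
    sync_cmd
fi
```

Setting RUN=1 performs the copy. Files deleted on the master would stay on the mirror with these flags, so this is a one-way catch-up rather than a true repair.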
Best,
Erik Osterman
Julien Perez wrote:
Hello,
Looking at your configuration, my first guess is that you have
latency issues in your network, which would definitely explain the
awful performance. Using your configuration files, I've set up the
same architecture, but locally (one computer running on a Sempron 2600+
with 2 Hitachi SATA1 disks in software RAID1), and here are the results
I got:
toad@web1:~/afr$ time for i in {1..100}; do touch ./mnt/test.$i; done
real 0m0.560s
user 0m0.160s
sys 0m0.190s
toad@web1:~/afr$ rm -f ./mnt/*
toad@web1:~/afr$ time for i in {1..10000}; do touch ./mnt/test.$i; done
real 1m8.060s
user 0m16.680s
sys 0m18.180s
toad@web1:~/afr$ find ./mnt/ -type f | xargs -n100 rm -f
toad@web1:~/afr$ time for i in {1..1000}; do touch ./mnt/test.$i; done
real 0m5.829s
user 0m1.670s
sys 0m1.910s
So my advice would be: check your network :)
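
For a per-file view of the numbers in this thread, the wall-clock times can be divided by the file counts. The sketch below does only that arithmetic; the helper name per_file_ms is hypothetical, and the last line uses the 46.913 s figure from the original report quoted below.

```shell
#!/bin/sh
# per_file_ms TOTAL_SECONDS FILE_COUNT
# Prints the average cost per file in milliseconds, one decimal place.
per_file_ms() {
    awk -v total="$1" -v count="$2" \
        'BEGIN { printf "%.1f\n", total * 1000 / count }'
}

per_file_ms 0.560 100      # local setup, 100 files
per_file_ms 68.060 10000   # local setup, 10000 files
per_file_ms 5.829 1000     # local setup, 1000 files
per_file_ms 46.913 100     # original report over the network, 100 files
```

The local runs come out at roughly 6 ms per file and the networked run at roughly 470 ms per file, which is the gap the network check is meant to explain.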
Hope it helped,
Have a nice day everyone
Julien Perez
On 3/12/07, Erik Osterman <e@xxxxxxxxxxxx> wrote:
I've configured a cluster with replication that uses most of the
advanced features you've implemented including io-threads, afr,
readahead, and writebehind. I am very satisfied with the write
performance, but the file creation performance leaves much to be
desired. What can we do to speed this up?
Creating 100 empty files
# time for i in {1..100}; do touch test.$i;done
real 0m46.913s
user 0m0.023s
sys 0m0.067s
That's about 0.47 seconds just to create an empty file.
In general, what do you advise for tuning the performance of
reading/writing tons of tiny files? Can the client use io-threads to
improve performance? Right now, our application stuffs all the tiny
files in a single directory. Eventually, we were planning on hashing
them out to directories. Would hashing them out into multiple
directories positively and significantly affect the performance of
GlusterFS?
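
As an illustration of what "hashing them out to directories" could mean, here is a minimal sketch that derives a two-level directory from a checksum of the file name. The helper names hashed_path and store_file and the 256 x 256 layout are assumptions made for this example; whether such a layout speeds up GlusterFS is the open question above, not something the sketch demonstrates.

```shell
#!/bin/sh
# hashed_path NAME
# Prints "xx/yy/NAME", where xx and yy are hex bytes taken from the
# POSIX cksum CRC of NAME, so files spread over up to 256 * 256 directories.
hashed_path() {
    name=$1
    crc=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
    printf '%02x/%02x/%s\n' $((crc % 256)) $(((crc / 256) % 256)) "$name"
}

# store_file SRC ROOT
# Moves SRC under ROOT at its hashed location, creating directories as needed.
store_file() {
    src=$1
    root=$2
    dest="$root/$(hashed_path "$(basename "$src")")"
    mkdir -p "$(dirname "$dest")" && mv "$src" "$dest"
}
```

A call such as store_file ./test.1 ./mnt would move the file to ./mnt/xx/yy/test.1, and the same hashed_path call on the name finds it again later.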
Best,
Erik Osterman
For what it's worth, here are my configurations:
#
# Master
#
volume posix0
type storage/posix # POSIX FS translator
option directory /home/glusterfs # Export this directory
end-volume
volume brick0
type performance/io-threads
option thread-count 8
option queue-limit 1024
subvolumes posix0
end-volume
### Add network serving capability to above brick.
volume server
type protocol/server
option transport-type tcp/server # For TCP/IP transport
# option bind-address 192.168.1.10 # Default is to listen on all interfaces
option listen-port 6996 # Default is 6996
option client-volume-filename /etc/glusterfs/client.vol
subvolumes brick0
option auth.ip.brick0.allow * # access to "brick" volume
end-volume
#
# Mirror
#
volume posix0
type storage/posix # POSIX FS translator
option directory /home/glusterfs # Export this directory
end-volume
volume mirror0
type performance/io-threads
option thread-count 8
option queue-limit 1024
subvolumes posix0
end-volume
### Add network serving capability to above brick.
volume server
type protocol/server
option transport-type tcp/server # For TCP/IP transport
# option bind-address 192.168.1.11 # Default is to listen on all interfaces
option listen-port 6996 # Default is 6996
option client-volume-filename /etc/glusterfs/client.vol
subvolumes mirror0
option auth.ip.mirror0.allow * # access to "mirror0" volume
end-volume
#
# Client
#
### Add client feature and attach to remote subvolume of server
volume brick0
type protocol/client
option transport-type tcp/client # for TCP/IP transport
option remote-host 216.182.237.155 # IP address of the remote brick server
option remote-port 6996 # default server port is 6996
option remote-subvolume brick0 # name of the remote volume
end-volume
### Add client feature and attach to remote mirror of brick0
volume mirror0
type protocol/client
option transport-type tcp/client # for TCP/IP transport
option remote-host 216.55.170.26 # IP address of the remote mirror server
option remote-port 6996 # default server port is 6996
option remote-subvolume mirror0 # name of the remote volume
end-volume
### Add AFR feature to brick
volume afr0
type cluster/afr
subvolumes brick0 mirror0
option replicate *:2 # All files 2 copies (RAID-1)
end-volume
### Add unify feature to cluster the servers. Associate an
### appropriate scheduler that matches your I/O demand.
volume bricks
type cluster/unify
subvolumes afr0
### ** Round Robin (RR) Scheduler **
option scheduler rr
option rr.limits.min-free-disk 2GB
end-volume
### Add performance feature
volume writebehind
type performance/write-behind
option aggregate-size 131072 # aggregate block size in bytes
subvolumes bricks
end-volume
### Add performance feature
volume readahead
type performance/read-ahead
option page-size 131072
option page-count 16
subvolumes writebehind
end-volume
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel