I am also having a terrible time with rsync and gluster. The vast
majority of my time is spent figuring out what to sync... This sync
takes 17 hours even though very little data is being transferred:

sent 120,523 bytes  received 74,485,191,265 bytes  1,210,720.02 bytes/sec
total size is 27,589,660,889,910  speedup is 370.40
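For reference, the "speedup" figure rsync prints is simply the total size of the file set divided by the bytes that actually crossed the wire (sent + received); the numbers from the summary above reproduce it:

```python
# rsync's "speedup" statistic = total size / (sent + received bytes).
# The values below are taken from the transfer summary above.
sent = 120_523
received = 74_485_191_265
total_size = 27_589_660_889_910

speedup = total_size / (sent + received)
print(f"speedup is {speedup:.2f}")   # matches rsync's reported 370.40
```

A high speedup means rsync's delta algorithm avoided resending most of the data, so the 17 hours here is dominated by per-file scanning and metadata work rather than by bulk transfer.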
------ Original Message ------
From: "Ben Turner" <bturner@xxxxxxxxxx>
To: "Ernie Dunbar" <maillist@xxxxxxxxxxxxx>
Cc: "Gluster Users" <gluster-users@xxxxxxxxxxx>
Sent: 4/27/2015 4:52:35 PM
Subject: Re: Disastrous performance with rsync to mounted Gluster volume.

----- Original Message -----
From: "Ernie Dunbar" <maillist@xxxxxxxxxxxxx>
To: "Gluster Users" <gluster-users@xxxxxxxxxxx>
Sent: Monday, April 27, 2015 4:24:56 PM
Subject: Re: Disastrous performance with rsync to mounted Gluster volume.
On 2015-04-24 11:43, Joe Julian wrote:
>> This should get you where you need to be. Before you start to
>> migrate the data, maybe do a couple of dd runs and send me the
>> output so we can get an idea of how your cluster performs:
>>
>> time `dd if=/dev/zero of=<gluster-mount>/myfile bs=1024k count=1000; sync`
>> echo 3 > /proc/sys/vm/drop_caches
>> dd if=<gluster-mount>/myfile of=/dev/null bs=1024k count=1000
>>
>> If you are using gigabit and glusterfs mounts with replica 2 you
>> should get ~55 MB/sec writes and ~110 MB/sec reads. With NFS you
>> will take a bit of a hit since NFS doesn't know where files live
>> like glusterfs does.
After copying our data and doing a couple of very slow rsyncs, I did
your speed test and came back with these results:
1048576 bytes (1.0 MB) copied, 0.0307951 s, 34.1 MB/s
root@backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile count=1024 bs=1024; sync
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.0298592 s, 35.1 MB/s
root@backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile count=1024 bs=1024; sync
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.0501495 s, 20.9 MB/s
root@backup:/home/webmailbak# echo 3 > /proc/sys/vm/drop_caches
root@backup:/home/webmailbak# dd if=/mnt/testfile of=/dev/null bs=1024k count=1000
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0124498 s, 84.2 MB/s
Keep in mind that this is an NFS share over the network.
I've also noticed that if I increase the count of those writes, the
transfer speed increases as well:
2097152 bytes (2.1 MB) copied, 0.036291 s, 57.8 MB/s
root@backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile count=2048 bs=1024; sync
2048+0 records in
2048+0 records out
2097152 bytes (2.1 MB) copied, 0.0362724 s, 57.8 MB/s
root@backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile count=2048 bs=1024; sync
2048+0 records in
2048+0 records out
2097152 bytes (2.1 MB) copied, 0.0360319 s, 58.2 MB/s
root@backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile count=10240 bs=1024; sync
10240+0 records in
10240+0 records out
10485760 bytes (10 MB) copied, 0.127219 s, 82.4 MB/s
root@backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile count=10240 bs=1024; sync
10240+0 records in
10240+0 records out
10485760 bytes (10 MB) copied, 0.128671 s, 81.5 MB/s
This is correct: there is per-file overhead, and the smaller the file
the less throughput you get. That said, since the files are smaller you
should see more files per second, just fewer MB per second. I have
found that below 16k, file size stops mattering: you will get roughly
the same number of 16k files per second as you do 1k files.
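A rough back-of-envelope model of that effect (the per-file overhead and bandwidth figures below are illustrative assumptions, not measured values from this cluster):

```python
# Model: each file pays a fixed per-file cost (metadata round trips)
# plus transfer time at the link bandwidth. Numbers are assumptions
# chosen for illustration, not measurements.
PER_FILE_OVERHEAD_S = 0.01   # assumed metadata cost per file (10 ms)
BANDWIDTH_BPS = 110e6        # assumed ~110 MB/s gigabit read path

def files_per_second(file_size_bytes):
    """Files/sec when each file costs overhead + transfer time."""
    return 1.0 / (PER_FILE_OVERHEAD_S + file_size_bytes / BANDWIDTH_BPS)

def mb_per_second(file_size_bytes):
    return files_per_second(file_size_bytes) * file_size_bytes / 1e6

for size in (1024, 16 * 1024, 1024 * 1024):
    print(f"{size:>8} B: {files_per_second(size):7.1f} files/s, "
          f"{mb_per_second(size):6.2f} MB/s")
```

Below 16k the transfer term is negligible next to the fixed per-file cost, so files/sec is nearly flat while MB/sec collapses, which is exactly the behavior described above.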
However, the biggest stumbling block for rsync seems to be changes to
directories. I'm unsure about what exactly it's doing (probably
changing last access times?) but these minor writes seem to take a
very long time, when normally they would not. Actual file copies (as
in the very files that are actually new within those same directories)
appear to take quite a lot less time than the directory updates.
Dragons be here! Access time is not kept in sync across the replicas
(IIRC, someone correct me if I am wrong!) and each time a dir is read
from a different brick I bet the access time is different.
For example:

# time rsync -av --inplace --whole-file --ignore-existing --delete-after gromm/* /mnt/gromm/
building file list ... done
Maildir/                    ## This part takes a long time.
Maildir/.INBOX.Trash/
Maildir/.INBOX.Trash/cur/
Maildir/.INBOX.Trash/cur/1429836077.H817602P21531.pop.lightspeed.ca:2,S
Maildir/.INBOX.Trash/tmp/   ## The previous three lines took nearly no time at all.
Maildir/cur/                ## This takes a long time.
Maildir/cur/1430160436.H952679P13870.pop.lightspeed.ca:2,S
Maildir/new/
Maildir/tmp/                ## The previous lines again take no time at all.
deleting Maildir/cur/1429836077.H817602P21531.pop.lightspeed.ca:2,S   ## This delete did take a while.

sent 1327634 bytes  received 75 bytes  59009.29 bytes/sec
total size is 624491648  speedup is 470.35

real    0m26.110s
user    0m0.140s
sys     0m1.596s
So, rsync reports that it wrote 1327634 bytes at 59 kBytes/sec, and
the whole operation took 26 seconds, all to write 2 files of around
20-30 kBytes each and delete 1.

The last rsync took around 56 minutes, when normally such an rsync
would have taken 5-10 minutes writing over the network via ssh.
It may have something to do with the access times not being in sync
across replicated pairs. Maybe someone has experience with this; could
it be tripping up rsync?
-b
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users