Very slow rsync to gluster volume UNLESS `ls` or `find` scan dir on gluster volume first

Artem Russakovskii <archon810@xxxxxxxxx> · Sat, 3 Feb 2018 23:20:06 -0800

Hi,
I have been setting up a 4-way replicated gluster volume (4 bricks, gluster 3.13.2) holding over a million files (~250GB total), and I've seen some really weird behavior, even after trying to optimize for small files.

It took almost 2 days to rsync the data from the local drive to the gluster volume, and now I'm running a 2nd rsync that just looks for changes in case more files have been written. I'd like to concentrate this email on a very specific and odd issue.

The dir structure is:
YYYY/
    MM/
        10k+ files in each month folder

Rsyncing a month folder cold (without listing the destination first) can take 2+ minutes.

However, if I ls the destination folder first, or use find (both of which finish within 5 seconds), the rsync is almost instant.

Here's a log with `time` calls that shows what happens:

box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/08/ 08/ 
sending incremental file list
^Crsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(637) [sender=3.1.0]

real    1m39.848s
user    0m0.010s
sys     0m0.030s
box:/mnt/gluster/uploads/2017 # time find 08 | wc -l  
14254

real    0m0.726s
user    0m0.013s
sys     0m0.033s
box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/08/ 08/
sending incremental file list

real    0m0.562s
user    0m0.057s
sys     0m0.137s
box:/mnt/gluster/uploads/2017 # time find 07 | wc -l 
10103

real    0m4.550s
user    0m0.010s
sys     0m0.033s
box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/07/ 07/ 
sending incremental file list

real    0m0.428s
user    0m0.030s
sys     0m0.083s
box:/mnt/gluster/uploads/2017 # time ls 06 | wc -l       
11890

real    0m1.850s
user    0m0.077s
sys     0m0.040s
box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/06/ 06/ 
sending incremental file list

real    0m0.627s
user    0m0.073s
sys     0m0.107s
box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/05/ 05/ 
sending incremental file list

real    2m24.382s
user    0m0.127s
sys     0m0.357s

Note how if I precede the rsync call with ls or find, the rsync completes in less than a second (finding no files to sync because they've already been synced). Otherwise, it takes over 2 minutes (2m24s for 05; I interrupted the first call, for 08, after 1m39s because it was already taking too long).
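To make the workaround from the log repeatable, the priming step and the rsync can be wrapped together. This is only a minimal sketch of what I did by hand: `prime_dir` and `sync_month` are hypothetical helper names, and it reproduces the observation rather than explaining the cause. It uses `-aP` because `-a` already implies `-r`.

```shell
#!/bin/sh
# Sketch only: prime_dir and sync_month are hypothetical helper names.

# Walk DIR once, as `find 08 | wc -l` does in the log, and print the entry count.
prime_dir() {
    find "$1" | wc -l | tr -d ' '
}

# Prime the destination, then run the same kind of rsync as in the log.
# SRC and DST are month directories given without a trailing slash.
sync_month() {
    prime_dir "$2" >/dev/null
    rsync -aP "$1/" "$2/"
}

# Example, run from /mnt/gluster/uploads/2017 (paths taken from the log):
#   for m in 05 06 07 08; do
#       sync_month "/srv/www/htdocs/uploads/2017/$m" "$m"
#   done
```

The count printed by `prime_dir` includes the directory itself, which matches what `find 08 | wc -l` reports above.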

What could be causing rsync to work so slowly unless the dir is primed?

Volume config:
Volume Name: gluster
Type: Replicate
Volume ID: XXXXXXXXXXXXXXXXXXXXXXXXX
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: server1:/mnt/server1_block4/gluster
Brick2: server2:/mnt/server2_block4/gluster
Brick3: server3:/mnt/server3_block4/gluster
Brick4: server4:/mnt/server4_block4/gluster
Options Reconfigured:
performance.parallel-readdir: off
transport.address-family: inet
nfs.disable: on
cluster.self-heal-daemon: enable
performance.cache-size: 1GB
network.ping-timeout: 5
cluster.quorum-type: fixed
cluster.quorum-count: 1
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 500000
performance.rda-cache-limit: 256MB
performance.read-ahead: off
client.event-threads: 4
server.event-threads: 4
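For anyone comparing settings against the list above, a single option can be pulled out of this kind of output with a small filter. This is a rough sketch: `volume_option` is a hypothetical helper name, not part of the gluster CLI, and it assumes the `key: value` layout shown above.

```shell
#!/bin/sh
# Sketch only: volume_option is a hypothetical helper, not a gluster command.

# Read `gluster volume info`-style text on stdin and print the value of KEY.
# Lines are split on ": ", so the "Options Reconfigured:" header never matches.
volume_option() {
    awk -F': ' -v key="$1" '$1 == key { print $2 }'
}

# Example against the volume above (its name is "gluster"):
#   gluster volume info gluster | volume_option performance.md-cache-timeout
```

With the config pasted above, that example should print the md-cache timeout of 600.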

Thank you for any insight.

Sincerely,
Artem

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users