Hi,
I have been working on setting up a 4-replica gluster volume with over a million files (~250GB total), and I've seen some really weird behavior, even after trying to optimize for small files. The volume is a 4-brick replicate volume on gluster 3.13.2.
It took almost 2 days to rsync the data from the local drive to the gluster volume, and now I'm running a 2nd rsync that just looks for changes in case more files have been written. I'd like to concentrate this email on a very specific and odd issue.
The dir structure is:
YYYY/
  MM/
    10k+ files in each month folder
Rsyncing each month folder cold can take 2+ minutes.
However, if I ls the destination folder first, or use find (both of which finish within 5 seconds), the rsync is almost instant.
Here's a log with time calls that shows what happens:
box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/08/ 08/
sending incremental file list
^Crsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(637) [sender=3.1.0]
real 1m39.848s
user 0m0.010s
sys 0m0.030s

box:/mnt/gluster/uploads/2017 # time find 08 | wc -l
14254
real 0m0.726s
user 0m0.013s
sys 0m0.033s

box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/08/ 08/
sending incremental file list
real 0m0.562s
user 0m0.057s
sys 0m0.137s

box:/mnt/gluster/uploads/2017 # time find 07 | wc -l
10103
real 0m4.550s
user 0m0.010s
sys 0m0.033s

box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/07/ 07/
sending incremental file list
real 0m0.428s
user 0m0.030s
sys 0m0.083s

box:/mnt/gluster/uploads/2017 # time ls 06 | wc -l
11890
real 0m1.850s
user 0m0.077s
sys 0m0.040s

box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/06/ 06/
sending incremental file list
real 0m0.627s
user 0m0.073s
sys 0m0.107s

box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/05/ 05/
sending incremental file list
real 2m24.382s
user 0m0.127s
sys 0m0.357s
Note how, if I precede the rsync call with ls or find, the rsync completes in less than a second (finding no files to sync because they've already been synced). Otherwise, it takes over 2 minutes (I interrupted the first call after about 1m40s because it was already taking too long).
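To make the workaround concrete, here is a rough sketch of the "prime, then rsync" sequence from the log, wrapped in two shell functions. The function names prime_dir and sync_month are just placeholders I made up for illustration, and the paths in the usage comment are the ones from my log. This only reproduces the observed behavior; it is not a fix for whatever makes the cold rsync slow.

```shell
#!/bin/sh
# Sketch of the "prime the directory, then rsync" workaround.
# prime_dir and sync_month are hypothetical helper names, not gluster or rsync tools.

# Walk the destination directory once so its entries are listed before rsync runs.
# Prints the number of entries found (the directory itself is counted too).
prime_dir() {
    find "$1" | wc -l | tr -d ' '
}

# Prime one month folder on the gluster mount, then rsync it from the local copy.
# Arguments: source root, destination root, month folder name.
sync_month() {
    src_root=$1
    dst_root=$2
    month=$3
    prime_dir "$dst_root/$month" >/dev/null
    rsync -aPr "$src_root/$month/" "$dst_root/$month/"
}

# Example call (nothing runs automatically when this file is sourced):
#   sync_month /srv/www/htdocs/uploads/2017 /mnt/gluster/uploads/2017 05
```

With this, each month folder pays the few seconds of find up front instead of the 2+ minutes I see when rsync runs cold.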
What could be causing rsync to work so slowly unless the dir is primed?
Volume config:
Volume Name: gluster
Type: Replicate
Volume ID: XXXXXXXXXXXXXXXXXXXXXXXXX
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: server1:/mnt/server1_block4/gluster
Brick2: server2:/mnt/server2_block4/gluster
Brick3: server3:/mnt/server3_block4/gluster
Brick4: server4:/mnt/server4_block4/gluster
Options Reconfigured:
performance.parallel-readdir: off
transport.address-family: inet
nfs.disable: on
cluster.self-heal-daemon: enable
performance.cache-size: 1GB
network.ping-timeout: 5
cluster.quorum-type: fixed
cluster.quorum-count: 1
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 500000
performance.rda-cache-limit: 256MB
performance.read-ahead: off
client.event-threads: 4
server.event-threads: 4
Thank you for any insight.
Sincerely,
Artem