Hi all,
I currently have a GFS deployment consisting of eight servers and several GFS volumes. One of my GFS servers is a dedicated backup server with a second, replica SAN attached through a second HBA. My approach to backups has been to use tools such as rsync and rdiff-backup, run nightly. The particular problem I'm having is that one or two of my filesystems take a *very* long time to back up. For example, I have /home living on GFS. Day-to-day performance is acceptable, but backups are hideously slow. Every night I kick off an rdiff-backup of /home from my backup server, which dumps the backup onto an XFS filesystem on the replica SAN. This backup can take days in some cases.
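For reference, the nightly job amounts to something like this (paths are made up for illustration; the real destination is the XFS filesystem on the replica SAN):

    # run nightly from the backup server
    rdiff-backup /home /mnt/replica-san/backups/home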
We have done some investigating and found that getdents(2) calls (which return the list of filenames present in a directory) are spectacularly slow on GFS, irrespective of the size of the directory in question. In particular, with 'strace -r', I'm seeing a rate below 100 filenames per second. The /home filesystem has at least 10 million files in it; doing the math at the observed rate, that's roughly 29.5 hours just for the getdents calls needed to scan them all, which is more than a third of the backup's wall-clock time. And that's before we even start stat'ing anything.
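In case anyone wants to reproduce the measurement, this is roughly what I've been doing (the directory path is just an example; 'ls -f' skips sorting, so the run is dominated by the getdents calls):

    # print a relative timestamp on each getdents call
    strace -r -e trace=getdents ls -f /home/some/dir > /dev/null

    # back-of-envelope at the ~94 entries/sec I'm observing:
    # 10,000,000 files / 94 per sec ≈ 106,000 s ≈ 29.5 hours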
I googled around a bit and couldn't find any discussion of slow getdents calls under GFS. This seems awfully slow, even allowing for sub-optimal locking. Is there any chance we have some tunable turned on (or off) that might be causing this, or one I could try tweaking to improve the situation? I'm not even sure which tunables to consider. Any insights would be much appreciated.
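For what it's worth, this is how I've been inspecting the tunables so far; the settune line is only an illustration of the syntax (the demote_secs value shown is arbitrary, not something I've actually tried):

    # dump the current tunables for the mount
    gfs_tool gettune /home

    # example of changing one:
    # gfs_tool settune /home demote_secs 200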
--
Brandon