We have a two-node cluster primarily acting as an NFS serving environment. Our backup infrastructure here uses NetBackup and, unfortunately, NetBackup has no PPC client (we're running on IBM JS20 blades), so we're approaching the backup strategy in two different ways:

- Run the NetBackup client from another machine and point it at an NFS share on one of our two cluster nodes
- Run rsyncd on our cluster nodes and rsync from a remote machine; NetBackup then backs up that machine

The GFS2 filesystem in our cluster is only storing about 90GB of data, but has about one million files on it (inodes used, as reported by df -i). (For the curious, this is a home directory server, and we do break things up under a top-level hierarchy of one folder per first letter of the username.)

The NetBackup-over-NFS route is extremely slow and spikes the load on whichever server is being backed up from. We made the following adjustments to try and improve performance:

- Set the following in our cluster.conf file:

  <dlm plock_ownership="1" plock_rate_limit="0"/>
  <gfs_controld plock_rate_limit="0"/>

  ping_pong will give me about 3-5k locks/sec now.

- Mounted the filesystem with noatime,nodiratime,quota=off

This seems to have helped a bit, but things are still taking a long time.

I should note here that I tried running ping_pong against one of our cluster nodes via one of its NFS exports of the GFS2 filesystem. While I can get 3000-5000 locks/sec locally, over NFS it was about... 2 or 3 (not thousand, literally 2 or 3). A tcpdump of the NLM port shows the NFS lock manager on the node responding NLM_BLOCKED most of the time. I'm not sure whether GFS2 or our NFS daemon is to blame... in any case...

I've set up rsyncd on the cluster nodes and am syncing from a remote server now (all of this over Gigabit Ethernet). I'm over an hour in and the client is still generating the file list. strace confirms that rsync --daemon is still trawling through the filesystem, building its list of files...

I've also done a blktrace dump on my GFS2 filesystem's block device and can clearly see glock_workqueue showing up the most by far. However, I don't know what else I can glean from these results. (Rough sketches of the commands and configs involved are at the end of this mail.)

Does anyone have any tips or suggestions on improving either our NFS locking or our rsync --daemon performance beyond what I've already tried? It might almost be quicker for us to do a full backup each time than to spend hours building file lists for differential backups :)

Details of our setup:

- IBM DS4300 Storage (12-drive RAID5 + 2 spares)
  - Exposed as two LUNs (one per controller)
  - Don't believe this array does hardware snapshots :(
- Two (2) IBM JS20 Blades (PPC)
  - QLogic ISP2312 2Gb HBAs
  - RHEL 5.4 Advanced Platform PPC
  - multipathd
  - clvm aggregates the two LUNs
  - GFS2 on top of clvm
    - Configured with quotas originally, but disabled later by mounting with quota=off
    - Mounted with noatime,nodiratime,quota=off

# gfs2_tool gettune /domus1
new_files_directio = 0
new_files_jdata = 0
quota_scale = 1.0000   (1, 1)
logd_secs = 1
recoverd_secs = 60
statfs_quantum = 30
stall_secs = 600
quota_cache_secs = 300
quota_simul_sync = 64
statfs_slow = 0
complain_secs = 10
max_readahead = 262144
quota_quantum = 60
quota_warn_period = 10
jindex_refresh_secs = 60
log_flush_secs = 60
incore_log_blocks = 1024

# gfs2_tool getargs /domus1
data 2
suiddir 0
quota 0
posix_acl 1
upgrade 0
debug 0
localflocks 0
localcaching 0
ignore_local_fs 0
spectator 0
hostdata jid=1:id=196610:first=0
locktable
lockproto

Thanks in advance for any advice.
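For reference, this is roughly how I ran the lock test and the packet capture. The client-side mount point, interface, and port below are illustrative, not copied from our setup:

On a remote NFS client, with the GFS2 export mounted at /mnt/domus1 (example path), running ping_pong with node count + 1 contenders:

# ./ping_pong /mnt/domus1/ping_pong.dat 3

On the cluster node, find the port lockd/NLM is registered on and capture just that traffic:

# rpcinfo -p localhost | grep nlockmgr
# tcpdump -i eth0 -s 0 -w nlm.pcap port 32768

(32768 stands in for whatever port rpcinfo reports for nlockmgr.)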
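The rsync daemon side looks roughly like this; the module name, destination path, and host name are examples, not our real ones:

/etc/rsyncd.conf on each cluster node:

uid = root
gid = root
use chroot = yes
read only = yes
log file = /var/log/rsyncd.log

# GFS2 home directories, exported read-only for backups
[homes]
    path = /domus1
    comment = home directories

And the pull from the remote staging host that NetBackup actually backs up:

# rsync -a --delete rsync://node1/homes/ /backup/staging/homes/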
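The blktrace run was along these lines (the device-mapper name is a placeholder for our clvm logical volume). Trace the LV backing /domus1 for 60 seconds, turn the binary trace into text, and count events per process name:

# blktrace -d /dev/mapper/vg_domus-lv_domus1 -w 60 -o domus1
# blkparse -i domus1 > domus1.parsed
# grep -o '\[[^]]*\]' domus1.parsed | sort | uniq -c | sort -rn | head

glock_workqueue is what comes out far ahead of everything else in that last step.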
Ray