One filesystem is mounted with atime because we're storing PHP session files on it so that all of the servers can get to them; the sessions weren't being garbage collected previously, so we had to remount with atime. The other, much larger filesystem (8TB versus 1TB for the one with atime enabled) is mounted without atime. The larger one is primarily where I've seen the spike in CPU usage when a file transfer begins (via UUCP, BTW).

We just did some searching and reading about this setting, and it looks like setting it to zero means cached DLM locks are never dropped from memory. Ours is at the default value of 50,000, which I take to mean that once 50,000 files have been "touched" in some way, their locks start getting dropped out of this cache, which can cause a performance hit. We've looked through the "counters" output and aren't sure exactly which counter would reveal whether that threshold is being exceeded. Any pointers?

PS. Some info here on how to make this setting _permanent_; we were surprised that adding it to sysctl.conf wasn't the primary method mentioned:
http://www.redhatmagazine.com/2006/12/15/tips_tricks/

Server1
  locks 794922
  locks held 386470
  incore inodes 379911
  metadata buffers 5
  unlinked inodes 0
  quota IDs 0
  incore log buffers 0
  log space used 0.05%
  meta header cache entries 8
  glock dependencies 1
  glocks on reclaim list 0
  log wraps 35
  outstanding LM calls 0
  outstanding BIO calls 0
  fh2dentry misses 1
  glocks reclaimed 111468752
  glock nq calls 1283605791
  glock dq calls 1283083972
  glock prefetch calls 51880947
  lm_lock calls 63041351
  lm_unlock calls 62497023
  lm callbacks 125557260
  address operations 1213594855
  dentry operations 15527454
  export operations 1135237
  file operations 1739567930
  inode operations 25526974
  super operations 431704873
  vm operations 0
  block I/O reads 113537577
  block I/O writes 0

Server2
  locks 481974
  locks held 233640
  incore inodes 223437
  metadata buffers 5
  unlinked inodes 0
  quota IDs 0
  incore log buffers 0
  log space used 0.10%
  meta header cache entries 0
  glock dependencies 0
  glocks on reclaim list 0
  log wraps 4
  outstanding LM calls 0
  outstanding BIO calls 0
  fh2dentry misses 0
  glocks reclaimed 4977455
  glock nq calls 0
  glock dq calls 0
  glock prefetch calls 513249
  lm_lock calls 4991067
  lm_unlock calls 4518146
  lm callbacks 9562059
  address operations 23406495
  dentry operations 9440989
  export operations 0
  file operations 1807195617
  inode operations 13159135
  super operations 2626655
  vm operations 0
  block I/O reads 24550633
  block I/O writes 84806

Server3
  locks 73380
  locks held 22815
  incore inodes 19140
  metadata buffers 458
  unlinked inodes 0
  quota IDs 0
  incore log buffers 0
  log space used 0.20%
  meta header cache entries 36
  glock dependencies 0
  glocks on reclaim list 0
  log wraps 60
  outstanding LM calls 0
  outstanding BIO calls 0
  fh2dentry misses 0
  glocks reclaimed 2875923
  glock nq calls 530954329
  glock dq calls 527130026
  glock prefetch calls 55222
  lm_lock calls 6770608
  lm_unlock calls 2739686
  lm callbacks 9605743
  address operations 317847565
  dentry operations 3659746
  export operations 770507
  file operations 3322
  inode operations 8146407
  super operations 12936727
  vm operations 59
  block I/O reads 3236
  block I/O writes 7969620
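For anyone wanting to poke at the same knobs, a rough sketch of the commands involved (just a sketch: /mnt/gfs_big is a placeholder for the real mount point, and gfs_tool is assumed to be the source of the counters above):

  # Current lock_dlm drop threshold on this node (default 50000; 0 = never drop):
  cat /proc/cluster/lock_dlm/drop_count

  # Dave's suggestion below; set it on each node before gfs is mounted:
  echo "0" > /proc/cluster/lock_dlm/drop_count

  # Re-sample the counters during a UUCP transfer and watch how "locks",
  # "glocks reclaimed" and the lm_* call counts move:
  watch -n 5 gfs_tool counters /mnt/gfs_big

On the permanence question: /proc/cluster/lock_dlm/drop_count isn't under /proc/sys, which would explain why sysctl.conf isn't the suggested route; presumably the echo has to happen from an init script that runs before the gfs filesystems are mounted, but that's an untested assumption on our part.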
David Teigland wrote:
> On Tue, Jan 23, 2007 at 08:39:32AM -0500, Wendell Dingus wrote:
>
>> I don't know where that breaking point is but I believe _we've_ stepped
>> over it.
>
> The number of files in the fs is a non-issue; usage/access patterns is
> almost always the issue.
>
>> 4-node RHEL3 and GFS6.0 cluster with (2) 2TB filesystems (GULM and no
>> LVM) versus
>> 3-node RHEL4 (x86_64) and GFS6.1 cluster with (1) 8TB+ filesystem (DLM
>> and LVM and way faster hardware/disks)
>>
>> This is a migration from the former to the latter, so quantity/size of
>> files/dirs is mostly identical. Files being transferred from customer
>> sites to the old servers never cause more than about 20% CPU load and
>> that usually (quickly) falls to 1% or less after the initial xfer
>> begins. The new servers run to 100% where they usually remain until the
>> transfer completes. The current thinking as far as reason is the same
>> thing being discussed here.
>
> This is strange, are you mounting with noatime? Also, try setting this on
> each node before it mounts gfs:
>
> echo "0" > /proc/cluster/lock_dlm/drop_count
>
> Dave

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster