On Tue, Jul 31, 2007 at 05:54:41PM +0300, Janne Peltonen wrote:
> On Tue, Jul 31, 2007 at 09:41:21AM -0400, Lon Hohberger wrote:
> > On Tue, Jul 31, 2007 at 03:14:38PM +0300, Janne Peltonen wrote:
> > > On Tue, Jul 10, 2007 at 06:19:22PM -0400, Lon Hohberger wrote:
> > > >
> > > > http://people.redhat.com/lhh/rhel5-test
> > > >
> > > > You'll need at least the updated cman package. The -2.1lhh build of
> > > > rgmanager is the one I just built today; the others are a bit older.
> > >
> > > Well, I installed the new versions of the cman and rgmanager packages I
> > > found there, but to no avail: I still get 1500 invocations of fs.sh per
> > > second.
> >
> > I put a log message in fs.sh:
> >
> > Jul 31 09:27:29 bart clurgmgrd: [4395]: <err> /usr/share/cluster/fs.sh TEST
> >
> > It comes up once every several (10-20) seconds, like it's supposed to.
>
> I did the same, with the same results. It seems to me that the clurgmgrd
> process isn't calling the complete script any more often than it's
> supposed to. What I'm counting are the execs within fs.sh; that is, the
> count includes each ( ) and `` and so on. Each fs.sh invocation seems to
> create quite a number of subshells.
>
> I'm sorry for having misled you. This also means there probably isn't
> much reason to read the cluster.conf and rg_test rules output - I'll
> attach them anyway.

I ran the new rgmanager packages for about four hours without any of the
load fluctuation I'd experienced before. (The fluctuation: at a
more-or-less four-hour interval, system load first increases slowly
until it reaches a high level - how high depends on overall system
activity - and then swiftly drops to near zero, to start climbing again.
The fluctuation peaks at about 5.0 on a system with no users at all but
many services. If there are many users and a user-load peak coincides
with the base peak, the system sees a short load spike of about 100.0,
after which it recovers and the basic fluctuation becomes visible
again.) Then the load averages started climbing again, to somewhere
around 10.0, so - frustrated - I edited /usr/share/cluster/fs.sh and put
an exit 0 at the top of the "status|monitor" branch of the case
statement on $1.

Well. Load averages promptly fell back to under 0.5, disk utilization
fell by 30 percentage points, and overall system responsiveness
improved considerably. So I'll be running my cluster without fs status
checks for now. I hope someone'll work out what's wrong with fs.sh
soon... ;)

--Janne
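
P.S. In case someone wants to replicate the workaround: the edit amounts
to the following. I'm sketching the case statement from memory, so the
surrounding structure is approximate and do_start/do_status are
placeholders for the real fs.sh helper functions - the added exit 0 is
the actual change.

    #!/bin/bash
    # Sketch only: do_start/do_status stand in for the real fs.sh
    # helpers; only the "exit 0" line is what I actually added.
    do_start()  { :; }   # placeholder
    do_status() { :; }   # placeholder

    case "$1" in
    start)
            do_start
            exit $?
            ;;
    status|monitor)
            exit 0           # <-- added: report success without
                             #     running any of the status logic
            do_status        # now unreachable
            exit $?
            ;;
    esac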
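
P.P.S. For anyone who wants to see the subshell fan-out for themselves:
every backtick substitution and ( ) group in the script forks another
shell, so a single logical status check fans out into a lot of
processes. Something like the following should show it; the OCF_RESKEY_*
parameter names and values here are made-up examples, adjust them to
match your resource definition.

    # Follow forks (-f) and print a summary (-c) of process-related
    # syscalls (fork/clone/execve/wait/...) for one status pass.
    # The environment values below are illustrative only.
    OCF_RESKEY_name=test OCF_RESKEY_device=/dev/sdb1 \
    OCF_RESKEY_mountpoint=/mnt/test \
        strace -f -c -e trace=process /usr/share/cluster/fs.sh status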