On Tue, Jul 31, 2007 at 08:11:23PM +0300, Janne Peltonen wrote:
> On Tue, Jul 31, 2007 at 05:54:41PM +0300, Janne Peltonen wrote:
> > On Tue, Jul 31, 2007 at 09:41:21AM -0400, Lon Hohberger wrote:
> > > On Tue, Jul 31, 2007 at 03:14:38PM +0300, Janne Peltonen wrote:
> > > > On Tue, Jul 10, 2007 at 06:19:22PM -0400, Lon Hohberger wrote:
> > > > >
> > > > > http://people.redhat.com/lhh/rhel5-test
> > > > >
> > > > > You'll need at least the updated cman package. The -2.1lhh build of
> > > > > rgmanager is the one I just built today; the others are a bit older.
> > > >
> > > > Well, I installed the new versions of the cman and rgmanager packages I
> > > > found there, but to no avail: I still get 1500 invocations of fs.sh per
> > > > second.
> > >
> > > I put a log message in fs.sh:
> > >
> > > Jul 31 09:27:29 bart clurgmgrd: [4395]: <err> /usr/share/cluster/fs.sh
> > > TEST
> > >
> > > It comes up once every several (10-20) seconds like it's supposed to.
> >
> > I did the same, with the same results. It seems to me that the clurgmgrd
> > process isn't calling the complete script any more times than it's
> > supposed to. What I'm seeing are the execs of fs.sh, that is, the count
> > includes each () and `` and so on. Each fs.sh invocation seems to create
> > quite a number of subshells.
> >
> > I'm sorry for having misled you. This all means there probably isn't
> > much reason to read the cluster.conf and rg_test rules output, but
> > I'll attach them anyway.
>
> After running the new rgmanager packages for about four hours without any
> of the load fluctuation I'd experienced before (with a more-or-less
> four-hour interval, system load first increases slowly until it reaches
> a high level - dependent on overall system load - and then swiftly
> decreases to near zero, before starting to increase again. This fluctuation
> peaks at about 5.0 on a system with no users at all, but many services.
> If there are many users and the user peak coincides with the base peak,
> the system experiences a short load peak of about 100.0, after which it
> recovers and the basic load fluctuation becomes visible again), the
> load averages started increasing again, to around 10.0, so -
> frustrated - I edited /usr/share/cluster/fs.sh and put an exit 0 in the
> "status|monitor" branch of the case statement on $1. Well. Load averages
> promptly fell back to under 0.5, disk usage % fell by 30 percentage
> points, and overall system responsiveness increased considerably.
>
> So I'll be running my cluster without fs status checks for now. I hope
> someone'll work out what's wrong with fs.sh soon... ;)

There are a number of things we can do. Can you file a bugzilla about
this, now that we know what's going on (and that it's not an internal
rgmanager problem, just inefficient scripting)?

-- Lon

--
Lon Hohberger - Software Engineer - Red Hat, Inc.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster