On Wed, 2005-02-16 at 22:26, David Teigland wrote:
> On Wed, Feb 16, 2005 at 03:39:37PM -0800, Daniel McNeil wrote:
> > I have not been able to get my tests to run for more than
> > 1 day for the last several tries. This time my test hung
> > during mount in kcl_join_service(). My test does mount and umount
> > several times for each test run. This time it hung on the
> > 22nd test run. It looks like it was starting a 3node test
> > where a gfs file system is mounted on all 3 nodes and then
> > does a umount/mount 1 node at a time. So this should have
> > done an umount on cl031 and then hung on a mount on cl031
> > with cl030 and cl032 having the gfs file system still mounted.
> >
> > A bunch of info is available here:
> > http://developer.osdl.org/daniel/GFS/test.11feb2005/
>
> I've looked through it and can't pinpoint the problem. Next
> time could you also collect /proc/cluster/lock_dlm/debug and
> /proc/cluster/dlm_debug ?
>
> I've set up a similar but simplified test on both of my test
> clusters (a 2-node and a 7-node). I can't dedicate these
> machines for a full 1-2 day stretch until the weekend,
> though. My test is a loop around:
>
> - on each node sequentially: unmount/mount gfs
> - on each node sequentially: run some load for a couple minutes

I started it running again yesterday afternoon. I'll collect all the
info when I hit a problem. I still have not made it past 52 hours
running this test.

Thanks for taking a look,

Daniel