Re: [Linux-cluster] DLM or SM bug after 50 hours

Michael Conrad Tadpol Tilstra <mtilstra@xxxxxxxxxx> · Fri, 17 Dec 2004 08:04:59 -0600

On Thu, Dec 16, 2004 at 05:18:40PM -0800, Daniel McNeil wrote:
> My tests ran for 50 hours!  This is a new record and is running
> with my up_write() before queue_ast() patch.
> 
> It hit an error during a 2 node test (GFS on cl030 and cl031;
> cl032 was a member of the cluster, but no GFS file system mounted).
> 
> On cl030 console:
> 
> SM: 00000001 sm_stop: SG still joined
> SM: 01000410 sm_stop: SG still joined
> 
> /proc/cluster/status shows cl030 is not in cluster
> 
> On cl031 console:
> 
> CMAN: node cl030a is not responding - removing from the cluster
> dlm: stripefs: recover event 6388
> CMAN: node cl030a is not responding - removing from the cluster
> dlm: stripefs: recover event 6388
> name "       5          54bdb0" flags 2 nodeid 0 ref 1
> G 00240122 gr 3 rq -1 flg 0 sts 2 node 0 remid 0 lq 0,0
> [60,000 lines of this]
> ------------[ cut here ]------------
> kernel BUG at /Views/redhat-cluster/cluster/dlm-kernel/src/reccomms.c:128!
> invalid operand: 0000 [#1]

You should append this to
https://bugzilla.redhat.com/beta/show_bug.cgi?id=142874

-- 
Michael Conrad Tadpol Tilstra
Earn cash in your spare time -- blackmail your friends