Re: Same OOPS on both cluster nodes, separated by a week.

Patrick Caulfield <pcaulfie@xxxxxxxxxx> · Mon, 20 Jun 2005 20:13:09 +0100

On Mon, Jun 20, 2005 at 12:07:26PM -0400, Eric Kerin wrote:
> I got the following oops messages on my cluster nodes, both at different
> times.  Once was on node A, I was running a clustat, and did a ctrl-4 to
> kill it, (it was taking a long while to run, seemed to be blocked by
> something).  The second time after doing that OOPS#1 showed up.  The
> second oops showed up on node B; the cluster was running, and I
> wasn't actually doing anything outside of watching a tcpdump to watch
> some data flow by, went away for about 10 minutes, and when I came back
> node B had blocked up, and was fenced by A.  The OOPS was in the
> messages file. 
> 

Well, they're both the same oops. It looks like a race between the AST being 
delivered and the process shutting down. I'm not in a position to look at it
in more detail ATM - I'll investigate when I get back to base.

It might be good to have this in bugzilla.
-- 

patrick

--

Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster