On Mon, 3 Mar 2008, Lon Hohberger wrote:
I have a 2-node cluster with Open Shared Root on GFS on DRBD.
Last week, I saw a car with a license plate from 'Wyoming'. Now,
someone's running GFS on shared root DRBD. My world's turning upside
down.
LOL! We live in interesting times. :)
And anyway, what's wrong with GFS shared root on DRBD? :)
A single node mounts GFS fine and works, but after a while it seems to
just block on disk I/O, very much as if it had started trying to fence
the other node and were waiting for acknowledgement.
If CMAN was trying to fence, you'd see it in /var/log/messages. I'm not
sure about DRBD.
I can't see any evidence of that, and I'd expect to see something on the
console about it, too. I'll set up a remote syslog to double-check.
There are no fence devices defined (so this could be a possibility),
Unlikely. Even if this was the cause, you'd still see it (and you could
work around it).
Unfortunately, it doesn't end there. When an attempt is made to dual-mount
the GFS file system before the secondary is fully up to date (but is
connected and syncing), the second node to join notices an inconsistency
and withdraws from the cluster. In the process, GFS gets corrupted, and
the only way to get it to mount again on either node is to repair it with
fsck.
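One way to avoid the mount-while-syncing case is to check the DRBD state before mounting on the second node. Below is a minimal sketch that assumes DRBD 8.x `/proc/drbd` output, where each resource line carries a `cs:` (connection state) field and a `ds:` (local/peer disk state) field. The function name `safe_to_dual_mount` is hypothetical and is not part of the DRBD or GFS tooling. It only guards against mounting during a resync; it does not address the corruption seen after sync has completed.

```python
import re

# Connection state, e.g. "cs:Connected" or "cs:SyncTarget" (assumed DRBD 8.x format).
CS_RE = re.compile(r"\bcs:(\w+)")
# Disk states as local/peer, e.g. "ds:UpToDate/Inconsistent" (assumed DRBD 8.x format).
DS_RE = re.compile(r"\bds:(\w+)/(\w+)")


def safe_to_dual_mount(proc_drbd_text: str, minor: int = 0) -> bool:
    """Hypothetical guard: True only if resource `minor` is Connected
    and both the local and the peer disk are UpToDate."""
    prefix = f"{minor}:"
    for line in proc_drbd_text.splitlines():
        # Resource lines look like " 0: cs:... ds:..."; skip everything else.
        if not line.strip().startswith(prefix):
            continue
        cs = CS_RE.search(line)
        ds = DS_RE.search(line)
        if cs is None or ds is None:
            return False  # unexpected format: refuse to mount
        return cs.group(1) == "Connected" and ds.groups() == ("UpToDate", "UpToDate")
    return False  # resource not listed: refuse to mount


if __name__ == "__main__":
    with open("/proc/drbd") as f:
        print(safe_to_dual_mount(f.read()))
```

Usage: a node that is still resyncing (e.g. `cs:SyncTarget ... ds:Inconsistent/UpToDate`) is rejected, so a start-up script could wait in a loop until the function returns True before running `mount`.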
Off the top of my head, this sounds like a DRBD thing. If sync's
completed, it works, right?
Not quite. It works only in so far as it gets as far as mounting the file
system without noticing an inconsistency (presumably because the data
isn't changing underneath it). But the FS still gets corrupted.
I cannot be sure right now, but I suspect that both machines might be
trying to mount the FS with the same journal. I could be misremembering
and/or misinterpreting what the mount output says while it's connecting,
though. I'll check it via the remote console in a bit and paste the
output from each node.
Gordan
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster