Re: SOLVED: GFS2 2 Node Cluster - lost Node - Mount not writeable

Thomas Börnert <tb@xxxxxxxxx> · Fri, 29 Feb 2008 02:23:10 +0100

OK, I understand. fence_ack_manual node1 works.

I have now tried fence_drac, and it fences the node automatically. Fine.

Thx

Thomas

> This is because you are using manual fencing.  Fencing is required to
> ensure that an errant node does not continue to write to the shared
> filesystem after it has lost communication with the cluster, thereby
> corrupting the data.  The only way to do this is to halt all cluster
> activity (including granting GFS locks) until the fencing succeeds.
> The "manual" means that an administrator must intervene and correct the
> problem before cluster operations can resume.  So when you power off
> node1, node2 detects the missed heartbeats and starts to fence node1.
> With manual fencing that operation waits for you: you must fence node1
> yourself by powering it off (this is already done in your case), then
> do one of the following:
>
>      1) Run the following command to acknowledge that you have
>         manually fenced the node:
>
>                # /sbin/fence_ack_manual node1
>
>          OR
>
>      2) Start node1 back up and have it rejoin the cluster
>
> The danger with manual fencing arises when you run fence_ack_manual
> without properly investigating the issue or actually fencing the node.
> For example, you may see that the failed node is still up and run the
> command anyway, without noticing that it has only lost its network
> connection.  Both nodes then go on writing to GFS without being able
> to communicate with each other, and they quickly corrupt the data.
>
> So, when using manual fencing, always verify that the node is really
> powered off before running fence_ack_manual.
>
> John
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
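
The caution John describes can be turned into a small guard around the
acknowledgement step. What follows is only a sketch, not part of the
cluster tools: the function name ack_fence_if_down and the PING and ACK
variables are made up for illustration. It refuses to continue while the
node still answers ping, and because a silent node may merely have lost
its network link, it also asks the administrator to confirm the power-off.
By default it only prints the fence_ack_manual command from the quoted
message instead of running it.

```shell
#!/bin/sh
# Hypothetical guard around the manual-fence acknowledgement (sketch only).
# PING and ACK can be overridden; the default ACK is a dry run that prints
# the command instead of executing it.
PING=${PING:-"ping -c 2 -w 5"}
ACK=${ACK:-"echo would run: /sbin/fence_ack_manual"}

ack_fence_if_down() {
    node=$1

    # A node that still answers ping has clearly not been fenced.
    if $PING "$node" >/dev/null 2>&1; then
        echo "REFUSED: $node still answers ping; power it off first" >&2
        return 1
    fi

    # No reply is NOT proof: the node may be up with a dead network link,
    # so the administrator must confirm the power-off explicitly.
    printf 'Have you verified that %s is powered off? Type yes: ' "$node"
    read answer
    if [ "$answer" != "yes" ]; then
        echo "Aborted: fence of $node not acknowledged" >&2
        return 1
    fi

    # Acknowledge the fence (dry run unless ACK is overridden).
    $ACK "$node"
}
```

To use it for real, one would set ACK=/sbin/fence_ack_manual before
calling ack_fence_if_down node1 on the surviving node. With a power
fencing agent such as fence_drac none of this is needed, since the
cluster powers the node off itself.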
