(no subject)

Hello,

Sorry, I forgot to mention one important point: we are not using DRBD but ext4 filesystems that are automatically mounted by corosync as needed. We have a few services (with different mount points) running on those nodes, and if one node fails, its services are switched to the remaining nodes.

All 3 nodes can communicate with each other; I can ssh from one to another. Node3 sees node1 and node2 (as shown by the corosync-objctl output) but doesn't join the existing ring; it looks like it creates its own ring. The ring was working fine until I upgraded the OS on node3.

Thanks,
Mélanie

On 07/26/2012 03:29 PM, Antonis Christofides wrote:
> Hello,
>
> let me see if I've understood correctly. Assume the following cluster:
>
>
>     Internet              Internet
>       |                     |
>       |                     |
>      node1 -------------- node2
>  (drbd primary)         (drbd secondary)
>
> Now the network is partitioned. node2 can't communicate with node1. So
> it thinks node1 is down. So it makes itself a drbd primary, and clients
> are now writing data to node2. But node1 is not really down, it's just
> disconnected from node2. Some clients are still accessing node1 and
> writing to it. Result: chaos. So we use stonith to guard against this
> issue, but stonith requires a second, independent communication channel
> to node1. I have no experience on this but I guess "independent" is a
> big word.
>
> Instead:
>
>     Internet              Internet
>       |                     |
>       |                     |
>      node1 -------------- node2
>  (drbd primary)         (drbd secondary)
>       |                     |
>       |                     |
>       +-------- node3 ------+
>
>
> The network is partitioned; say node1 is disconnected from both node2
> and node3. Node1, seeing that it no longer has quorum (it is alone),
> switches itself to drbd secondary. node2, seeing that node1 is no longer
> there and that together with node3 it has a quorum, makes itself a drbd
> primary.

Yes, if you don't ignore quorum this is also fine. But at least configure
resource-level fencing in DRBD, so that an out-of-date node cannot be
promoted.

Nevertheless, without stonith the cluster can't recover automatically from
failures such as a resource that fails to stop.
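
To make the quorum argument from the quoted three-node example concrete,
here is a minimal Python sketch. It assumes one vote per node and a simple
strict-majority rule; the function name has_quorum and the partition labels
are hypothetical, chosen only for illustration, and this is not how corosync
or Pacemaker compute quorum internally.

```python
def has_quorum(partition_size, total_nodes):
    """Return True if a partition holds a strict majority of the votes.

    Assumption: every node has exactly one vote.
    """
    return 2 * partition_size > total_nodes


# Three-node cluster in which node1 is cut off from node2 and node3.
TOTAL_NODES = 3
partitions = {
    "node1 alone": ["node1"],
    "node2 + node3": ["node2", "node3"],
}

for label, members in partitions.items():
    state = "has quorum" if has_quorum(len(members), TOTAL_NODES) else "no quorum"
    print(f"{label}: {state}")

# Two-node cluster split in half: neither side holds a strict majority.
print("two-node split:", has_quorum(1, 2))
```

With three nodes, exactly one side of any split can hold a majority, which
is why node1 steps down and node2 may be promoted. With two nodes, a split
leaves neither side with a majority. The sketch covers the vote counting
only; it does nothing about a node that fails to release its resources.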

Regards,
Andreas

--
Need help with Pacemaker/Corosync/DRBD?
http://www.hastexo.com/now

>
> This looks to me simpler and easier to achieve (but again, I have no
> experience). Isn't it a valid alternative solution, that makes stonith
> unnecessary? I'm asking because in some places in the Pacemaker or drbd
> documentation it says "don't do this without stonith!"; it doesn't say
> "don't do this without either stonith or quorum!", and I was therefore
> wondering whether I've understood something wrong.
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss
>

