Re: Severe problems with 64-bit RHCS on RHEL5.1

2008/4/17 Harri Päiväniemi <harri.paivaniemi@xxxxxxxxxxxxxxx>:
>
> The 2nd problem that still exists is:
>
> Nodes a and b are running and everything is ok. I stop node b's
> cluster daemons. When I start node b again, this situation stays
> forever:
>
> ----------------
> node a - clustat
> Member Status: Quorate
>
>  Member Name                        ID   Status
>  ------ ----                        ---- ------
>  areenasql1                            1 Online, Local, rgmanager
>  areenasql2                            2 Offline
>  /dev/sda                              0 Online, Quorum Disk
>
>  Service Name         Owner (Last)                   State
>  ------- ----         ----- ------                   -----
>  service:areena       areenasql1                     started
>
> -------------------
>
> node b - clustat
>
> Member Status: Quorate
>
>  Member Name                        ID   Status
>  ------ ----                        ---- ------
>  areenasql1                            1 Online, rgmanager
>  areenasql2                            2 Online, Local, rgmanager
>  /dev/sda                              0 Offline, Quorum Disk
>
>  Service Name         Owner (Last)                   State
>  ------- ----         ----- ------                   -----
>  service:areena       areenasql1                     started
>
>
> So node b's quorum disk is offline, the log says it's registered ok and
> the heuristic is UP... node a sees node b as offline. If I reboot node b,
> it works ok and joins ok...

Now that you mention it, I remember stumbling on a similar problem.
It happens if you restart the cluster services before the cluster
realizes the node is dead. I guess it is a bug: at that moment the node
is in a sort of limbo, reporting itself as part of the cluster while
the cluster does not recognize it as a member. If you wait 70 seconds
(cluster.conf: <totem token="70000"/>, i.e. 70000 ms) before starting
the cluster services, it will come up fine. The reboot works for you
because it takes longer than 70 seconds (correct me if I am wrong). So
try stopping the cluster services on node b, waiting 70 seconds, and
then starting them back up.
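
If you would rather not hard-code the 70 seconds, here is a rough
sketch that reads the token value out of cluster.conf and converts it
to seconds. It only assumes that <totem> sits directly under <cluster>
and that token is given in milliseconds. The function name and the
sample XML are made up for illustration, and when no token is set it
returns None rather than guessing a default.

```python
import xml.etree.ElementTree as ET


def token_timeout_seconds(cluster_conf_xml):
    """Return the totem token timeout in seconds, or None if it is not set.

    cluster_conf_xml is the text of a cluster.conf file. The token
    attribute is expressed in milliseconds.
    """
    root = ET.fromstring(cluster_conf_xml)
    totem = root.find("totem")  # direct child of <cluster>
    if totem is None or totem.get("token") is None:
        return None
    return int(totem.get("token")) / 1000.0


# Hypothetical, trimmed-down cluster.conf used only for this example.
SAMPLE = """<cluster name="example" config_version="1">
  <totem token="70000"/>
</cluster>"""

if __name__ == "__main__":
    wait = token_timeout_seconds(SAMPLE)
    print("wait at least %s seconds before restarting" % wait)
```

On a real node you would feed it the contents of your own cluster.conf
and sleep for at least that long between stopping and starting the
cluster services.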

-Alex

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
