Re: cman_tool flags: dirty follow up question

On Tue, Aug 19, 2008 at 10:58 AM, Christine Caulfield
<ccaulfie@xxxxxxxxxx> wrote:
> Brett Cave wrote:
>> Hi,
>>
>> After upgrading the RPMs on CentOS 5, I get "Flags: Dirty" in cman_tool
>> status. I found a post on this list from May, just before I joined, so
>> this is a follow-up to that topic.
>>
>> Chrissie: It's a perfectly normal state. In fact it's expected if you
>> are running services. It simply means that the cluster has some
>> services running that have state of their own that cannot be recovered
>> without a full restart. I would be more worried if you did NOT see
>> this in cman_tool status. It's NOT a warning, don't worry about it :)
>>
>>
>> Prior to upgrading my cman and gfs-utils, I was getting "flags: ". The
>> cluster is not running any services other than the internal fence and
>> dlm ones (cman_tool services shows fence only).
>>
>> There was an update to CVS last September where the flag was added, so
>> I'm guessing that explains the change. (I think that was by Chrissie
>> again.)
>>
>> I don't quite understand the "This node has internal state and must
>> not join a cluster that also has state" description of the Dirty flag,
>> though. Does this mean that because the node is part of a cluster it
>> has state? And that only a stateless node can join a cluster with
>> state? (Or, if the cluster doesn't have state, then the node will be
>> the first one in the cluster to start up...)
>
> The Dirty flag is set by services when they realise they have state
> that can't be combined with an existing cluster's. This is usually the
> DLM or GFS in a Red Hat cluster. When a node first starts up it has no
> state, so it can join in with other cluster nodes that do. It can then
> create its own state quite happily, because it can tell the other
> nodes about it.
>
> The situation this flag is designed to prevent is a cluster splitting
> in two for a short period of time and then rejoining soon afterwards -
> usually in less than the time it takes for fencing to take effect, or
> perhaps because the cluster is split evenly so that neither half has
> quorum. When this occurs, each half does not know what the other half
> has been up to during the split, so the two halves cannot be allowed
> to join back into one cluster for fear of corrupting each other's
> state.
>
> This results in the dreaded "Disallowed" state that you might see
> (though I hope not). It's usually caused by bad network configuration
> or excessive traffic. Fiddling with the totem parameters (CAREFULLY!)
> can alleviate it.
>

Thanks, makes sense now.
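
For anyone who finds this in the archives later, the quick way to see
what's going on is to check which service groups are registered and
what flags cman reports. These are just the stock commands; the output
will obviously differ from cluster to cluster:

    cman_tool services                 # lists registered groups (fence, dlm, gfs, ...)
    cman_tool status | grep -i flags   # shows e.g. "Flags: Dirty" once services hold state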

The cluster is relatively small (3-6 nodes initially), so I'm going to
leave the openais totem config at the defaults for now and start
adjusting it during load/IO testing if recovery turns out to be too
slow. I'd also like to see how failure recovery performs with heartbeat
failures enabled vs. disabled; a rough sketch of where I'd make those
changes is below.
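
Purely as a note to self, and only a sketch: assuming cman picks totem
overrides up from cluster.conf, the cluster name, the millisecond
values and the heartbeat_failures_allowed attribute below are all
placeholders I'd verify against the openais docs before using them.

    <cluster name="mycluster" config_version="2">
      <!-- token/consensus are in milliseconds; values here are made up,
           not recommendations. heartbeat_failures_allowed is the openais
           option behind the "heartbeat failures" idea above - I'd confirm
           cman passes it through before relying on it. -->
      <totem token="21000" consensus="12000" heartbeat_failures_allowed="2"/>
      <!-- clusternodes, fencedevices, rm, etc. unchanged -->
    </cluster>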

Brett
>
>
> Chrissie
>

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
