Seeing some odd behaviour - is this normal???
The goal is to create an active-active environment (from two to many servers) for a web farm with a clustered filesystem (ocfs2). With SLES 11 HAE, ocfs2 is supported only in user-space mode (i.e., via corosync), hence the need for corosync as the middle man. (As an aside, kernel-based ocfs2 will continue to work with SLES 11 HAE, but is only supported in an Oracle RAC configuration.)
Cluster config:
Based on SLES 11 HAE SP2
Created a cloned "base" group consisting of dlm and o2cb resources (both required for ocfs2 filesystems)
Configured a stonith_sbd resource
Created individual ocfs2 filesystem resources, each cloned. The idea is that individual filesystems can be brought down across the cluster for maintenance. Each filesystem clone has a startup dependency on the "base" clone group.
Two nodes in the cluster (ignoring quorum). Haven't yet tested with three or more with/without quorum.
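For reference, the layout described above would look roughly like this in crm shell syntax. This is only a sketch: the resource names, device paths and mount point are placeholders, the operation timeouts are illustrative, and any clone meta attributes are left out.

```
# Sketch only: names, devices and paths below are placeholders
primitive dlm ocf:pacemaker:controld \
    op monitor interval="60" timeout="60"
primitive o2cb ocf:ocfs2:o2cb \
    op monitor interval="60" timeout="60"
primitive stonith_sbd stonith:external/sbd \
    params sbd_device="/dev/disk/by-id/EXAMPLE-sbd-partition"
primitive fs-web ocf:heartbeat:Filesystem \
    params device="/dev/disk/by-id/EXAMPLE-ocfs2-volume" directory="/srv/www" fstype="ocfs2" \
    op monitor interval="20" timeout="40"
# "base" group (dlm + o2cb), cloned across the nodes
group base-group dlm o2cb
clone base-clone base-group
# one clone per ocfs2 filesystem, started after the base clone
clone fs-web-clone fs-web
order fs-web-after-base inf: base-clone fs-web-clone
# two-node cluster, quorum ignored
property no-quorum-policy="ignore"
```

Additional filesystems would each get their own primitive, clone and order constraint along the same lines.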
Imagine this scenario:
Server A and B are running; all cloned resources are running on both nodes (dlm, o2cb, and ocfs2 filesystems mounted)
Server A requires downtime for maintenance (eg, add memory, replace failed component, etc)
Server A is placed into standby mode, and all resources on that node are automatically stopped. Quorum is ignored as any applications running on Server B should continue to run in the event that Server A is powered off.
When work is complete, Server A is brought back online (from standby)
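In crm shell terms, the standby and online steps in this scenario amount to something like the following (serverA is a placeholder node name):

```
# Run from within the crm shell; serverA is a placeholder
node standby serverA
# ... maintenance on Server A ...
node online serverA
```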
The problem:
During the transition of Server A from standby to online, Corosync/Pacemaker stops ALL cloned resources on Server B, and then starts all resources on both Server A and Server B.
With filesystem I/O occurring on Server B, the filesystems are abruptly unmounted and all I/O is terminated. Not good, since any in-flight transactions are lost, with potential filesystem/data corruption.
Is this really the desired behaviour??? Shouldn't the resources be started on Server A WITHOUT impacting the resources running on other servers???
Is this a "group", "clone", or "clone group" behaviour?
Thanks to all for helping shed some light. I really hope this isn't a feature ;-)
Robert Telka
Royal Canadian Mounted Police
1200 Vanier Parkway
CPIC 2-108
Ottawa, Ontario, K1A 0R2
613-998-6235