Is this normal behaviour?

Seeing some odd behaviour - is this normal?
 
The goal is to create an active-active environment (from 2 to many servers) for a web farm with a clustered filesystem (ocfs2).  User-space ocfs2 (i.e., via corosync) is the only configuration supported with SLES 11 HAE, hence the need for the corosync middleman.  (As an aside, kernel-based ocfs2 will continue to work with SLES 11 HAE, but is only supported in an Oracle RAC configuration.)
 
Cluster config:
Based on SLES 11 HAE SP2.
Created a cloned "base" group consisting of dlm and o2cb resources (both required for ocfs2 filesystems).
Configured a stonith_sbd resource.
Created individual ocfs2 filesystem resources, each cloned.  The idea is that individual filesystems can be brought down across the cluster for maintenance.  Each filesystem clone has a startup dependency on the "base" clone group.  (A rough sketch of the configuration follows below.)
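
For reference, the configuration looks roughly like this (crm shell syntax; resource names, device paths, mount points, and operation timings are illustrative placeholders, not copied verbatim from the cluster):

    # base infrastructure needed by every ocfs2 mount, cloned on all nodes
    primitive dlm ocf:pacemaker:controld \
        op monitor interval="60" timeout="60"
    primitive o2cb ocf:ocfs2:o2cb \
        op monitor interval="60" timeout="60"
    group base-group dlm o2cb
    clone base-clone base-group

    # sbd-based fencing
    primitive stonith-sbd stonith:external/sbd

    # one of several ocfs2 filesystems, each a separate clone so it can be
    # stopped cluster-wide on its own for maintenance
    primitive fs-web1 ocf:heartbeat:Filesystem \
        params device="/dev/disk/by-id/..." directory="/srv/web1" fstype="ocfs2" \
        op monitor interval="20" timeout="40"
    clone fs-web1-clone fs-web1
    order ord-fs-web1-after-base inf: base-clone fs-web1-clone
    colocation col-fs-web1-with-base inf: fs-web1-clone base-clone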
 
Two nodes in the cluster (ignoring quorum).  Haven't yet tested with three or more with/without quorum.
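
(Quorum is ignored via the usual two-node setting, something like:

    property no-quorum-policy="ignore"

so that the surviving node keeps its resources if its peer disappears.)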
 
Imagine this scenario:
Server A and B are running; all cloned resources are running on both nodes (dlm, o2cb, and ocfs2 filesystems mounted).
Server A requires downtime for maintenance (e.g., add memory, replace a failed component, etc.).
Server A is placed into standby mode, and all resources on that node are automatically stopped.  Quorum is ignored, as any applications running on Server B should continue to run in the event that Server A is powered off.
When work is complete, Server A is brought back online from standby (commands below).
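
In other words, the maintenance procedure is nothing more exotic than (crm shell; node names illustrative):

    # take Server A out of service; its resources stop automatically
    crm node standby nodeA

    # ... perform hardware maintenance ...

    # bring Server A back into the cluster
    crm node online nodeA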
 
The problem:
During the transition of Server A from standby to online, corosync/Pacemaker stops ALL cloned resources on Server B, and then starts all resources on both Server A and Server B.
 
With filesystem I/O occurring on Server B, the filesystems are abruptly unmounted and all I/O is terminated.  Not good, since any in-flight transactions are lost, with potential filesystem/data corruption.
 
Is this really the desired behaviour?  Shouldn't the resources be started on Server A WITHOUT impacting the resources running on other servers?
 
Is this a "group", "clone", or "clone group" behaviour?
 
Thanks to all for helping shed some light.  I really hope this isn't a feature ;-)
 
Robert Telka
 
 
 
 
Royal Canadian Mounted Police
1200 Vanier Parkway
CPIC 2-108
Ottawa, Ontario, K1A 0R2
613-998-6235
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
