Is this normal behaviour?

Seeing some odd behaviour - is this normal?
 
The goal is to create an active-active environment (from 2 to many servers) for a web farm with a clustered filesystem (ocfs2).  User-space ocfs2 (i.e., via corosync) is the only configuration supported with SLES 11 HAE, hence the need for the corosync middleman.  (As an aside, kernel-based ocfs2 will continue to work with SLES 11 HAE, but is only supported in an Oracle RAC configuration.)
 
Cluster config:
Based on SLES 11 HAE SP2.
Created a cloned "base" group consisting of dlm and o2cb resources (both required for ocfs2 filesystems).
Configured a stonith_sbd resource.
Created individual ocfs2 filesystem resources, each cloned.  The idea is that individual filesystems can be brought down across the cluster for maintenance.  Each filesystem clone has a startup dependency on the "base" clone group.  (A rough sketch of the configuration follows below.)
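
For reference, the configuration looks roughly like this (crm shell syntax; resource names, device paths, mount points, and operation timings are illustrative placeholders, not copied verbatim from the cluster):

    # base infrastructure needed by every ocfs2 mount, cloned on all nodes
    primitive dlm ocf:pacemaker:controld \
        op monitor interval="60" timeout="60"
    primitive o2cb ocf:ocfs2:o2cb \
        op monitor interval="60" timeout="60"
    group base-group dlm o2cb
    clone base-clone base-group

    # sbd-based fencing
    primitive stonith-sbd stonith:external/sbd

    # one of several ocfs2 filesystems, each a separate clone so it can be
    # stopped cluster-wide on its own for maintenance
    primitive fs-web1 ocf:heartbeat:Filesystem \
        params device="/dev/disk/by-id/..." directory="/srv/web1" fstype="ocfs2" \
        op monitor interval="20" timeout="40"
    clone fs-web1-clone fs-web1
    order ord-fs-web1-after-base inf: base-clone fs-web1-clone
    colocation col-fs-web1-with-base inf: fs-web1-clone base-clone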
 
Two nodes in the cluster (ignoring quorum).  Haven't yet tested with three or more with/without quorum.
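
(Quorum is ignored via the usual two-node setting, something like:

    property no-quorum-policy="ignore"

so that the surviving node keeps its resources if its peer disappears.)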
 
Imagine this scenario:
Server A and B are running; all cloned resources are running on both nodes (dlm, o2cb, and ocfs2 filesystems mounted).
Server A requires downtime for maintenance (e.g., add memory, replace a failed component, etc.).
Server A is placed into standby mode, and all resources on that node are automatically stopped.  Quorum is ignored, as any applications running on Server B should continue to run in the event that Server A is powered off.
When work is complete, Server A is brought back online from standby (commands below).
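
In other words, the maintenance procedure is nothing more exotic than (crm shell; node names illustrative):

    # take Server A out of service; its resources stop automatically
    crm node standby nodeA

    # ... perform hardware maintenance ...

    # bring Server A back into the cluster
    crm node online nodeA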
 
The problem:
During the transition of Server A from standby to online, corosync/Pacemaker stops ALL cloned resources on Server B, and then starts all resources on both Server A and Server B.
 
With filesystem I/O occurring on Server B, the filesystems are abruptly unmounted and all I/O is terminated.  Not good, since any in-flight transactions are lost, with potential filesystem/data corruption.
 
Is this really the desired behaviour?  Shouldn't the resources be started on Server A WITHOUT impacting the resources running on other servers?
 
Is this a "group", "clone", or "clone group" behaviour?
 
Thanks to all for helping shed some light.  I really hope this isn't a feature ;-)
 
Robert Telka
 
 
 
 
Royal Canadian Mounted Police
1200 Vanier Parkway
CPIC 2-108
Ottawa, Ontario, K1A 0R2
613-998-6235
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
