Re: Is this normal behaviour?

On 04/18/2012 04:40 PM, Robert Telka wrote:
> Seeing some odd behaviour - is this normal???
>  
> The goal is to create an active-active environment (from 2 to many servers)
> for a web farm with a clustered filesystem (ocfs2).  User-space ocfs2
> (i.e., via corosync) is only supported as of SLES 11 HAE, hence the need
> for the corosync middleman.  (As an aside, kernel-based ocfs2 will continue
> to work with SLES 11 HAE, but is only supported in an Oracle RAC
> configuration)
>  
> Cluster config:
> Based on SLES 11 HAE SP2
> Created a cloned "base" group consisting of dlm and o2cb resources (both
> required for ocfs2 filesystems)
> Configured a stonith_sbd resource
> Created individual ocfs2 filesystem resources, cloned.  The idea is
> that individual filesystems can be brought down across the cluster for
> maintenance.  Each filesystem clone has a startup dependency on the
> "base" clone group.

And do the filesystem clones all have the meta-attribute "interleave=true" defined?
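
Without interleave, a clone's ordering constraints are evaluated against the clone as a whole, so a topology change on one node (such as leaving standby) can trigger stop/start actions on every node; with interleave=true, each clone instance only depends on the instances on its own node. For illustration, a minimal crm shell sketch of such a setup (resource, device, and mount-point names here are invented for the example, not taken from your CIB):

```
primitive dlm ocf:pacemaker:controld op monitor interval="60"
primitive o2cb ocf:ocfs2:o2cb op monitor interval="60"
group base-group dlm o2cb
# interleave=true: each node's clone instance depends only on the
# local instance of base-clone, not on instances on other nodes
clone base-clone base-group meta interleave="true"
primitive fs-www ocf:heartbeat:Filesystem \
    params device="/dev/disk/by-label/www" directory="/srv/www" fstype="ocfs2" \
    op monitor interval="20"
clone fs-www-clone fs-www meta interleave="true"
order fs-after-base inf: base-clone fs-www-clone
colocation fs-with-base inf: fs-www-clone base-clone
```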

Regards,
Andreas

-- 
Need help with Corosync?
http://www.hastexo.com/service/remote

>  
> Two nodes in the cluster (ignoring quorum).  Haven't yet tested with
> three or more with/without quorum.
>  
> Imagine this scenario:
> Server A and B are running; all cloned resources are running on both
> nodes (dlm, o2cb, and ocfs2 filesystems mounted)
> Server A requires downtime for maintenance (eg, add memory, replace
> failed component, etc)
> Server A is placed into standby mode, and all resources on that node are
> automatically stopped.  Quorum is ignored as any applications running on
> Server B should continue to run in the event that Server A is powered
> off.  
> When work is complete, Server A is brought back online (from standby)
>  
> The problem:
> During the transition of Server A from standby to online,
> Corosync/pacemaker stops ALL cloned resources on Server B, and then
> starts all resources on Server A and B.
>  
> With filesystem I/O occurring on Server B, the filesystems are abruptly
> unmounted and all I/O is terminated.  Not good, since any in-flight
> transactions are lost, with potential filesystem/data corruption.
>  
> Is this really the desired behaviour???  Shouldn't the resources be
> started on Server A WITHOUT impacting the resources running on other
> servers???
>  
> Is this a "group", "clone", or "clone group" behaviour?
>  
> Thanks to all for helping shed some light.  I really hope this isn't a
> feature ;-)
>  
> Robert Telka
>  
>  
>  
>  
> Royal Canadian Mounted Police
> 1200 Vanier Parkway
> CPIC 2-108
> Ottawa, Ontario, K1A 0R2
> 613-998-6235
> 
> 
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss


