On Thu, 2005-09-08 at 13:16 +0200, Bob Marcan wrote:
> Steve Wilcox wrote:
> > On Wed, 2005-09-07 at 19:43 +1000, Keith Hopkins wrote:
> >
> >>Steve Wilcox wrote:
> >>
> >>>On Tue, 2005-09-06 at 20:06 -0400, Steve Wilcox wrote:
> >>>
> >>>
> >>>>On Wed, 2005-09-07 at 00:57 +0200, Andreas Brosche wrote:
> >>>>
> >>>>
> >>>>
> >>>>>>- Multi-initiator SCSI buses do not work with GFS in any meaningful way,
> >>>>>>regardless of what the host controller is.
> >>>>>>Ex: Two machines with different SCSI IDs on their initiator connected to
> >>>>>>the same physical SCSI bus.
> >>>>>
> >>>>>Hmm... don't laugh at me, but in fact that's what we're about to set up.
> >>>>>
> >>>>>I've read in Red Hat's docs that it is "not supported" because of
> >>>>>performance issues.  Multi-initiator buses should comply with SCSI
> >>>>>standards, and any SCSI-compliant disk should be able to communicate
> >>>>>with the correct controller, if I've interpreted the specs correctly.
> >>>>>Of course, you get arbitrary results when using non-compliant hardware...
> >>>>>What are the other issues with multi-initiator buses, other than
> >>>>>performance loss?
> >>>>
> >>>>I set up a small 2-node cluster this way a while back, just as a testbed
> >>>>for myself.  Much as I suspected, it was severely unstable because of
> >>>>the storage configuration, even occasionally causing both nodes to crash
> >>>>when one was rebooted, due to SCSI bus resets.  I tore it down and
> >>>>rebuilt it several times, configuring it as a simple failover cluster
> >>>>with RHEL3 and RHEL4, as a GFS cluster under RHEL4 and Fedora 4, and as
> >>>>an OpenSSI cluster using Fedora 3.  All tested configurations were
> >>>>equally crash-happy due to the bus resets.
> >>>>
> >>>>My configuration consisted of a couple of old Compaq Deskpro PCs, each
> >>>>with a single-ended Symbios card (set to different SCSI IDs, obviously)
> >>>>and an external DEC BA360 JBOD shelf with 6 drives.  The bus resets
> >>>>might be mitigated somewhat by using HVD SCSI and Y-cables with
> >>>>external terminators, but from my previous experience with other
> >>>>clusters that used this technique (DEC ASE and HP-UX ServiceGuard), bus
> >>>>resets will always be a thorn in your side without a separate,
> >>>>independent RAID controller to act as a go-between.  Calling these
> >>>>configurations simply "not supported" is an understatement - this type
> >>>>of config is guaranteed trouble.  I'd never set up a cluster this way
> >>>>unless I'm the only one using it, and only then if I don't care one
> >>>>little bit about crashes and data corruption.  My two cents.
> >>>>
> >>>>-steve
> >>>
> >>>
> >>>
> >>>Small clarification - although clusters from DEC, HP, and even
> >>>DigiComWho?Paq's TruCluster can be made to work (sort of) on multi-
> >>>initiator SCSI buses, IIRC it was never a supported option for any of
> >>>them (much like Red Hat's offering).  I doubt any sane company would
> >>>ever support that type of config.
> >>>
> >>>-steve
> >>>
> >>
> >>HP-UX ServiceGuard works well with multi-initiator SCSI configurations,
> >>and is fully supported by HP.  It is sold that way for small 2-4 node
> >>clusters when cost is an issue, although FC has become a big favorite
> >>(um... money maker) in recent years.  Yes, SCSI bus resets are a pain,
> >>but they are handled by HP-UX, not ServiceGuard.
> >>
> >>--Keith
> >
> >
> > Hmmm...  Are you sure you're thinking of a multi-initiator _bus_ and
> > not something like an external SCSI array (i.e. Nike arrays or some such
> > thing)?
> > I know that multi-port SCSI hubs are available, and more than
> > one HBA per node is obviously supported for multipathing, but generally
> > any multi-initiator SCSI setup will be talking to an external RAID
> > array, not a simple SCSI bus, and even then bus resets can cause grief.
> > Admittedly, I'm much more familiar with the Alpha server side of things
> ==========================> should be unfamiliar
> > (multi-initiator buses were definitely never supported under DEC Unix /
> > Tru64), so I could be wrong about HP-UX.  I just can't imagine that a
> > multi-initiator bus wouldn't be a nightmare.
> >
> > -steve
>
> In the past I was using SCSI clusters on OpenVMS (AXP, VAX) and Tru64.
> At home I have two DS10s with Memory Channel in a shared-SCSI Tru64
> cluster.  Memory Channel was a prerequisite in the early days; now you
> can use Ethernet as the cluster interconnect.
> I still have some customers using SCSI clusters on Tru64.
> Two of these are banks, and they have been running this configuration
> for a few years without any problems, using host-based shadowing.
> Tru64 does have a single point of failure in this configuration: the
> quorum disk can't be shadowed.  OpenVMS doesn't have this limitation.
>
> It is supported.
>
> OpenVMS
> http://h71000.www7.hp.com/doc/82FINAL/6318/6318pro_002.html#interc_sys_table
>
> Tru64
> http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V51B_HTML/ARHGWETE/CHNTRXXX.HTM#sec-generic-cluster
> ...
>
> Best regards, Bob

I'll ignore the insult and get to the meat of the matter...  I'm well
aware of that doc.  I got burned by it a few years ago when I set up a
dev cluster of ES40s based on it.  Everything was humming along just
fine until I load tested our Oracle database - at something like
700,000 transactions per hour, guess what happened?  I had a flurry of
bus resets, followed by a flurry of AdvFS domain panics, resulting in a
crashed cluster.  When I called my gold-support TAM for help in
debugging the issue, I was told "yeah, shared buses will do that.
That's why we don't provide technical support for that configuration."
When I pointed out that their own documentation claimed it was a
"supported" configuration, I was told that "supported" only meant
technically possible as far as that doc goes - not that HP would
provide break-fix support.  Maybe that's changed in the last couple of
years, but I doubt it - as my TAM said, shared SCSI buses WILL do that,
no real way around it.  If you're not having problems with resets,
you're simply not loading the bus that heavily.

It's all a moot point though - like Lon said, this is a Linux mailing
list, not a Tru64, HP-UX, or (god forbid) VMS list, so unless this
discussion is going somewhere productive we should probably stop
wasting bandwidth.  If you want to have a Unix pissing contest, we
should do it off list.

-steve

--

Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster
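
(For anyone who wants to gauge how hard a shared bus is actually being
hit before trusting it: counting reset-looking messages in the kernel
log is usually enough.  The script below is only a rough sketch - the
default log path and the match patterns are assumptions, since the
exact wording depends on the HBA driver and kernel version, so check
your own dmesg output and adjust them accordingly.)

#!/usr/bin/env python
# Rough sketch: count log lines that look like SCSI bus resets or
# command aborts, to get a feel for how often they happen under load.
# The default log path and the regex patterns are assumptions - adjust
# them to whatever your HBA driver actually prints.

import re
import sys

LOG_FILE = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"

# Hypothetical patterns - check your own kernel messages for the real text.
RESET_PATTERNS = [
    re.compile(r"scsi.*bus reset", re.IGNORECASE),
    re.compile(r"abort(ing)? command", re.IGNORECASE),
]

def count_resets(path):
    """Return (total_lines, matching_lines) for the given log file."""
    total = hits = 0
    with open(path) as log:
        for line in log:
            total += 1
            if any(p.search(line) for p in RESET_PATTERNS):
                hits += 1
    return total, hits

if __name__ == "__main__":
    total, hits = count_resets(LOG_FILE)
    print("%d of %d log lines look like SCSI resets/aborts" % (hits, total))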