On Wed, 2007-01-24 at 00:11 +0100, Jos Vos wrote:
> Hi,
>
> I have a configuration with two servers and a shared storage cabinet
> (connected via two *independent* SCSI busses) causing fatal SCSI errors
> when one server is doing a lot of I/O and the other server is rebooting
> (i.e. loading the Linux driver and initializing the controller).
>
> This problem is fully reproducible with the latest RHEL4 kernel, but
> it is *not* reproducible with RHEL5b2.
>
> When using this shared device with cluster suite and GFS (I only tried
> this with RHEL4), the GFS filesystem is damaged beyond repair when one
> node reboots!
>
> I see some Bugzilla entries about this driver (although with different
> errors) and when Googling I found some more complaints about weak error
> handling/recovery in this driver.
>
> I tried to port the MPT Fusion driver from the RHEL5b2 kernel to the
> RHEL4 kernel, but this seems to require some non-trivial backporting.
>
> Is this indeed a problem with the LSI driver? Are there any upgrades
> for the driver that can be compiled for the RHEL4 kernels?

I've seen abysmal performance in some megaraid+jbod configurations (e.g.
50+ seconds to get 2 block reads and 2 block writes on RHEL2.1), but
I've never seen corruption like what you're describing. Of course, it's
been almost 4 years since I used a host-RAID configuration, and I
haven't ever used one with GFS... ;(

Apparently the SCSI megaraid driver has gone into maintenance mode, so
it's not going to get any better.

Your controllers are in "cluster mode" and/or "have cache entirely
disabled", right?

-- Lon

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster