Re: Megaraid and Dell PERC 4 controllers

Steve Sutphen <steve@xxxxxxxxxxxxxx> · Tue, 30 Aug 2005 01:43:30 -0600

Seokmann,
This sounds identical to a crash that I had on Saturday.
I have a server that has a dual Opteron/244 with 2GB of memory (4x512MB
400MHz, Registered ECC, Corsair CM72SD512RLP-3) on a Tyan Opteron 8131
motherboard.  The controller is the LSI MegaRAID SATA II 300-8X PCI-X
(P/N LSI00005 with the LSI00012 battery backup).  The system is fairly new:
it was manufactured on 06/22/05 and put in service about a month later.
The MegaRAID controller has 8 Seagate ST3250823AS 250GB SATA drives with 
NCQ.  
The RAID array is a RAID5 array with a global spare.  It is divided 
into two nearly equal sized logical disks.  The controller parameters 
are set to:
FlexRAID PowerFail = ENABLED
Command Que = Enabled

Both logical drives are set to:
RAID = 5
Size = 712392MB 
StripeSize = 64KB 
Write Policy = WRTHRU
Read Policy = NORMAL
Cache Policy = DirectIO
#Stripes = 7
State = OPTIMAL
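
As a sanity check, the size above can be compared with the sector count
the kernel reports for sda in the attached log.  This is only a rough
sketch: it assumes the controller's "MB" means 2^20 bytes, the kernel's
"MB" means 10^6 bytes, and that "#Stripes = 7" means seven member drives
(one of them parity-equivalent in RAID5).  The variable names are made up
for illustration.

```python
# Figures copied from this message and the attached log.
SECTOR_BYTES = 512
sectors = 1458978816          # "SCSI device sda: 1458978816 512-byte hdwr sectors"
controller_size_mb = 712392   # "Size = 712392MB" per logical drive
logical_drives = 2
member_drives = 7             # assumption: "#Stripes = 7" counts member drives

size_bytes = sectors * SECTOR_BYTES
size_mib = size_bytes // 2**20          # assumption: controller MB = 2^20 bytes
size_mb_decimal = size_bytes // 10**6   # assumption: kernel MB = 10^6 bytes

# RAID5 keeps one drive's worth of parity, so n members give n-1 of data.
data_drives = member_drives - 1
total_bytes = logical_drives * controller_size_mb * 2**20
per_drive_gb = total_bytes / data_drives / 10**9

print(size_mib, size_mb_decimal, round(per_drive_gb, 1))
```

Under those assumptions the controller and kernel figures agree, and the
space used per data drive comes out just under the 250GB drive size.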

The system is running Red Hat Enterprise Linux AS release 4 (Nahant Update 1)
with an updated kernel (I am booting off of a SATA disk on the 
Silicon Image, Inc. SiI 3114 controller which was only fixed in recent
kernels and firmware):
Kernel 2.6.11.12 on a 2-processor i686

The system is being used primarily as an NFS server. It also serves as
the head node for a small cluster.  It does the Ganglia data collection
task for the cluster.  Looking at the Ganglia data does not indicate
that there was much of a load on the system just before the crash.  
Although Ganglia is not recording disk I/O, I do not see much indirect 
evidence that there was heavy disk I/O: the CPUs are in a steady state,
around 97% idle, with no particular peaks or valleys.  The same goes for the 
number of packets and network bytes transmitted/received, and for memory 
usage.  It all seems normal, with no particular peaks just before
I rebooted it (as with the original case, the system kept running,
although it was logging lots of disk I/O failed messages because the 
controller had been off-lined).

I am attaching a file that has the log records since the last 
reboot (we had moved it to a UPS just under 4 days before the 
controller locked up), showing the megaraid initialization
and the (condensed) sequence of error messages from the controller 
up to the point where it off-lined the array(s).
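
For anyone wanting to compare the abort messages across machines, here is
a small sketch that pulls the fields out of the "megaraid: aborting-"
lines.  The reading of c/t/l as channel/target/lun is my interpretation
(it matches the "host 0 channel 1 id 0 lun 0" line at the end of the
log), and the function and dictionary names are made up for illustration.
SCSI opcode 0x2a is WRITE(10).

```python
import re

# Matches lines such as:
#   megaraid: aborting-35347365 cmd=2a <c=1 t=0 l=0>
ABORT_RE = re.compile(
    r"megaraid: aborting-(?P<serial>\d+) cmd=(?P<opcode>[0-9a-f]+) "
    r"<c=(?P<channel>\d+) t=(?P<target>\d+) l=(?P<lun>\d+)>"
)

# Small lookup of standard SCSI opcodes; only 0x2a appears in these logs.
SCSI_OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def parse_abort(line):
    """Return the fields of a 'megaraid: aborting-' line, or None."""
    match = ABORT_RE.search(line)
    if match is None:
        return None
    opcode = int(match.group("opcode"), 16)
    return {
        "serial": int(match.group("serial")),
        "opcode": SCSI_OPCODES.get(opcode, hex(opcode)),
        "channel": int(match.group("channel")),   # assumed meaning of c=
        "target": int(match.group("target")),     # assumed meaning of t=
        "lun": int(match.group("lun")),           # assumed meaning of l=
    }


# Sample lines copied from the attached log.
sample = [
    "Aug 27 16:19:56 brule kernel: megaraid: aborting-35347365 cmd=2a <c=1 t=0 l=0>",
    "Aug 27 16:19:56 brule kernel: megaraid abort: 35347365:95[255:128], fw owner",
    "Aug 27 16:19:57 brule kernel: megaraid: aborting-35347510 cmd=2a <c=1 t=0 l=0>",
]

aborts = [a for a in map(parse_abort, sample) if a is not None]
for a in aborts:
    print(a)
```

In both my log and Jonathan's, every aborted command is cmd=2a, i.e. a
write.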

Other than this incident the system has been running fine since it was
installed.  I hope that this helps.  If you have any suggestions 
please tell me as I am worried that this may happen again.

Thank you,
	steve.

On Mon, Aug 29, 2005 at 04:25:52PM -0400, Ju, Seokmann wrote:
> FYI - Resending due to failure on previous sending.  
> 
> > -----Original Message-----
> > From: Ju, Seokmann 
> > Sent: Friday, August 26, 2005 11:00 AM
> > To: 'Jonathan Fischer'
> > Cc: Kolli, Neela Syam
> > Subject: RE: Megaraid and Dell PERC 4 controllers
> > 
> > Hi Jonathan,
> > 
> > On Tuesday, August 23, 2005 4:52 PM, Jonathan Fischer wrote:
> > > I think next up I'm trying writethru mode, instead of write 
> > back, but
> > > has anyone seen anything like this, or have any insight they might
> > > offer?  I'm quickly getting to the point of being stumped.
> > Can you please specify detail system configuration? (memory 
> > size, # of cpus)
> > And, what kind of load are you putting on the system when it locks up.
> > Also, I assume that the system doesn't have any monitoring 
> > applications running for those PERC controllers. Please confirm this.
> > From the message, the controller takes more than 3 minutes to 
> > return certain I/O requests and it leads system to lock up.
> > 
> > Thank you.
> > 
> > Seokmann
> > 
> > > -----Original Message-----
> > > From: Jonathan Fischer [mailto:jfischer@xxxxxxxxx] 
> > > Sent: Tuesday, August 23, 2005 4:52 PM
> > > To: linux-scsi@xxxxxxxxxxxxxxx
> > > Subject: Megaraid and Dell PERC 4 controllers
> > > 
> > > I apologize if this is the wrong list to ask this kind of 
> > question on;
> > > I've posted on Dell's PowerEdge list and Red Hat's lists as 
> > > well, but I
> > > figure the people here might know better what to try for 
> > this problem.
> > > 
> > > I have 2 Dell PowerEdge 2850's, one with a PERC 4e/DC raid 
> > controller,
> > > and the other with a PERC 4e/Di.  On both of these systems, I can
> > > reliably cause the controllers to lock up under heavy load.  This is
> > > using a fully up-to-date Red Hat 4 EL (non x86_64) 
> > > installation on both
> > > computers.  The controllers use the megaraid_mbox driver.
> > > 
> > > During a period of high load, the controller suddenly seems to stop
> > > responding to the driver, causing the driver to go into a 
> > waiting loop
> > > for it.  It waits 3 minutes for the controller to respond, which it
> > > never does, and then takes the controller offline, pretty 
> > much yanking
> > > the filesystem out from underneath the OS.
> > > 
> > > Some things keep running alright, so (working with Red Hat's 
> > > support) I
> > > got the thing set up to netdump to another server to see if we could
> > > figure out what was going wrong.  The kernel never actually 
> > > crashes, so
> > > netdump doesn't produce a vmcore to look through, but syslog keeps
> > > spouting out information, so I've got that.
> > > 
> > > Every time this lockup occurs, the log file looks like this:
> > > 
> > > megaraid: aborting-29762 cmd=2a <c=2 t=0 l=0>
> > > megaraid abort: 29762:21[255:128], fw owner
> > > megaraid: aborting-29763 cmd=2a <c=2 t=0 l=0>
> > > megaraid abort: 29763:39[255:128], fw owner
> > > megaraid: aborting-29764 cmd=2a <c=2 t=0 l=0>
> > > megaraid abort: 29764:16[255:128], fw owner
> > > megaraid: aborting-29768 cmd=2a <c=2 t=0 l=0>
> > > megaraid abort: 29768:53[255:128], fw owner
> > > 
> > > 	This part repeats 64 times, then...
> > > 
> > > megaraid: aborting-29831 cmd=2a <c=2 t=0 l=0>
> > > megaraid abort: 29831:8[255:128], fw owner
> > > megaraid: resetting the host...
> > > megaraid: 64 outstanding commands. Max wait 180 sec
> > > megaraid mbox: Wait for 64 commands to complete:180
> > > megaraid mbox: Wait for 64 commands to complete:175
> > > 	
> > > 	megaraid mbox counts down to 0, and then...
> > > 
> > > megaraid mbox: critical hardware error!
> > > megaraid: resetting the host...
> > > megaraid: hw error, cannot reset
> > > megaraid: resetting the host...
> > > megaraid: hw error, cannot reset
> > > SCSI error : <0 2 0 0> return code = 0x6000000
> > > end_request: I/O error, dev sda, sector 242938701
> > > Buffer I/O error on device dm-4, logical block 9893952 lost 
> > page write
> > > due to I/O error on dm-4
> > > scsi0 (0:0): rejecting I/O to offline device
> > > 
> > > The commands that the driver is waiting for are always the 
> > > same, except
> > > for the sequence number (the number right after "aborting-" 
> > > and  "abort:
> > > ").  And there are always 64 commands backed up that the driver is
> > > waiting for.
> > > 
> > > Both machines in question pass memtest86 and Dell's 
> > > diagnostic sets, and
> > > since the failure is identical in both I don't believe it's bad
> > > hardware.  We've got the latest BIOS, RAID firmware, and backplane
> > > firmware on the machines.
> > > 
> > > I've also tried:
> > > - the RHEL 4 Update 2 Beta kernel (at Red Hat's suggestion)
> > > - RHEL 4 x86_64
> > > - RHEL 3 x86_64
> > > - Fedora Core 4 x86
> > > - disabling Patrol Read in the RAID bios
> > > - disabling read-ahead in the RAID bios
> > > - changing the writeback cache flush to every 2 seconds, 
> > > instead of the
> > > default 4
> > > 
> > > I think next up I'm trying writethru mode, instead of write 
> > back, but
> > > has anyone seen anything like this, or have any insight they might
> > > offer?  I'm quickly getting to the point of being stumped.
> > > 
> > > Jonathan Fischer
> > > Operating Systems Analyst - CSU San Marcos
> > > jfischer@xxxxxxxxx
> > > 
> > > -
> > > : send the line "unsubscribe 
> > > linux-scsi" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > 
Red Hat Enterprise Linux AS release 4 (Nahant Update 1)
Kernel 2.6.11.12 on a 2-processor i686

Aug 23 19:49:03 brule kernel: megaraid cmm: 2.20.2.5 (Release Date: Fri Jan 21 00:01:03 EST 2005)
Aug 23 19:49:03 brule kernel: megaraid: 2.20.4.5 (Release Date: Thu Feb 03 12:27:22 EST 2005)
Aug 23 19:49:03 brule kernel: megaraid: probe new device 0x1000:0x0409:0x1000:0x3008: bus 2:slot 14:func 0
Aug 23 19:49:03 brule kernel: ACPI: PCI interrupt 0000:02:0e.0[C] -> GSI 28 (level, low) -> IRQ 28
Aug 23 19:49:03 brule kernel: megaraid: fw version:[813i] bios version:[H430]
Aug 23 19:49:03 brule kernel: scsi0 : LSI Logic MegaRAID driver
Aug 23 19:49:03 brule kernel: scsi[0]: scanning scsi channel 0 [Phy 0] for non-raid devices
Aug 23 19:49:03 brule kernel: scsi[0]: scanning scsi channel 1 [virtual] for logical drives
Aug 23 19:49:03 brule kernel:   Vendor: MegaRAID  Model: LD 0 RAID5  712G  Rev: 813i
Aug 23 19:49:03 brule kernel:   Type:   Direct-Access                      ANSI SCSI revision: 02
Aug 23 19:49:03 brule kernel:   Vendor: MegaRAID  Model: LD 1 RAID5  712G  Rev: 813i
Aug 23 19:49:03 brule kernel:   Type:   Direct-Access                      ANSI SCSI revision: 02
Aug 23 19:49:03 brule kernel: ACPI: PCI interrupt 0000:04:05.0[A] -> GSI 19 (level, low) -> IRQ 19
Aug 23 19:49:03 brule kernel: ata1: SATA max UDMA/100 cmd 0xF8806C80 ctl 0xF8806C8A bmdma 0xF8806C00 irq 19
Aug 23 19:49:03 brule kernel: ata2: SATA max UDMA/100 cmd 0xF8806CC0 ctl 0xF8806CCA bmdma 0xF8806C08 irq 19
Aug 23 19:49:03 brule kernel: ata3: SATA max UDMA/100 cmd 0xF8806E80 ctl 0xF8806E8A bmdma 0xF8806E00 irq 19
Aug 23 19:49:03 brule kernel: ata4: SATA max UDMA/100 cmd 0xF8806EC0 ctl 0xF8806ECA bmdma 0xF8806E08 irq 19
Aug 23 19:49:03 brule kernel: ata1: dev 0 ATA, max UDMA/133, 234441648 sectors: lba48
Aug 23 19:49:03 brule kernel: ata1: dev 0 configured for UDMA/100
Aug 23 19:49:03 brule kernel: scsi1 : sata_sil
Aug 23 19:49:03 brule kernel: ata2: no device found (phy stat 00000000)
Aug 23 19:49:03 brule kernel: scsi2 : sata_sil
Aug 23 19:49:03 brule kernel: ata3: no device found (phy stat 00000000)
Aug 23 19:49:03 brule kernel: scsi3 : sata_sil
Aug 23 19:49:03 brule kernel: ata4: no device found (phy stat 00000000)
Aug 23 19:49:03 brule kernel: scsi4 : sata_sil
Aug 23 19:49:03 brule kernel:   Vendor: ATA       Model: ST3120026AS       Rev: 3.05
Aug 23 19:49:03 brule kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
Aug 23 19:49:03 brule kernel: SCSI device sda: 1458978816 512-byte hdwr sectors (746997 MB)
Aug 23 19:49:03 brule kernel: sda: asking for cache data failed
Aug 23 19:49:03 brule kernel: sda: assuming drive cache: write through
Aug 23 19:49:04 brule kernel: SCSI device sda: 1458978816 512-byte hdwr sectors (746997 MB)
Aug 23 19:49:04 brule kernel: sda: asking for cache data failed
Aug 23 19:49:04 brule kernel: sda: assuming drive cache: write through
Aug 23 19:49:04 brule kernel:  sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 sda13 sda14 >
Aug 23 19:49:04 brule kernel: Attached scsi disk sda at scsi0, channel 1, id 0, lun 0
Aug 23 19:49:04 brule kernel: SCSI device sdb: 1458978816 512-byte hdwr sectors (746997 MB)
Aug 23 19:49:04 brule kernel: sdb: asking for cache data failed
Aug 23 19:49:04 brule kernel: sdb: assuming drive cache: write through
Aug 23 19:49:04 brule kernel: SCSI device sdb: 1458978816 512-byte hdwr sectors (746997 MB)
Aug 23 19:49:04 brule kernel: sdb: asking for cache data failed
Aug 23 19:49:04 brule kernel: sdb: assuming drive cache: write through
Aug 23 19:49:04 brule kernel:  sdb: sdb1 sdb2 sdb3 sdb4
Aug 23 19:49:04 brule kernel: Attached scsi disk sdb at scsi0, channel 1, id 1, lun 0
Aug 23 19:49:04 brule kernel: SCSI device sdc: 234441648 512-byte hdwr sectors (120034 MB)
Aug 23 19:49:04 brule kernel: SCSI device sdc: drive cache: write back
Aug 23 19:49:04 brule kernel: SCSI device sdc: 234441648 512-byte hdwr sectors (120034 MB)
Aug 23 19:49:04 brule kernel: SCSI device sdc: drive cache: write back
Aug 23 19:49:04 brule kernel:  sdc: sdc1 sdc2 sdc3 < sdc5 sdc6 sdc7 sdc8 > sdc4
Aug 23 19:49:04 brule kernel: Attached scsi disk sdc at scsi1, channel 0, id 0, lun 0
Aug 23 19:49:04 brule kernel: Attached scsi generic sg0 at scsi0, channel 1, id 0, lun 0,  type 0
Aug 23 19:49:04 brule kernel: Attached scsi generic sg1 at scsi0, channel 1, id 1, lun 0,  type 0
Aug 23 19:49:04 brule kernel: Attached scsi generic sg2 at scsi1, channel 0, id 0, lun 0,  type 0
... the disk ran fine for nearly 4 days

Aug 27 16:19:56 brule kernel: megaraid: aborting-35347365 cmd=2a <c=1 t=0 l=0>
Aug 27 16:19:56 brule kernel: megaraid abort: 35347365:95[255:128], fw owner
Aug 27 16:19:56 brule kernel: megaraid: aborting-35347366 cmd=2a <c=1 t=0 l=0>
Aug 27 16:19:56 brule kernel: megaraid abort: 35347366:121[255:128], fw owner
Aug 27 16:19:56 brule kernel: megaraid: aborting-35347367 cmd=2a <c=1 t=0 l=0>
...
Aug 27 16:19:57 brule kernel: megaraid: aborting-35347510 cmd=2a <c=1 t=0 l=0>
Aug 27 16:19:57 brule kernel: megaraid abort: 35347510:112[255:128], fw owner
Aug 27 16:19:57 brule kernel: megaraid: reseting the host...
Aug 27 16:19:57 brule kernel: megaraid: 64 outstanding commands. Max wait 180 sec
Aug 27 16:19:57 brule kernel: megaraid mbox: Wait for 64 commands to complete:180
Aug 27 16:20:01 brule kernel: megaraid mbox: Wait for 64 commands to complete:175
Aug 27 16:20:06 brule kernel: megaraid mbox: Wait for 1 commands to complete:170
Aug 27 16:20:11 brule kernel: megaraid mbox: Wait for 1 commands to complete:165
Aug 27 16:20:16 brule kernel: megaraid mbox: Wait for 1 commands to complete:160
...
Aug 27 16:22:51 brule kernel: megaraid mbox: Wait for 1 commands to complete:5
Aug 27 16:22:56 brule kernel: megaraid mbox: Wait for 1 commands to complete:0
Aug 27 16:23:01 brule kernel: megaraid mbox: Wait for 1 commands to complete:-5
...
Aug 27 16:24:46 brule kernel: megaraid mbox: Wait for 1 commands to complete:-110
Aug 27 16:24:51 brule kernel: megaraid mbox: Wait for 1 commands to complete:-115
Aug 27 16:24:56 brule kernel: megaraid mbox: critical hardware error!
Aug 27 16:24:56 brule kernel: megaraid: reseting the host...
Aug 27 16:24:56 brule kernel: megaraid: hw error, cannot reset
Aug 27 16:24:56 brule kernel: megaraid: reseting the host...
Aug 27 16:24:56 brule kernel: megaraid: hw error, cannot reset
Aug 27 16:24:56 brule kernel: scsi: Device offlined - not ready after error recovery: host 0 channel 1 id 0 lun 0
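
One thing that stands out in the log above: the driver announces
"Max wait 180 sec", but the countdown keeps going below zero, and the
number of outstanding commands drops from 64 to 1 along the way.  The
following is a rough sketch for measuring that from the timestamps.  The
lines are copied from the log; the year 2005 is taken from the date of
this message (syslog does not record it), and the helper names are made
up for illustration.

```python
import re
from datetime import datetime

# A subset of the lines from the attached log.
LOG = """\
Aug 27 16:19:57 brule kernel: megaraid: reseting the host...
Aug 27 16:19:57 brule kernel: megaraid mbox: Wait for 64 commands to complete:180
Aug 27 16:20:06 brule kernel: megaraid mbox: Wait for 1 commands to complete:170
Aug 27 16:22:56 brule kernel: megaraid mbox: Wait for 1 commands to complete:0
Aug 27 16:24:51 brule kernel: megaraid mbox: Wait for 1 commands to complete:-115
Aug 27 16:24:56 brule kernel: megaraid mbox: critical hardware error!
"""

WAIT_RE = re.compile(r"Wait for (\d+) commands to complete:(-?\d+)")


def stamp(line, year=2005):
    """Parse the leading syslog timestamp; the year is an assumption."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


lines = LOG.splitlines()

# Collect (timestamp, outstanding commands, countdown value) per wait line.
waits = []
for line in lines:
    match = WAIT_RE.search(line)
    if match:
        waits.append((stamp(line), int(match.group(1)), int(match.group(2))))

# Time from the first reset attempt to the "critical hardware error" line.
elapsed = (stamp(lines[-1]) - stamp(lines[0])).total_seconds()
lowest_countdown = min(count for _, _, count in waits)

print("seconds from reset to hardware error:", elapsed)
print("outstanding commands, first/last:", waits[0][1], waits[-1][1])
print("lowest countdown value:", lowest_countdown)
```

So on this system the driver appears to have waited roughly five minutes,
not three, before declaring the hardware error.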