Re: aacraid: SCSI bus appears hung

On Fri, 2009-03-20 at 14:31 +0000, Thomas Mueller wrote:
> hi 
> 
> this is on debian etch with kernel 2.6.26 (backports.org) and aacraid 
> 1.1-5[2456]-ms. the adapter is an adaptec 5805 (rebranded as Supermicro 
> AOC-USAS-S8iR, f/w 15758), 4+1 WD VelociRaptor 300GB disks, RAID10.
> 
> the disks aren't very good. about every 2 months the background consistency
> check detects defective blocks on some disks. the hotspare disk takes
> over. that's where the troubles start.
> 
> Mar 19 20:44:30 ib001 kernel: [4312641.290691] aacraid: Host adapter abort request (0,0,0,0)
> Mar 19 20:44:30 ib001 kernel: [4312641.290792] aacraid: Host adapter reset request. SCSI hang ?
> Mar 19 20:57:53 ib001 kernel: [4312700.999164] aacraid: Host adapter abort request (0,0,0,0)
> Mar 19 20:57:53 ib001 kernel: [4312880.704289] aacraid: Host adapter abort request (0,0,0,0)
> Mar 19 20:57:53 ib001 kernel: [4312880.704388] aacraid: Host adapter reset request. SCSI hang ?
> Mar 19 20:57:53 ib001 kernel: [4312941.412927] aacraid: Host adapter abort request (0,0,0,0)
> Mar 19 20:57:53 ib001 kernel: [4312941.413039] aacraid: Host adapter reset request. SCSI hang ?
> Mar 19 20:57:53 ib001 kernel: [4312951.930474] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
> Mar 19 20:57:53 ib001 kernel: [4313001.400935] aacraid: Host adapter abort request (0,0,0,0)
> Mar 19 20:57:53 ib001 kernel: [4313001.401042] aacraid: Host adapter reset request. SCSI hang ?
> Mar 19 20:57:53 ib001 kernel: [4313061.796830] aacraid: Host adapter abort request (0,0,0,0)
> Mar 19 20:57:53 ib001 kernel: [4313061.796930] aacraid: Host adapter reset request. SCSI hang ?
> Mar 19 20:57:53 ib001 kernel: [4313122.675845] aacraid: Host adapter abort request (0,0,0,0)
> Mar 19 20:57:53 ib001 kernel: [4313122.675931] aacraid: Host adapter reset request. SCSI hang ?
> Mar 19 20:57:53 ib001 kernel: [4313183.252118] aacraid: Host adapter abort request (0,0,0,0)
> Mar 19 20:57:53 ib001 kernel: [4313183.252227] aacraid: Host adapter reset request. SCSI hang ?
> Mar 19 20:57:53 ib001 kernel: [4313239.408236] aacraid: Host adapter abort request (0,0,0,0)
> Mar 19 20:57:53 ib001 kernel: [4313239.408337] aacraid: Host adapter reset request. SCSI hang ?
> Mar 19 20:57:53 ib001 kernel: [4313295.503066] aacraid: Host adapter abort request (0,0,0,0)
> Mar 19 20:57:53 ib001 kernel: [4313295.503145] aacraid: Host adapter reset request. SCSI hang ?
> Mar 19 20:57:53 ib001 kernel: [4313305.669682] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
> Mar 19 20:57:53 ib001 kernel: [4313351.860988] aacraid: Host adapter abort request (0,0,0,0)
> Mar 19 20:57:53 ib001 kernel: [4313351.861020] aacraid: Host adapter abort request (0,0,0,0)
> Mar 19 20:57:53 ib001 kernel: [4313351.861047] aacraid: Host adapter abort request (0,0,0,0)
> Mar 19 20:57:53 ib001 kernel: [4313351.861073] aacraid: Host adapter abort request (0,0,0,0)
> Mar 19 20:57:53 ib001 kernel: [4313351.861100] aacraid: Host adapter abort request (0,0,0,0)
> Mar 19 20:57:53 ib001 kernel: [4313351.861191] aacraid: Host adapter reset request. SCSI hang ?
> Mar 19 20:57:53 ib001 kernel: [4313413.717370] aacraid: SCSI bus appears hung
> Mar 19 20:58:09 ib001 kernel: [4313517.692627] sd 0:0:0:0: [sda] 585084928 512-byte hardware sectors (299563 MB)
> Mar 19 20:58:09 ib001 kernel: [4313517.692627] sd 0:0:0:0: [sda] Write Protect is off
> Mar 19 20:58:09 ib001 kernel: [4313517.692627] sd 0:0:0:0: [sda] Mode Sense: 06 00 10 00
> Mar 19 20:58:09 ib001 kernel: [4313517.692627] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
> Mar 19 21:46:34 ib001 kernel: [4317148.271355] sd 0:0:0:0: [sda] 585084928 512-byte hardware sectors (299563 MB)
> Mar 19 21:46:34 ib001 kernel: [4317148.271355] sd 0:0:0:0: [sda] Write Protect is off
> Mar 19 21:46:34 ib001 kernel: [4317148.271355] sd 0:0:0:0: [sda] Mode Sense: 06 00 10 00
> Mar 19 21:46:34 ib001 kernel: [4317148.271355] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
> (many "process hung" kernel warnings suppressed)
> 
> the aacraid seems to be unresponsive after this event, blocking the system.
> on top of the aacraid device there is drbd running, which
> also gets mad about aacraid not responding - and then
> the second drbd node (identical machine) also gets stuck.
> 
> sometimes this is only "resolvable" by rebooting the host.
> 
> same problem on 2 other servers with nearly identical hardware.
> 
> is this expected on a disk failure event?
> 
> maybe i should try the vanilla 2.6.28.x kernel? 

Part of the problem seems to be the way the aacraid firmware is reacting
to disk failures.  It's possible it might recover faster with a newer
kernel (I seem to remember seeing "hit it with a bigger hammer" type
patches going into that).  However, your basic problem of running RAID
on unreliable disks will still remain.

James
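
For anyone comparing logs from the other affected servers, the quoted
excerpt can be summarized with a short script.  This is only a minimal
sketch: the function name summarize and the regular expression are
illustrative assumptions, not part of aacraid or any existing tool.  It
counts abort and reset messages and reports the seconds between
consecutive reset requests, using the bracketed kernel timestamps rather
than the syslog time.  In the excerpt above the resets recur roughly every
56 to 61 seconds once the hang sets in.

```python
import re

# Matches the bracketed kernel timestamp and the aacraid message text, e.g.
# "... kernel: [4312641.290792] aacraid: Host adapter reset request. SCSI hang ?"
# The pattern is an assumption based on the log format quoted in this thread.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+aacraid: (.*)$")


def summarize(lines):
    """Return (abort count, reset count, seconds between consecutive resets)."""
    aborts = 0
    reset_times = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not an aacraid message
        stamp, msg = float(m.group(1)), m.group(2)
        if msg.startswith("Host adapter abort request"):
            aborts += 1
        elif msg.startswith("Host adapter reset request"):
            reset_times.append(stamp)
    gaps = [round(b - a, 1) for a, b in zip(reset_times, reset_times[1:])]
    return aborts, len(reset_times), gaps


# First five aacraid lines from the quoted log, used as sample input.
sample = [
    "Mar 19 20:44:30 ib001 kernel: [4312641.290691] aacraid: Host adapter abort request (0,0,0,0)",
    "Mar 19 20:44:30 ib001 kernel: [4312641.290792] aacraid: Host adapter reset request. SCSI hang ?",
    "Mar 19 20:57:53 ib001 kernel: [4312700.999164] aacraid: Host adapter abort request (0,0,0,0)",
    "Mar 19 20:57:53 ib001 kernel: [4312880.704289] aacraid: Host adapter abort request (0,0,0,0)",
    "Mar 19 20:57:53 ib001 kernel: [4312880.704388] aacraid: Host adapter reset request. SCSI hang ?",
]
print(summarize(sample))
```

To run it against a full log, pass an open file object instead of the
sample list, for example summarize(open("/var/log/kern.log")), adjusting
the path to wherever the kernel messages are written on the host.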

