Re: [smartmontools-support] SMART causes disks to go offline on an LSI SAS1068 controller - Dell SAS 5/iR

Hello,

Just to say that I'm seeing this bug as well, with smartmontools 5.38 and smartctl 5.39 2009-10-10 r2955 on Debian lenny. The machine is a Dell PowerEdge 860. I'm guessing that this is either a firmware or driver issue.

02:08.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)
       Subsystem: Dell SAS 5/iR Adapter RAID Controller
       Flags: bus master, 66MHz, medium devsel, latency 72, IRQ 1275
       I/O ports at ec00 [disabled] [size=256]
       Memory at fe9fc000 (64-bit, non-prefetchable) [size=16K]
       Memory at fe9e0000 (64-bit, non-prefetchable) [size=64K]
       Expansion ROM at fea00000 [disabled] [size=1M]
       Capabilities: [50] Power Management version 2
       Capabilities: [98] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
       Capabilities: [68] PCI-X non-bridge device
       Capabilities: [b0] MSI-X: Enable- Mask- TabSize=1
       Kernel driver in use: mptsas
       Kernel modules: mptsas

# modinfo mptsas
filename: /lib/modules/2.6.26-2-openvz-amd64/kernel/drivers/message/fusion/mptsas.ko
version:        3.04.06
license:        GPL
description:    Fusion MPT SAS Host driver
author:         LSI Corporation



The errors look like this:

428.524463] mptscsih: ioc0: attempting task abort! (sc=ffff81021b950940)
428.524471] sd 0:0:0:0: [sda] CDB: ATA command pass through(16): 85 08 0e 00 d5 00 01 00 09 00 4f 00 c2 00 b0 00
433.199851] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
433.199851] mptsas: ioc0: removing sata device, channel 0, id 0, phy 0
433.199851]  port-0:0: mptsas: ioc0: delete port (0)
433.199851] sd 0:0:0:0: [sda] Synchronizing SCSI cache
433.348856] mptscsih: ioc0: task abort: SUCCESS (sc=ffff81021b950940)
433.348868] mptscsih: ioc0: attempting task abort! (sc=ffff81021b950440)
433.348873] sd 0:0:0:0: [sda] CDB: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
433.348885] mptscsih: ioc0: task abort: SUCCESS (sc=ffff81021b950440)
433.348893] mptscsih: ioc0: attempting target reset! (sc=ffff81021b950940)
433.348896] sd 0:0:0:0: [sda] CDB: ATA command pass through(16): 85 08 0e 00 d5 00 01 00 09 00 4f 00 c2 00 b0 00
433.605026] mptscsih: ioc0: target reset: SUCCESS (sc=ffff81021b950940)
433.605034] mptscsih: ioc0: attempting bus reset! (sc=ffff81021b950940)
433.605037] sd 0:0:0:0: [sda] CDB: ATA command pass through(16): 85 08 0e 00 d5 00 01 00 09 00 4f 00 c2 00 b0 00
434.157594] mptscsih: ioc0: bus reset: SUCCESS (sc=ffff81021b950940)
444.546154] mptscsih: ioc0: attempting host reset! (sc=ffff81021b950940)
444.546162] mptbase: ioc0: Initiating recovery
461.540429] mptscsih: ioc0: host reset: SUCCESS (sc=ffff81021b950940)
461.540437] sd 0:0:0:0: Device offlined - not ready after error recovery
461.540440] sd 0:0:0:0: Device offlined - not ready after error recovery
461.540475] end_request: I/O error, dev sda, sector 15631039
461.540480] md: super_written gets error=-5, uptodate=0
461.540485] raid1: Disk failure on sda1, disabling device.
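
For reference, the CDB that gets aborted in the log above can be picked apart by hand. The following is only a rough sketch: it assumes the ATA PASS-THROUGH (16) byte layout from the SAT specification and looks only at the low-order register bytes (the extend bit is clear in this CDB). The function name decode_ata16 is made up for illustration.

```shell
#!/bin/sh
# Sketch: split an ATA PASS-THROUGH (16) CDB into its ATA register fields.
# Assumes the SAT layout: byte 1 carries the protocol in bits 4:1, byte 2
# carries T_DIR in bit 3, and bytes 4/6/8/10/12/14 hold the low-order
# features, sector count, LBA low/mid/high and command registers.
# decode_ata16 is a hypothetical helper, not part of smartmontools.
decode_ata16() {
    [ "$#" -eq 16 ] || { echo "expected 16 hex bytes" >&2; return 1; }
    [ "$1" = "85" ] || { echo "not an ATA PASS-THROUGH (16) opcode" >&2; return 1; }
    protocol=$(( (0x$2 >> 1) & 0xf ))   # 4 = PIO data-in
    t_dir=$(( (0x$3 >> 3) & 1 ))        # 1 = transfer from the device
    printf 'protocol=%d t_dir=%d feature=0x%s count=0x%s lba_low=0x%s lba_mid=0x%s lba_high=0x%s command=0x%s\n' \
        "$protocol" "$t_dir" "$5" "$7" "$9" "${11}" "${13}" "${15}"
}

# The CDB from the log above:
decode_ata16 85 08 0e 00 d5 00 01 00 09 00 4f 00 c2 00 b0 00
```

If I read the ATA spec correctly, command 0xb0 with feature 0xd5 is SMART READ LOG, 0x4f/0xc2 in LBA mid/high is the usual SMART signature, and LBA low 0x09 would be the log address of the selective self-test log, which smartctl -a reads as part of its output.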



and the drives are:

Model Family:     Seagate Barracuda ES
Device Model:     ST3250620NS
Serial Number:    9QE3L9E0
Firmware Version: 3BKS

and are in JBOD mode (+ sw RAID with md).

lsiutil says:

Current active firmware version is 0.10.51
Firmware image's version is MPTFW-00.10.51.00-IE
 LSI Logic
x86 BIOS image's version is MPTBIOS-6.12.05.00 (2007.09.29)

... which is the latest on Dell's download pages for this server.

The kernel is 2.6.26-2-openvz-amd64 from Debian Lenny (same behaviour with non-openvz kernel). Running smartd makes the drives disappear after a few hours, but doing this:

while true ; do smartctl -T permissive -d sat -a /dev/sda > /dev/null && echo -n . ; done

seems to knock them out in about a minute.
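
To put a number on "about a minute", the loop above could be wrapped so that it counts successful polls until the first failure. This is just a sketch: count_until_fail and the cap of 10000 are made up for illustration, and the smartctl invocation is the one from the loop above, left commented out so that nothing touches a disk by accident.

```shell
#!/bin/sh
# Sketch: run a command repeatedly until it fails or a cap is reached,
# then print how many runs succeeded.
# Usage: count_until_fail MAX CMD [ARGS...]
# count_until_fail is a hypothetical helper name.
count_until_fail() {
    max=$1
    shift
    n=0
    while [ "$n" -lt "$max" ]; do
        "$@" > /dev/null 2>&1 || break
        n=$((n + 1))
    done
    echo "$n"
}

# Example with the command from the loop above (the cap of 10000 is arbitrary):
# start=$(date +%s)
# runs=$(count_until_fail 10000 smartctl -T permissive -d sat -a /dev/sda)
# echo "$runs successful polls in $(( $(date +%s) - start )) s"
```

Note that smartctl's exit status is a bitmask, so a non-zero status can also mean the disk reported a problem and not only that the device has gone away; checking dmesg afterwards is still needed.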

Subjectively, 5.38 seemed to upset the controller a lot quicker than 5.39 r2955 does. For good measure I'm currently stress-testing a PE1950 with a SAS 6/iR (SAS1068E) in the same way, although that machine uses a RAID set up through the BIOS rather than JBOD.

smartctl 5.39-pre needs '-T permissive' on the PE860, but 5.38 doesn't seem to require it.


Is it worth trying a newer mptsas driver?

Regards,

Tim.

--
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732. Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309
