https://bugzilla.kernel.org/show_bug.cgi?id=70751

Bug ID: 70751
Summary: mpt2sas: system disks dropped when execute SMART tests
Product: SCSI Drivers
Version: 2.5
Kernel Version: 3.8
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: high
Priority: P1
Component: Other
Assignee: scsi_drivers-other@xxxxxxxxxxxxxxxxxxxx
Reporter: mihaly.arva-toth+kernelorg@xxxxxxxxxxxxxxxxxxxxxx
Regression: No

Created attachment 126551
  --> https://bugzilla.kernel.org/attachment.cgi?id=126551&action=edit
dmesg from boot

This bug is similar to #60644, but the errors are different.

I have a SuperMicro SSG-6047R-E1R36L server with an LSI2308 HBA, which is handled by the mpt2sas kernel driver. The server has four SATA HDDs: two in a software RAID-1 holding the Ubuntu 12.04 LTS installation (kernel 3.8.0-29), and two used as standalone Ceph OSD storage.

When I run a SMART short or extended test on one of the first two disks (which hold the system), I think the driver sends something wrong to the controller. I can reproduce this every time with smartctl -t short /dev/sda (but I need to restart after the crash).

With mpt2sas.debug_logging=0x3f8 turned on, I get:

2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132677] sd 0:0:1:0: [sdb] CDB:
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132683] Write(10): 2a 08 00 00 08 08 00 00 01 00
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132698] mpt2sas0: sas_address(0x500304800089138d), phy(13)
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132701] mpt2sas0: enclosure_logical_id(0x50030480008913bf), slot(1)
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132704] mpt2sas0: handle(0x000b), ioc_status(success)(0x0000), smid(48)
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132707] mpt2sas0: request_len(512), underflow(512), resid(512)
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132710] mpt2sas0: tag(0), transfer_count(0), sc->result(0x00000002)
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132713] mpt2sas0: scsi_status(check condition)(0x02), scsi_state(autosense valid )(0x01)
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132716] mpt2sas0: [sense_key,asc,ascq]: [0x05,0x21,0x00], count(18)
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132730] sd 0:0:1:0: [sdb]
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132733] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132736] sd 0:0:1:0: [sdb]
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132738] Sense Key : Illegal Request [current]
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132743] Info fld=0x808
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132745] sd 0:0:1:0: [sdb]
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132749] Add. Sense: Logical block address out of range
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132753] sd 0:0:1:0: [sdb] CDB:
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132755] Write(10): 2a 08 00 00 08 08 00 00 01 00
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132767] end_request: critical target error, dev sdb, sector 2056
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.133132] end_request: critical target error, dev sdb, sector 2056
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.133495] md: super_written gets error=-121, uptodate=0
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.133500] md/raid1:md0: Disk failure on sdb1, disabling device.
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.133500] md/raid1:md0: Operation continuing on 1 devices.
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.157908] RAID1 conf printout:
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.157913] --- wd:1 rd:2
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.157917] disk 0, wo:0, o:1, dev:sda1
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.157920] disk 1, wo:1, o:0, dev:sdb1
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.160890] RAID1 conf printout:
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.160903] --- wd:1 rd:2
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.160908] disk 0, wo:0, o:1, dev:sda1
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.175482] EXT4-fs error (device md0): ext4_journal_start_sb:349: Detected aborted journal
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.175534] EXT4-fs (md0): Remounting filesystem read-only

I tried the root filesystem on both ext4 and xfs. When I run a SMART test on the 3rd or 4th HDD (not a system disk), there is no crash and the tests work fine. When I boot from a live CD, I can run SMART tests on all HDDs without a problem. I also installed and booted the latest stable FreeBSD, and SMART tests work well there, with no hang.

I tried the latest LSI firmware (P17) and the latest mpt2sas kernel driver compiled for this kernel, but the problem still exists. I also tried disabling ASPM, disabling PERR and SERR, and enabling Above 4G Decoding, but nothing helps. I'm using WD RE3 and RE4 SATA disks.

I found another user who hits the same issue:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/906873/comments/4

So the bug appears only under the Linux kernel, and the crash happens only when I run SMART tests on the disks of the booted system.

The dmesg from boot is attached.
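
For anyone reading the log, the rejected command can be decoded by hand. The sketch below is only an illustration and is not part of the original report: it assumes the standard 10-byte WRITE(10) layout (opcode 0x2a, FUA in bit 3 of byte 1, big-endian LBA in bytes 2-5, transfer length in bytes 7-8), and the function name decode_write10 and the dictionary keys are made up for this example.

```python
def decode_write10(cdb_hex: str) -> dict:
    """Decode a WRITE(10) CDB given as a hex string (hypothetical helper)."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    return {
        # Bit 3 of byte 1 is the Force Unit Access flag.
        "fua": bool(cdb[1] & 0x08),
        # Bytes 2..5 hold the logical block address, big-endian.
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7..8 hold the transfer length in blocks, big-endian.
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


# CDB copied from the "Write(10):" line in the log.
fields = decode_write10("2a 08 00 00 08 08 00 00 01 00")
print(fields)  # {'fua': True, 'lba': 2056, 'blocks': 1}
```

Under those assumptions the command is a one-block write with FUA set to LBA 2056 (0x808), which agrees with "Info fld=0x808", "sector 2056" and "request_len(512)" in the log.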