[Bug 70751] New: mpt2sas: system disks dropped when execute SMART tests

bugzilla-daemon@xxxxxxxxxxxxxxxxxxx · Tue, 18 Feb 2014 10:48:27 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=70751

            Bug ID: 70751
           Summary: mpt2sas: system disks dropped when execute SMART tests
           Product: SCSI Drivers
           Version: 2.5
    Kernel Version: 3.8
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: Other
          Assignee: scsi_drivers-other@xxxxxxxxxxxxxxxxxxxx
          Reporter: mihaly.arva-toth+kernelorg@xxxxxxxxxxxxxxxxxxxxxx
        Regression: No

Created attachment 126551
  --> https://bugzilla.kernel.org/attachment.cgi?id=126551&action=edit
dmesg from boot

This bug is similar to #60644, but the errors are different.

I have a SuperMicro SSG-6047R-E1R36L server with an LSI2308 HBA, which is
handled by the mpt2sas kernel driver. The server has four SATA HDDs: two in a
software RAID-1 holding the Ubuntu 12.04 LTS installation (3.8.0-29), and two
used as standalone Ceph OSD storage.

When I run a SMART short or extended test on one of the first two disks (which
hold the system), I think the driver sends something wrong to the controller.
I can reproduce this every time with smartctl -t short /dev/sda (but I need to
reboot after each crash).

I turned on mpt2sas.debug_logging=0x3f8 and captured the following:

2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132677] sd 0:0:1:0: [sdb] CDB: 
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132683] Write(10): 2a 08 00 00
08 08 00 00 01 00
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132698] mpt2sas0:     
sas_address(0x500304800089138d), phy(13)
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132701] mpt2sas0:     
enclosure_logical_id(0x50030480008913bf), slot(1)
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132704] mpt2sas0:     
handle(0x000b), ioc_status(success)(0x0000), smid(48)
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132707] mpt2sas0:     
request_len(512), underflow(512), resid(512)
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132710] mpt2sas0:      tag(0),
transfer_count(0), sc->result(0x00000002)
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132713] mpt2sas0:     
scsi_status(check condition)(0x02), scsi_state(autosense valid )(0x01)
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132716] mpt2sas0:     
[sense_key,asc,ascq]: [0x05,0x21,0x00], count(18)
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132730] sd 0:0:1:0: [sdb]  
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132733] Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132736] sd 0:0:1:0: [sdb]  
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132738] Sense Key : Illegal
Request [current] 
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132743] Info fld=0x808
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132745] sd 0:0:1:0: [sdb]  
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132749] Add. Sense: Logical
block address out of range
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132753] sd 0:0:1:0: [sdb] CDB: 
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132755] Write(10): 2a 08 00 00
08 08 00 00 01 00
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.132767] end_request: critical
target error, dev sdb, sector 2056
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.133132] end_request: critical
target error, dev sdb, sector 2056
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.133495] md: super_written gets
error=-121, uptodate=0
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.133500] md/raid1:md0: Disk
failure on sdb1, disabling device.
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.133500] md/raid1:md0:
Operation continuing on 1 devices.
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.157908] RAID1 conf printout:
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.157913]  --- wd:1 rd:2
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.157917]  disk 0, wo:0, o:1,
dev:sda1
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.157920]  disk 1, wo:1, o:0,
dev:sdb1
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.160890] RAID1 conf printout:
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.160903]  --- wd:1 rd:2
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.160908]  disk 0, wo:0, o:1,
dev:sda1
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.175482] EXT4-fs error (device
md0): ext4_journal_start_sb:349: Detected aborted journal
2014-02-18T10:50:28+01:00 stor3 kernel: : [ 1103.175534] EXT4-fs (md0):
Remounting filesystem read-only
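
As a reading aid for the log above, here is a minimal Python sketch that
decodes the failing CDB and the sense triple. The function names are
illustrative only and are not part of mpt2sas or any tool; the field layout
assumes the standard WRITE(10) format (opcode, flags byte, 4-byte big-endian
LBA, group byte, 2-byte transfer length, control byte), and the sense
description is simply copied from the kernel output above.

```python
import struct

# Sense description taken from the kernel log in this report, not a full table.
KNOWN_SENSE = {
    (0x05, 0x21, 0x00): ("Illegal Request",
                         "Logical block address out of range"),
}


def decode_write10(cdb_hex):
    """Decode a WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    flags = cdb[1]
    (lba,) = struct.unpack(">I", cdb[2:6])      # bytes 2..5: logical block address
    (blocks,) = struct.unpack(">H", cdb[7:9])   # bytes 7..8: transfer length
    return {
        "fua": bool(flags & 0x08),  # force unit access bit
        "dpo": bool(flags & 0x10),  # disable page out bit
        "lba": lba,
        "blocks": blocks,
    }


def describe_sense(sense_key, asc, ascq):
    """Look up a (sense_key, asc, ascq) triple in the small table above."""
    return KNOWN_SENSE.get((sense_key, asc, ascq), ("unknown", "unknown"))


if __name__ == "__main__":
    print(decode_write10("2a 08 00 00 08 08 00 00 01 00"))
    print(describe_sense(0x05, 0x21, 0x00))
```

Decoded this way, the CDB is a one-block write with the FUA bit set at LBA
0x808 = 2056, which matches both "Info fld=0x808" and the "sector 2056" in the
end_request lines.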

I tried the root filesystem with both ext4 and xfs. When I run a SMART test on
the 3rd or 4th HDD (not a system disk), there is no crash and the tests work
fine. When I boot from a live CD, I can run SMART tests on all HDDs without
problems. I also installed and booted the latest stable FreeBSD, and SMART
tests work well there, with no hang.

I tried the latest LSI firmware (P17) and the latest mpt2sas kernel driver
compiled for this kernel, but the problem still exists. I also tried disabling
ASPM, disabling PERR and SERR, and enabling Above 4G Decoding, but nothing
helps. I'm using WD RE3 and RE4 SATA disks.

I found another person who has hit the same issue:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/906873/comments/4

So the bug appears only under the Linux kernel, and the crash happens only
when I run SMART tests on the disks of the booted system.

dmesg from boot has been attached.
