Re: [smartmontools-support] Apparent MPT ata pass-through bug SAS1068 and SAS1068E - WAS SMART causes disks to go offline on an LSI SAS1068 controller - Dell SAS 5/iR

Tim Small wrote:
> ... I will impose a bit of extra IO load on the machine to see if that
> provokes more errors.
>   


The answer would seem to be yes - whilst simultaneously running these
two commands:

while true ; do dd if=/dev/zero of=empty count=1M ; sync ; rm empty ; sync ; done

and:

while true ; do smartctl -a /dev/sg1 > /dev/null || echo failed && echo -n . ; done

... about 10% of the smartctl commands fail, and this sort of thing gets
logged:

[61729.829710] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
[61729.833705] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
[61729.833705] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
[61729.833705] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
[61729.833705] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
[61729.833705] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
[61729.833705] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
[61729.833705] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
[61729.833705] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
[61729.833705] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
[61729.833705] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
[61730.019141] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
[61741.334274] mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
[61741.353972] mptscsih: ioc0: attempting task abort! (sc=ffff880037b6c880)
[61741.367368] scsi 2:0:0:0: [sg1] CDB: Inquiry: 12 00 00 00 24 00
[61741.379314] mptscsih: ioc0: task abort: FAILED (sc=ffff880037b6c880)
[61741.392017] mptscsih: ioc0: attempting target reset! (sc=ffff880037b6c880)
[61741.405757] scsi 2:0:0:0: [sg1] CDB: Inquiry: 12 00 00 00 24 00
[61741.417702] mptscsih: ioc0: target reset: FAILED (sc=ffff880037b6c880)
[61741.430752] mptscsih: ioc0: attempting bus reset! (sc=ffff880037b6c880)
[61741.443970] scsi 2:0:0:0: [sg1] CDB: Inquiry: 12 00 00 00 24 00
[61745.830347] mptscsih: ioc0: bus reset: SUCCESS (sc=ffff880037b6c880)
[61757.329906] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
[61757.348194] mptscsih: ioc0: attempting host reset! (sc=ffff880037b6c880)
[61757.361592] mptbase: ioc0: Initiating recovery
[61779.120762] mptscsih: ioc0: host reset: SUCCESS (sc=ffff880037b6c880)

[61795.240058] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
[61795.244054] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
[61806.744084] mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
[61806.763772] mptscsih: ioc0: attempting task abort! (sc=ffff880037b6c380)
[61806.777179] scsi 2:0:0:0: [sg1] CDB: Inquiry: 12 00 00 00 24 00
[61806.789127] mptscsih: ioc0: task abort: FAILED (sc=ffff880037b6c380)
[61806.801833] mptscsih: ioc0: attempting target reset! (sc=ffff880037b6c380)
[61806.815575] scsi 2:0:0:0: [sg1] CDB: Inquiry: 12 00 00 00 24 00
[61806.827520] mptscsih: ioc0: target reset: FAILED (sc=ffff880037b6c380)
[61806.840575] mptscsih: ioc0: attempting bus reset! (sc=ffff880037b6c380)
[61806.853797] scsi 2:0:0:0: [sg1] CDB: Inquiry: 12 00 00 00 24 00
[61811.240162] mptscsih: ioc0: bus reset: SUCCESS (sc=ffff880037b6c380)
[61822.739995] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
[61822.758297] mptscsih: ioc0: attempting host reset! (sc=ffff880037b6c380)
[61822.771694] mptbase: ioc0: Initiating recovery
[61844.528012] mptscsih: ioc0: host reset: SUCCESS (sc=ffff880037b6c380)

[61865.400161] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
[61865.404157] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
[61865.404157] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
[61865.404157] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
[61876.904450] mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
[61876.924174] mptscsih: ioc0: attempting task abort! (sc=ffff8800c0218d80)
[61876.937577] scsi 2:0:0:0: [sg1] CDB: Inquiry: 12 00 00 00 24 00
[61876.949527] mptscsih: ioc0: task abort: FAILED (sc=ffff8800c0218d80)
[61876.962233] mptscsih: ioc0: attempting target reset! (sc=ffff8800c0218d80)
[61876.975974] scsi 2:0:0:0: [sg1] CDB: Inquiry: 12 00 00 00 24 00
[61876.987918] mptscsih: ioc0: target reset: FAILED (sc=ffff8800c0218d80)
[61877.000971] mptscsih: ioc0: attempting bus reset! (sc=ffff8800c0218d80)
[61877.014193] scsi 2:0:0:0: [sg1] CDB: Inquiry: 12 00 00 00 24 00
[61881.400528] mptscsih: ioc0: bus reset: SUCCESS (sc=ffff8800c0218d80)
[61892.900633] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
[61892.918924] mptscsih: ioc0: attempting host reset! (sc=ffff8800c0218d80)
[61892.932322] mptbase: ioc0: Initiating recovery
[61914.688765] mptscsih: ioc0: host reset: SUCCESS (sc=ffff8800c0218d80)
[61924.300535] INFO: task sync:15809 blocked for more than 120 seconds.
[61924.313245] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[61924.328907] sync          D 0000000000000000     0 15809   9780 0x00000000
[61924.342681]  ffffffff814ee8b0 0000000000000082 0000000000000000 000000005fb8f9b9
[61924.357538]  000000005fb8f9b9 0000000000000000 00000000000108a0 ffff8800379bdfd8
[61924.372387]  0000000000015980 0000000000015980 ffff88012e4ab040 ffff88012e4ab338
[61924.387241] Call Trace:
[61924.392145]  [<ffffffffa01afcf5>] ? log_wait_commit+0xcf/0x137 [jbd]
[61924.404848]  [<ffffffff8107cc8a>] ? autoremove_wake_function+0x0/0x59
[61924.417725]  [<ffffffffa01c9c8c>] ? ext3_sync_fs+0x52/0x70 [ext3]
[61924.429906]  [<ffffffff8116ae4d>] ? sync_quota_sb+0x59/0x133
[61924.441222]  [<ffffffff81141bbc>] ? __sync_filesystem+0x5f/0xab
[61924.453057]  [<ffffffff81141cb6>] ? sync_filesystems+0xae/0x110
[61924.464893]  [<ffffffff81141d9a>] ? sys_sync+0x2c/0x56
[61924.475169]  [<ffffffff81010e02>] ? system_call_fastpath+0x16/0x1b
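
As an aid to reading the mptbase lines above, here is a minimal sketch that splits a LogInfo word into its fields. It assumes the SAS layout of bus type in bits 31-28, originator in bits 27-24, code in bits 23-16 and subcode in bits 15-0, which agrees with the driver's own decoding in this log (0x31110d00 gives Originator={PL}, Code={Reset}, SubCode(0x0d00)). decode_loginfo is a hypothetical helper name, and only the originator and code values that actually appear in this log are given names; anything else is printed as "unknown".

```shell
#!/bin/sh
# decode_loginfo (hypothetical helper): split an MPT SAS LogInfo word
# into bus type, originator, code and subcode.
# Assumed layout: [31:28] bus type, [27:24] originator,
#                 [23:16] code,     [15:0]  subcode.
decode_loginfo() {
    val=$(( $1 ))
    bus_type=$(( (val >> 28) & 0xf ))
    originator=$(( (val >> 24) & 0xf ))
    code=$(( (val >> 16) & 0xff ))
    subcode=$(( val & 0xffff ))

    # Only names confirmed by the log in this message are mapped.
    case $originator in
        1) orig_name="PL" ;;
        *) orig_name="unknown" ;;
    esac
    case $code in
        17) code_name="Reset" ;;                # 0x11
        19) code_name="IO Not Yet Executed" ;;  # 0x13
        20) code_name="IO Executed" ;;          # 0x14
        *)  code_name="unknown" ;;
    esac

    printf 'bus_type=0x%x originator=%s code=0x%02x (%s) subcode=0x%04x\n' \
        "$bus_type" "$orig_name" "$code" "$code_name" "$subcode"
}
```

For example, `decode_loginfo 0x31130000` should print `bus_type=0x3 originator=PL code=0x13 (IO Not Yet Executed) subcode=0x0000`.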


... so I'm assuming that the same race occurs with ATA pass-through
commands, but error recovery is better with 2.6.32-rc4 + mptsas 3.04.13.
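
To compare error recovery between kernels with something more than eyeballing dmesg, the outcome of each escalation step (task abort, target reset, bus reset, host reset) can be counted. This is only a rough sketch: tally_recovery is a hypothetical name, and it assumes the mptscsih result lines look exactly like the ones in the log above and are not wrapped.

```shell
#!/bin/sh
# tally_recovery (hypothetical helper): read kernel log text on stdin
# and count how each mptscsih error-recovery step ended.
tally_recovery() {
    awk '
        BEGIN {
            # Escalation steps, in the order the SCSI layer tries them here.
            n = split("task abort,target reset,bus reset,host reset", step, ",")
        }
        /mptscsih: ioc[0-9]+: / {
            for (i = 1; i <= n; i++) {
                if (index($0, step[i] ": SUCCESS")) ok[step[i]]++
                if (index($0, step[i] ": FAILED"))  bad[step[i]]++
            }
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s: success=%d failed=%d\n", step[i], ok[step[i]], bad[step[i]]
        }
    '
}
```

Usage would be something like `dmesg | tally_recovery`. For the excerpt in this message it should report three failed task aborts, three failed target resets, and three successful bus resets and host resets.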


Cheers,

Tim.