I run a research cluster of around 140 nodes that have been affected by
this problem since we brought them online. The machines are Tyan boards
with dual E54x0 CPUs and onboard SAS, each with four SATA drives attached.
The half of the cluster with very high disk usage hit this issue on
roughly one machine every two days, while the other half only had
problems when SMART requests were issued. Each time, the bus would reset
and a drive would be logically removed and reinserted, but under a
different device name (e.g. /dev/sde).
Whether or not mptscsih.c is the correct place to enforce alignment,
applying Ryan Kuester's patch to the kernel (2.6.32) running on the
cluster has 1) stopped further occurrences of this problem and 2) made
the nodes immune to problems from running Ryan's bomb program; 3) the
only remaining drive problems have occurred on unpatched nodes.
These messages still appear regularly, though:
[702162.202899] sd 4:0:3:0: [sdd] Sense Key : Recovered Error [current] [descriptor]
[702162.293329] Descriptor sense data with sense descriptors (in hex):
[702162.368629] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
[702162.447985] 00 4f 00 c2 40 50
[702162.494805] sd 4:0:3:0: [sdd] Add. Sense: ATA pass through information available
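The hex dump in these messages can be decoded by hand. Below is a minimal
sketch in Python, assuming the descriptor-format sense layout from SPC and
the ATA Status Return descriptor (type 09h) layout from SAT; decode_sense
and SENSE_KEYS are hypothetical names written for illustration, not part of
the kernel or any smartmontools API. It extracts the sense key, ASC/ASCQ and
the ATA registers returned by the drive.

```python
"""Decode descriptor-format SCSI sense data carrying an ATA Status Return
descriptor (type 0x09). decode_sense() is a hypothetical illustration helper,
assuming the SPC descriptor-format and SAT ATA Status Return layouts."""

# Subset of SPC sense key names (illustrative, not exhaustive).
SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR",
              0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION",
              0xB: "ABORTED COMMAND"}


def decode_sense(hex_dump: str) -> dict:
    data = bytes.fromhex(hex_dump)  # spaces between byte pairs are ignored
    response_code = data[0] & 0x7F
    if response_code not in (0x72, 0x73):  # 72h = current, 73h = deferred
        raise ValueError("not descriptor-format sense data")
    info = {"sense_key": data[1] & 0x0F, "asc": data[2], "ascq": data[3]}
    info["sense_key_name"] = SENSE_KEYS.get(info["sense_key"], "other")
    # Byte 7 is the ADDITIONAL SENSE LENGTH; descriptors start at byte 8.
    end = min(len(data), 8 + data[7])
    pos = 8
    while pos + 1 < end:
        dtype, dlen = data[pos], data[pos + 1]
        desc = data[pos:pos + 2 + dlen]
        if dtype == 0x09 and len(desc) >= 14:
            # ATA Status Return descriptor. Only the low-order (7:0) register
            # bytes are decoded; the (15:8) bytes matter only when EXTEND=1.
            info["ata"] = {
                "extend": desc[2] & 0x01,
                "error": desc[3],
                "count": desc[5],
                "lba_low": desc[7],
                "lba_mid": desc[9],
                "lba_high": desc[11],
                "device": desc[12],
                "status": desc[13],
            }
        pos += 2 + dlen  # advance to the next descriptor
    return info


if __name__ == "__main__":
    dump = ("72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 "
            "00 4f 00 c2 40 50")
    info = decode_sense(dump)
    print(info["sense_key_name"], hex(info["asc"]), hex(info["ascq"]))
    print({name: hex(value) for name, value in info["ata"].items()})
```

Applied to the dump above, it reports sense key 1h (Recovered Error),
ASC/ASCQ 00h/1Dh (ATA pass through information available), LBA Mid/High
4Fh/C2h and status 50h with the ERR bit clear. 4Fh/C2h is the register pair
a drive returns for SMART RETURN STATUS when no threshold has been exceeded,
so these messages look consistent with routine SMART polling rather than a
drive fault, though that is an inference from the register values, not
something confirmed on these nodes.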
I haven't yet seen other mptsas users report that Ryan's patch works for
them, so I'm sharing my experience.
Greetings,
Michael.