Hi again, Alan.
(Sorry if this message looks messed up; unfortunately, I am not using
my regular mailer right now.)
On 2009-08-22, at 21:17, Alan Stern wrote:
> On Sat, 22 Aug 2009, Rogério Brito wrote:
>> The requested trace is attached to this message. Please let me know if
>> you need more information.
> The trace shows that something (presumably smartctl) sends a command
> the drive doesn't understand. The drive then violates the USB
> mass-storage protocol, sending an invalid response.
Right.
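For readers following along, here is a rough sketch of what a "valid
response" means at this point of the Bulk-Only Transport protocol: the
device must return a 13-byte Command Status Wrapper (CSW) that carries
the "USBS" signature, echoes the tag of the command, and has a status
of 0, 1 or 2. The function name and the returned strings are made up
for illustration, and this is not the kernel's code; the thread also
does not say which of these checks the drive actually failed.

```python
import struct

CSW_SIGNATURE = 0x53425355  # the bytes "USBS", read as a little-endian 32-bit value
CSW_LENGTH = 13             # signature (4) + tag (4) + data residue (4) + status (1)


def check_csw(data: bytes, expected_tag: int) -> str:
    """Hypothetical helper: return "ok" or a short reason the CSW is invalid."""
    if len(data) != CSW_LENGTH:
        return "bad length"
    signature, tag, _residue, status = struct.unpack("<IIIB", data)
    if signature != CSW_SIGNATURE:
        return "bad signature"
    if tag != expected_tag:
        return "tag mismatch"
    if status > 2:  # 0 = passed, 1 = failed, 2 = phase error
        return "bad status"
    return "ok"
```

A drive that rejects a command is still expected to send a well-formed
CSW with status 1 ("failed"); per the explanation quoted above, this
drive sends something invalid instead.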
> The kernel waits
> for a proper response but nothing more happens, so after 30 seconds the
> command times out and is aborted and the drive is reset.
I don't have the kernel sources at hand (so I can't check the code),
but is there any option to log such invalid responses when the kernel
gets one? Perhaps the verbose USB logging does that?
> The command
> then gets retried, and the same thing happens again. The retries take
> so long that the kernel complains about smartctl being blocked for more
> than 120 seconds -- that's the reason for the stack dump.
Right.
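The arithmetic behind that can be sketched as follows. The 30-second
timeout and the 120-second warning come from the explanation above;
the number of retries and the time a reset takes are not given in this
thread, so reset_s is only an assumed parameter and the function name
is made up for illustration.

```python
import math

COMMAND_TIMEOUT_S = 30     # per-command timeout mentioned in the thread
BLOCKED_WARNING_S = 120    # blocked-task warning threshold mentioned in the thread


def attempts_until_warning(timeout_s: float, warning_s: float, reset_s: float = 0.0) -> int:
    """Smallest number of timed-out attempts whose total time exceeds warning_s.

    reset_s is an assumed extra cost per attempt for resetting the drive.
    """
    per_attempt = timeout_s + reset_s
    return math.floor(warning_s / per_attempt) + 1


# With no reset cost, four attempts take exactly 120 s; the fifth goes past it.
print(attempts_until_warning(COMMAND_TIMEOUT_S, BLOCKED_WARNING_S))
# Assuming each reset costs one extra second, four attempts already exceed 120 s.
print(attempts_until_warning(COMMAND_TIMEOUT_S, BLOCKED_WARNING_S, reset_s=1.0))
```

So only a handful of failed attempts is enough to keep smartctl
blocked past the warning threshold.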
Geeez, Alan, is there any vendor out there that implements USB
according to the specs?
This is the third USB device that I have written to you about where
the kernel moans about something that it doesn't understand (I can
get you the vendor and device IDs when I get home).
I will test with some other devices that I have, just to see how
they respond. :-(
> So the problem has several causes. One is that the drive is buggy (it
> doesn't respond with an error code in the proper way when it receives a
> command it doesn't understand). Another is that smartctl is trying to
> send commands in a form the drive can't handle.
That's probably not smartctl's doing, but the user's (mine): I am
telling it to use a given command set to check whether the USB adapter
understands/allows pass-through of SMART commands to the drive.
> Finally, there's the
> problem about all the retries taking too long.
Is there anything that could be done about this?
> Perhaps you can blame the kernel for spending too much time on retries,
> but the other two are the fault of the drive and smartctl.
I understand the kernel's point of view: some devices need a little
more time on a retry, while others don't. There's no way to hardcode
a once-and-for-all behavior. It seems that an expensive solution to
this would be to create (yet) another list of blacklisted devices.
(How many lists of quirks do we have in the kernel already? This is
really causing some bloat, especially for embedded devices.) :-(
OTOH, creating blacklists seems not to be the adequate (let alone
"right") solution in many situations (see the ASUS/it87 monitoring
case). :-/
Thanks for your always kind messages,
Rogério Brito.