Re: [Bugme-new] [Bug 14020] New: Stack trace when running smartctl on an USB disk

Rogério Brito <rbrito@xxxxxxxxxx> · Sun, 23 Aug 2009 12:11:50 -0300

Hi again, Alan.

(Sorry if this message seems messed up, but I am not using my regular  
mailer right now, unfortunately).

On 2009-08-22, at 21:17, Alan Stern wrote:

On Sat, 22 Aug 2009, Rogério Brito wrote:

The requested trace is attached to this message. Please let me know if
you need more information.

The trace shows that something (presumably smartctl) sends a command
the drive doesn't understand.  The drive then violates the USB
mass-storage protocol, sending an invalid response.

Right.
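
For reference: the thread does not say what the invalid response looked
like. Assuming the drive uses the mass-storage Bulk-Only Transport, each
command ends with a 13-byte Command Status Wrapper (CSW), and a drive
that rejects a command is expected to return a well-formed CSW with
status 1 (command failed) rather than something malformed. A minimal
Python sketch of the validity checks a host could apply follows;
check_csw and its return strings are hypothetical names used for
illustration, not kernel code.

```python
import struct

CSW_SIGNATURE = 0x53425355  # the bytes "USBS" read as a little-endian u32
CSW_LENGTH = 13             # signature, tag, data residue (4 bytes each) + status


def check_csw(raw: bytes, expected_tag: int) -> str:
    """Return 'ok', or a short reason why the status block is invalid.

    Hypothetical helper: it only mirrors the validity rules of the
    Bulk-Only Transport, not what the kernel driver actually does.
    """
    if len(raw) != CSW_LENGTH:
        return "wrong length"
    signature, tag, _residue, status = struct.unpack("<IIIB", raw)
    if signature != CSW_SIGNATURE:
        return "bad signature"
    if tag != expected_tag:
        return "tag mismatch"  # must echo the tag of the command wrapper
    if status > 2:
        return "reserved status value"  # 0 = passed, 1 = failed, 2 = phase error
    return "ok"


# A drive that does not understand a command should answer like this:
failed = struct.pack("<IIIB", CSW_SIGNATURE, 0x2A, 0, 1)
print(check_csw(failed, 0x2A))  # prints "ok": a valid CSW reporting failure
```

A CSW that passes these checks with status 1 lets the host fetch the
error details and move on; anything else leaves the host waiting.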

The kernel waits
for a proper response but nothing more happens, so after 30 seconds the
command times out and is aborted and the drive is reset.

I don't have the kernel sources at hand (so I can't check the code),
but is there an option to log such invalid responses when the kernel
receives one? Perhaps the verbose USB logging does that?

The command
then gets retried, and the same thing happens again.  The retries take
so long that the kernel complains about smartctl being blocked for more
than 120 seconds -- that's the reason for the stack dump.

Right.
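
The arithmetic behind this can be sketched as follows. The 30-second
timeout and the 120-second threshold come from the message above; the
number of retries and the duration of each reset are not stated in the
thread, so reset_s is an assumed parameter and attempts_until_warning is
a hypothetical helper name.

```python
COMMAND_TIMEOUT_S = 30     # per-attempt timeout quoted in the thread
BLOCKED_WARNING_S = 120    # blocked-task threshold quoted in the thread


def attempts_until_warning(timeout_s: float, warning_s: float,
                           reset_s: float = 0.0) -> int:
    """Count timed-out attempts until the total wait exceeds warning_s.

    reset_s is an assumption: extra time spent resetting the drive
    after each timeout. The thread does not give a figure for it.
    """
    if timeout_s + reset_s <= 0:
        raise ValueError("each attempt must take a positive amount of time")
    attempts = 0
    elapsed = 0.0
    while elapsed <= warning_s:
        attempts += 1
        elapsed += timeout_s + reset_s
    return attempts


print(attempts_until_warning(COMMAND_TIMEOUT_S, BLOCKED_WARNING_S))       # 5
print(attempts_until_warning(COMMAND_TIMEOUT_S, BLOCKED_WARNING_S, 1.0))  # 4
```

With instantaneous resets, the fifth timed-out attempt pushes the wait
past 120 seconds; if each reset costs any time at all, the fourth
attempt is already enough.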

Geeez, Alan, is there any vendor out there that implements USB
according to the specs?

This is the third USB device I have written to you about where the
kernel moans about something that it doesn't understand (I can get you
the vendor and device IDs when I get home).

I will test with some other devices that I have, just to see what  
their response is. :-(

So the problem has several causes.  One is that the drive is buggy (it
doesn't respond with an error code in the proper way when it receives a
command it doesn't understand).  Another is that smartctl is trying to
send commands in a form the drive can't handle.

That's probably not smartctl's doing, but mine: I (the user) am telling
it to use a given command set to check whether the USB adapter
understands/allows pass-through of the SMART protocol to the drive.

Finally, there's the
problem about all the retries taking too long.

Is there anything that could be done about this?

Perhaps you can blame the kernel for spending too much time on retries,
but the other two are the fault of the drive and smartctl.

I understand the kernel's point of view: some devices need a little
more time on a retry, while others don't, so there's no way to hardcode
a once-and-for-all behavior. An expensive solution would be to create
(yet) another list of blacklisted devices. (How many lists of quirks do
we have in the kernel already? This is really causing some bloat,
especially for some embedded devices.) :-(

OTOH, in many situations creating blacklists does not seem to be an
adequate (let alone "right") solution (see the ASUS/it87 monitoring
case). :-/

Thanks for your always kind messages, Rogério Brito.

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html