Re: sd_mod or usb-storage fails to read a single good block (was: ehci_hcd fails to read a single good block)

Matthew Dharm <mdharm-usb@xxxxxxxxxxxxxxxxxx> · Mon, 26 Mar 2012 16:45:08 -0700

Norman --

On Mon, Mar 26, 2012 at 4:30 PM, Norman Diamond <n0diamond@xxxxxxxxxxx> wrote:
> Executive summary:
> It is probably sd_mod which is too aggressive in failing reads, and which needs the same fix that libata and linux-ide received a few years ago.  Now adding the linux-scsi mailing list and quoting my original (faulty) report at the bottom of this message.
>
> Matthew Dharm corrected me:
> >> Actually, ehci_hcd has nothing to do with this.  The problem is likely in sd_mod or the scsi core.  Those are the modules that translate your userspace request for a single block into a scsi request, which is then processed by usb-storage and passed to the usb core.
>
> OK.  Since libata and linux-ide had been fixed some years ago, and I saw ehci_hcd assigned to the interface I was using yesterday, I blamed the wrong victim.  I understand it's likely to be sd_mod or usb-storage.

usb-storage simply translates the requests it gets from the scsi core
(originating in sd_mod or sr_mod or sg or wherever) into a format the
USB device can understand.  It has no readahead logic in it at
all, and thus is not at fault here.
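For illustration, this is roughly what that translation looks like for a Bulk-Only Transport device: the SCSI command block (CDB) handed down from above goes, unchanged, into a 31-byte Command Block Wrapper (CBW) sent on the bulk-out pipe. This is a minimal sketch based on the Bulk-Only Transport layout, not usb-storage's actual code. The function name build_read10_cbw is hypothetical, and in the real stack sd_mod builds the READ(10) CDB while usb-storage only copies it into the wrapper. The point is that the number of blocks requested is whatever the upper layer asked for. Nothing at this level adds readahead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CBW_LEN 31                /* Command Block Wrapper size on the wire */
#define CBW_SIGNATURE 0x43425355u /* "USBC", stored little-endian */
#define CBW_FLAG_DATA_IN 0x80     /* data phase is device-to-host */

/* Store a 32-bit value little-endian, as the CBW header fields require. */
static void put_le32(uint8_t *p, uint32_t v) {
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

/* Hypothetical helper: build a SCSI READ(10) CDB for `count` blocks at
   `lba` and wrap it in a CBW. In Linux, sd_mod would build the CDB and
   usb-storage would only copy it into the CBW. */
static void build_read10_cbw(uint8_t cbw[CBW_LEN], uint32_t tag, uint8_t lun,
                             uint32_t lba, uint16_t count, uint32_t block_size) {
    assert(lun <= 15); /* bCBWLUN is a 4-bit field */
    memset(cbw, 0, CBW_LEN);
    put_le32(cbw + 0, CBW_SIGNATURE);                /* dCBWSignature */
    put_le32(cbw + 4, tag);                          /* dCBWTag */
    put_le32(cbw + 8, (uint32_t)count * block_size); /* dCBWDataTransferLength */
    cbw[12] = CBW_FLAG_DATA_IN;                      /* bmCBWFlags */
    cbw[13] = lun;                                   /* bCBWLUN */
    cbw[14] = 10;                                    /* bCBWCBLength: 10-byte CDB */

    uint8_t *cdb = cbw + 15;       /* CBWCB: the SCSI command itself */
    cdb[0] = 0x28;                 /* READ(10) opcode */
    cdb[2] = (lba >> 24) & 0xff;   /* LBA, big-endian */
    cdb[3] = (lba >> 16) & 0xff;
    cdb[4] = (lba >> 8) & 0xff;
    cdb[5] = lba & 0xff;
    cdb[7] = (count >> 8) & 0xff;  /* transfer length in blocks, big-endian */
    cdb[8] = count & 0xff;
}
```

For example, build_read10_cbw(cbw, 1, 0, 123456, 8, 512) produces a wrapper asking for 8 blocks (4096 bytes) starting at LBA 123456. If sd_mod asked for 8 blocks, the device is asked for 8 blocks.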

>
>> So, the problem is that sd_mod is turning your request for a single block into a request for several blocks.
>
> That's part of the problem.  Readahead is not a bad thing to do.  The problem is that sd_mod or whoever is too aggressive.  Instead of marking buffers for nearby blocks as not having valid data available, it further refuses to supply valid data for the good block and errors out a call that should have succeeded.  libata and linux-ide used to have the same defect before they were fixed.

Most likely, it is the usb-to-ide bridge that can't handle this case.
The drive is probably returning a few blocks of data followed by an
error status, and the bridge can't pass that combination back to the
host correctly.  This is a pretty common shortcoming of usb-to-ide
bridges.
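What Norman is asking for (good blocks still returned when a neighbouring block in the same readahead request is bad) can be sketched on the host side as a fallback: if a multi-block read fails, retry it one block at a time and record which blocks succeeded. This is a hypothetical illustration, not how sd_mod is actually implemented; read_with_fallback, read_fn, struct fake_disk and fake_read are made-up names, and the simulated disk simply fails any request that touches its one bad block. It also only helps if the bridge passes the drive's error back to the host instead of wedging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical read callback: read `count` blocks starting at `lba` into
   `buf`. Returns true on success, false if the request failed. */
typedef bool (*read_fn)(void *ctx, uint32_t lba, uint32_t count, uint8_t *buf);

/* Try the whole multi-block read first. If it fails, retry each block on
   its own so that good blocks are still delivered. ok[i] reports whether
   block lba+i was read. Returns the number of good blocks. */
static size_t read_with_fallback(read_fn rd, void *ctx, uint32_t lba,
                                 uint32_t count, size_t block_size,
                                 uint8_t *buf, bool *ok) {
    if (rd(ctx, lba, count, buf)) {
        for (uint32_t i = 0; i < count; i++)
            ok[i] = true;
        return count;
    }
    size_t good = 0;
    for (uint32_t i = 0; i < count; i++) {
        ok[i] = rd(ctx, lba + i, 1, buf + (size_t)i * block_size);
        if (ok[i])
            good++;
    }
    return good;
}

/* Simulated disk for demonstration: any request covering bad_lba fails
   as a whole; every other block reads back as 0xAB bytes. */
struct fake_disk {
    uint32_t bad_lba;
    size_t block_size;
};

static bool fake_read(void *ctx, uint32_t lba, uint32_t count, uint8_t *buf) {
    const struct fake_disk *d = ctx;
    if (d->bad_lba >= lba && d->bad_lba < lba + count)
        return false;
    memset(buf, 0xAB, (size_t)count * d->block_size);
    return true;
}
```

With a bad block at LBA 103, an 8-block read starting at LBA 100 fails as a whole, but the per-block retry still delivers the other 7 blocks. The cost is one extra command per block after a failure, which is why a real implementation would likely only fall back on error.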

>> As for needing unplug and replug, likely the firmware in your device is crashing when it encounters a bad block. So there is nothing which can be done to recover aside from resetting the device with an unplug/replug cycle.
>
> The disk's firmware correctly reports a read error when reading the bad block and correctly proceeds to obey later commands to read good blocks if so ordered.  This is the same drive that I mounted on a motherboard's IDE connector a few years ago when testing linux-ide and libata.  However, if you blame the usb-to-ide bridge's firmware, I'll try to find a way to test it.

When Alan and I refer to the 'firmware of the device', we both mean
the usb-to-ide bridge chip (in this case).  If your device happened to
be a usb flash drive, there would be no separate "disk" firmware to
confuse the issue.

> Alan Stern also corrected me:
>> This has nothing at all to do with ehci-hcd.  You can prove this (assuming your computer has a UHCI or OHCI controller) by unloading ehci-hcd and running the test again.
>
> I understand that ehci-hcd can be unloaded and reloaded (well, sometimes I can't rmmod some other drivers, but I understand how to try).  I don't see how that would prove anything though.

Your original report blamed ehci-hcd, but that's just another "glue"
layer.  To establish that it is not at fault, you can use a different
(lower-speed) glue layer, in the form of uhci-hcd or ohci-hcd.  With
ehci-hcd unloaded, the device comes up at full speed on the companion
controller instead; if the same failure still happens, ehci-hcd can't
be the cause.  An EHCI controller almost always has a companion UHCI
or OHCI controller, though on newer chipsets this isn't necessarily
true (Intel now ships chipsets with a rate-matching TT-capable EHCI hub
in them, and thus no "companion" controller).

>> ehci_hcd does not try to reassign anything.  Rather, it is usb-storage which resets the non-working device.
>
> OK.  I have a feeling that usb-storage is overly aggressive in resetting a device and trying to assign a new address.  The drive does not need resetting; I mentioned above that it correctly reports a bad block and correctly continues operating.  Though if the USB-to-IDE bridge is to blame, I'll try to find a way to test it.

Again, we are referring to the bridge firmware.  usb-storage will only
attempt to reset the device if the device has failed to respond
properly in a "reasonable" amount of time (as defined in the original
request from the SCSI layer).  That is often 30 seconds or so.
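For sd devices, the per-command timeout involved here is exposed in sysfs as /sys/block/<dev>/device/timeout, in seconds (30 by default). Below is a minimal sketch, assuming a simplified model in which a reset is considered only once a command has been outstanding longer than that timeout. The helper names read_timeout_seconds and should_reset are hypothetical. In the real stack, the SCSI midlayer tracks the timeout and its error handler then asks usb-storage to abort and reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Read a command timeout in seconds from a sysfs-style file such as
   /sys/block/sdb/device/timeout ("sdb" is only an example). Returns
   `fallback` if the file can't be opened or doesn't hold a positive
   integer. */
static int read_timeout_seconds(const char *path, int fallback) {
    FILE *f = fopen(path, "r");
    if (!f)
        return fallback;
    int secs = 0;
    int n = fscanf(f, "%d", &secs);
    fclose(f);
    return (n == 1 && secs > 0) ? secs : fallback;
}

/* Simplified model: a device is only a reset candidate once a command
   has been outstanding for longer than its timeout. A slow but working
   device never crosses this line. */
static bool should_reset(double elapsed_seconds, int timeout_seconds) {
    return elapsed_seconds > (double)timeout_seconds;
}
```

So a drive that reports a read error promptly never comes close to a reset. Resets happen when the bridge goes silent and the command sits unanswered past the timeout.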

>> If the device were working properly, unplugging and replugging it wouldn't be necessary.  The failure is entirely the device's fault.
>
> I do not believe that.  The drive's report of failure to read a bad block is correct operation by the drive.  Mishandling of a correct error report is the fault of the driver that mishandles the report.

The drive's report of failure is likely getting totally lost by the
usb-to-ide bridge.  Again, this is pretty common among usb-to-ide
bridges.

Matt

-- 
Matthew Dharm
Maintainer, USB Mass Storage driver for Linux
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html