Re: sd_mod or usb-storage fails to read a single good block (was: ehci_hcd fails to read a single good block)

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Wed, 4 Apr 2012 11:48:31 -0400 (EDT)

On Wed, 4 Apr 2012, Norman Diamond wrote:

> Alan Stern wrote:
> > On Tue, 3 Apr 2012, Norman Diamond wrote:
> >>>> Wrong.  We CAN try to read the last block that the bridge says exists, to see if it really exists or not.  (Well, only partly wrong.  Maybe there really are some bridges that crash in that situation, but mine doesn't, and a broken driver still needs fixing.)
> >>> 
> >>> Yes, there really are such bridges.  You can find reports from people complaining about them in the email archives.
> >> 
> >> Wait.  Is there a bridge which overreports the last sector number by 1, and which ALSO crashes when the host PC tries to read that supposed last sector number?
> > 
> > Yes.  Like I said before, you can find complaints about these things in the mailing list archives.  However it's not clear whether the crashing is the fault of the bridge itself or the drive it is attached to.
> 
> Or the driver.

No, it is not the fault of the driver.

> I think it is not the fault of the drive.  If a drive crashed when a driver tried to access a nonexistent sector, then the drive would crash when mounted internally on an ATA or SATA cable not only when connected to a USB cable.
> 
> It might be the fault of the bridge, BUT, does a single bridge really suffer from BOTH faults as I asked?  You seem to be saying yes, but then you cloud it up by saying it's unclear if it might be the drive instead of the bridge.  I still wonder if it's the driver instead of the bridge, but surely not the drive.

I wanted to hedge because all the computer sees is what it gets from 
the bridge.  It has no way to tell whether a problem was caused by the 
bridge or the drive.  If you want to exonerate the drive, that's fine 
with me.  Certainly the bridge has enough other problems that we 
shouldn't be surprised to find it was solely responsible.

And I repeat, the driver did everything exactly as it should have.

> >> If so then we need a quirk for that doubly broken bridge.
> > 
> > That's what the existing quirk entries are there for.
> 
> Surely that's not what the quirk is for on my bridge.

Yes, it is.  Or rather, it probably is -- to be absolutely certain I'd 
have to look through the email archives to find the bug reports that 
caused the quirk entry to be added.

However there's no question that other bridges having the same vendor,
product, and revision ID values as yours _did_ report the drive
capacity incorrectly; otherwise the quirk entry would not be there.  
What I'm not sure of is whether those bridges went on to crash as
dramatically as yours.
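
To make the idea of a capacity quirk concrete, here is a minimal Python
sketch (not the kernel's actual code).  It assumes a table keyed by
vendor ID, product ID, and a revision range; the table contents, the IDs
in it, and the function name are all hypothetical, made up purely for
illustration.

```python
# Hypothetical quirk table: (vendor, product, min_revision, max_revision).
# The IDs below are invented and do not refer to any real bridge.
FIX_CAPACITY_QUIRKS = [
    (0x1234, 0x5678, 0x0000, 0x9999),
]


def needs_capacity_fix(vendor, product, revision):
    """Return True if the device matches an entry in the quirk table."""
    return any(
        vendor == v and product == p and lo <= revision <= hi
        for (v, p, lo, hi) in FIX_CAPACITY_QUIRKS
    )


def usable_block_count(vendor, product, revision, reported_value):
    """Turn the value a bridge returned for READ CAPACITY into a block count.

    A correct bridge returns the last block number, so the count is that
    value plus one.  A bridge matching the quirk table is assumed to have
    returned the count itself, so the value is used unchanged.
    """
    if needs_capacity_fix(vendor, product, revision):
        return reported_value
    return reported_value + 1


# A drive with 1000 blocks behind a correct bridge (reports 999) and
# behind a quirky one (reports 1000) ends up with the same block count.
print(usable_block_count(0x0001, 0x0002, 0x0100, 999))
print(usable_block_count(0x1234, 0x5678, 0x0100, 1000))
```

The point of keying on the ID values is that the adjustment is applied
without ever touching the questionable last block.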

> >> But otherwise, we can try to read the supposedly last sector number and figure out whether we have to subtract 1.
> > 
> > True enough, but we don't have any way to know which bridges do crash and which don't other than trying it.  And you'll probably agree that trial and error is not such a good idea in the cases where the bridge does crash.
> 
> This still makes me wonder why Windows is able to handle the same bridges.

Probably because Windows does not try to read the last block, unless it
is occupied by a file.  If it did, it probably would crash the bridges
too.

It would be interesting to know the details of what your Windows system
does when trying to read a file that occupies the faulty sector.  
There are programs around that are roughly equivalent to usbmon for
Windows; you could try one of them.

> Now, as far as I can tell, I have found a WTF of a standard.  I only have drafts of old T10 documents but it sure looks like a WTF unless something was fixed later.
> 
> In SCSI Block Commands, it seems to me that the READ CAPACITY (10) command (opcode 0x25) returns the number of blocks.

I'm not sure how you reached that conclusion.  In my rather old copy of
the SCSI-2 standard, section 9.2.7 (the READ CAPACITY command in the
chapter on Direct-access devices) says:

	A partial medium indicator (PMI) bit of zero indicates that the 
	returned logical block address and the block length in bytes
	are those of the last logical block on the logical unit.

And of course, the PMI bit _is_ zero for the commands we are talking
about.
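
For reference, here is a rough Python sketch of the 10-byte command
block for READ CAPACITY (10), showing where the PMI bit lives.  The
function name is my own invention; the layout is as I understand it
from the SCSI block command set (opcode 0x25 in byte 0, logical block
address in bytes 2-5, PMI in bit 0 of byte 8).

```python
def build_read_capacity_10_cdb(pmi=False, lba=0):
    """Build a READ CAPACITY (10) command descriptor block.

    With PMI = 0 the logical block address field must be zero, and the
    device reports the last logical block of the whole logical unit.
    """
    if not pmi and lba != 0:
        raise ValueError("LBA must be zero when PMI is zero")
    cdb = bytearray(10)                # all fields default to zero
    cdb[0] = 0x25                      # READ CAPACITY (10) opcode
    cdb[2:6] = lba.to_bytes(4, "big")  # logical block address, big-endian
    cdb[8] = 0x01 if pmi else 0x00     # PMI is bit 0 of byte 8
    return bytes(cdb)


# The command under discussion: PMI = 0, LBA = 0.
print(build_read_capacity_10_cdb().hex())
```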

> In Reduced Block Commands, it says explicitly that the READ CAPACITY command (opcode 0x25) returns the LBA of the last logical block of the media contained in the device.  So a USB-to-ATA bridge is required to subtract 1 from the number of blocks reported by ATA IDENTIFY, and a USB-to-SCSI bridge has to subtract 1 from the result of READ CAPACITY (10).  (Our present discussion involves the fact that a defective bridge might not subtract that 1, whereupon the next step gets spindled and mutilated.)
> 
> If a driver wants the last block number then it has to subtract 1 in the case of SCSI, and if it wants the number of blocks then it has to add 1 in the case of USB.  Did I really read this right?

No.  The READ CAPACITY command should always return the last block
number for any form of SCSI, including SCSI over USB, and to get the
number of blocks that value should be incremented.
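
As a small illustration, here is a Python sketch that decodes the 8-byte
READ CAPACITY (10) response: a big-endian 32-bit last block number
followed by a big-endian 32-bit block length in bytes.  The function
name and the sample response are made up for the example.

```python
import struct


def parse_read_capacity_10(data):
    """Decode a READ CAPACITY (10) response.

    Returns (last_lba, block_length, number_of_blocks).  The device
    reports the last block number, so the count is that value plus one.
    """
    if len(data) < 8:
        raise ValueError("READ CAPACITY (10) response must be 8 bytes")
    last_lba, block_length = struct.unpack(">II", data[:8])
    return last_lba, block_length, last_lba + 1


# Sample response: last block 0x0FFFFFFF, 512-byte blocks.
response = bytes.fromhex("0fffffff00000200")
last_lba, block_length, num_blocks = parse_read_capacity_10(response)
print(last_lba, block_length, num_blocks)
```

A bridge that puts the block count in the first field instead of the
last block number makes this calculation come out one too large, which
is the off-by-one under discussion.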

Alan Stern
