Re: sd_mod or usb-storage fails to read a single good block (was: ehci_hcd fails to read a single good block)

Norman Diamond <n0diamond@xxxxxxxxxxx> · Tue, 3 Apr 2012 16:54:34 +0900 (JST)

Alan Stern wrote:
> On Mon, 2 Apr 2012, Norman Diamond wrote:
>> My USB-to-IDE bridge DOES NOT CRASH when reading a bad block.  When a Linux driver tries to reassign a new USB address and crashes the bridge, the fault is a Linux driver.  (And when Windows forgets about the existence of the drive, the fault is Windows.)
> 
> You really should capture a usbmon trace while running the dd test. That will show exactly what is happening.  See Documentation/usb/usbmon.txt.

Documentation/usb/usbmon.txt says:
"Mount debugfs (it has to be enabled in your kernel configuration), and"

I recommend:
"Mount debugfs (CONFIG_DEBUG_FS has to be enabled in your kernel configuration), and"

The help for CONFIG_DEBUG_FS says (or said; this might be outdated):
"debugfs is a virtual file system that kernel developers use to put debugging files into. Enable this option to be able to read and write to these files.
If unsure, say N."

I recommend:
"debugfs is a virtual file system that kernel developers use to put debugging files into. Enable this option to get reports that you can send to kernel developers if you ever have to report a bug.
If unsure, say Y."

Do you know a live Linux system whose kernel already has it built in?
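
For what it's worth, here is a rough way to check a running system.  This is only a sketch in Python.  It assumes that a kernel built with CONFIG_DEBUG_FS lists debugfs in /proc/filesystems, and the function name has_debugfs is made up for this example.

```python
def has_debugfs(proc_filesystems_text: str) -> bool:
    """Return True if 'debugfs' is listed as a registered filesystem.

    Each line of /proc/filesystems is either "nodev<TAB>name" or "<TAB>name",
    so the filesystem name is the last whitespace-separated token.
    """
    for line in proc_filesystems_text.splitlines():
        tokens = line.split()
        if tokens and tokens[-1] == "debugfs":
            return True
    return False


# Sample text standing in for a real /proc/filesystems (assumed content).
sample = "nodev\tsysfs\nnodev\tproc\nnodev\tdebugfs\n\text4\n"
print(has_debugfs(sample))

# On a live system one would pass the real file instead:
#   has_debugfs(open("/proc/filesystems").read())
```

If the check fails, mounting debugfs as usbmon.txt describes will not work on that kernel.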

>>>> There's still something wrong here.  When the bridge is connected to Windows XP, Windows accesses the correct number of blocks.  We need to find someone who has a non-working but indistinguishable bridge, ask them to connect it to Windows XP, and see if Windows XP gets a number of blocks that is too large by 1.
>>> 
>>> How would you tell?
>> 
>> http://hdparm-win32.dyndns.org/hdparm/
>> 
>> D:\hdparm for Windows\binary\hdparm-6.9-20070516.win32\bin>hdparm /dev/sdg
>> 
>> /dev/sdg:
>>  geometry     = 2432/255/63, sectors = 39070080, start = 0
>> 
>> Windows did not get a number of blocks that was too large by 1.  The bridge provided the correct answer for READ CAPACITY, Windows accepted it, and Linux improperly subtracted 1.
> 
> How do you know?  That is, how do you know that the number printed by hdparm above is the value used by Windows and not a value obtained directly from the bridge by the hdparm program itself?

Linux said that it was adjusting 39070080 to 39070079.  So surely the bridge reported the maximum block number as 39070079 (for 39070080 blocks).  And I think we can be pretty sure, practically speaking, that this bridge doesn't detect whether the PC host is running Windows or Linux, so the bridge surely reported the same maximum block number, 39070079 (for 39070080 blocks), to Windows.
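
To spell out the off-by-one arithmetic: a READ CAPACITY(10) reply is 8 bytes, a big-endian 32-bit last block address followed by a big-endian 32-bit block length, and the block count is the last address plus 1.  The sketch below (Python) is illustration only.  The bridge's actual reply was not captured in this thread, so the bytes are rebuilt from the numbers above, the 512-byte block length is an assumption, and decode_read_capacity10 is a made-up name.

```python
import struct


def decode_read_capacity10(data: bytes):
    """Decode an 8-byte READ CAPACITY(10) reply.

    Returns (last_lba, block_count, block_length), where block_count is
    last_lba + 1 because block addresses start at 0.
    """
    last_lba, block_length = struct.unpack(">II", data[:8])
    return last_lba, last_lba + 1, block_length


# Hypothetical reply from a correct bridge: last LBA 39070079, and an
# ASSUMED block length of 512 bytes.
reply = struct.pack(">II", 39070079, 512)
last_lba, block_count, block_length = decode_read_capacity10(reply)
print(last_lba, block_count, block_length)  # 39070079 39070080 512
```

A host that then subtracts 1 from the block count of such a bridge ends up with 39070079 blocks and can no longer reach block 39070079.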

I'm pretty sure that the Windows version of hdparm uses IOCTL_DISK_GET_DRIVE_GEOMETRY, IOCTL_DISK_GET_LENGTH_INFO, and SMART_RCV_DRIVE_DATA to get device information.  IOCTL_IDE_PASS_THROUGH and IOCTL_ATA_PASS_THROUGH surely fail (hdparm -I and hdparm -i fail).

> Not that I have any reason to doubt this result -- I have no way of knowing whether or not Windows subtracts 1 from the value reported by any USB-(S)ATA bridge.

We should try Windows with a known actually broken bridge  ^_^

> By the way, just out of curiosity, why does it matter so much for your program to know the exact number of blocks a drive contains?

The number of blocks is a multiple of 63 * 255, which was convenient for Windows XP and earlier, and convenient for BIOSes, and convenient for Linux users who needed to interact with Windows or BIOSes.
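
The geometry that hdparm printed above (2432/255/63) can be checked against the sector count with a few lines of Python.  This is just a worked example of the arithmetic, and the function name is made up.

```python
HEADS = 255              # heads per cylinder in the conventional geometry
SECTORS_PER_TRACK = 63   # sectors per track in the conventional geometry


def whole_cylinders(total_sectors: int):
    """Return (cylinders, leftover_sectors) for a 255-head, 63-sector geometry."""
    sectors_per_cylinder = HEADS * SECTORS_PER_TRACK  # 16065
    return divmod(total_sectors, sectors_per_cylinder)


print(whole_cylinders(39070080))  # (2432, 0): an exact number of cylinders
print(whole_cylinders(39070079))  # (2431, 16064): last cylinder is one sector short
```

With the full count the drive is exactly 2432 cylinders.  After subtracting 1, the last cylinder is incomplete, so a partition that ends on the final cylinder boundary no longer fits.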

Here is not my reason, but a fact that deserves to be a reason:
When the drive is mounted internally, Linux can create a partition ending at the end of the entire drive.  When the drive is attached to a USB cable, Linux should be expected to be able to access the existing partition.

Here is another fact that deserves to be a reason:
Windows XP can create a partition ending at the end of the entire drive, even if the drive is attached to a USB cable.  Linux should be expected to be able to access it.

Here is a hint at my reason:
Earlier in this thread I mentioned that I don't get to do forensics myself, but I have to be aware of it.

>>>>> There is no good solution to this problem.
>>>> 
>>>> Agreed.
>> 
>> Wrong.  We CAN try to read the last block that the bridge says exists, to see if it really exists or not.  (Well, only partly wrong.  Maybe there really are some bridges that crash in that situation, but mine doesn't, and a broken driver still needs fixing.)
> 
> Yes, there really are such bridges.  You can find reports from people complaining about them in the email archives.

Wait.  Is there a bridge which overreports the last sector number by 1, and which ALSO crashes when the host PC tries to read that supposed last sector number?  If so then we need a quirk for that doubly broken bridge.  But otherwise, we can try to read the supposedly last sector number and figure out whether we have to subtract 1.
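
Roughly what I have in mind, as a Python sketch rather than driver code.  Everything here is hypothetical: read_block stands in for whatever actually issues the READ, the two simulated bridges are made up, and the sketch deliberately ignores the doubly broken case where the probe itself crashes the bridge.

```python
from typing import Callable


def usable_block_count(reported_last_lba: int,
                       read_block: Callable[[int], bool]) -> int:
    """Decide the block count by probing the reported last block.

    read_block(lba) must return True if the read succeeds, False otherwise.
    If the reported last block is readable, the bridge told the truth and
    the count is reported_last_lba + 1.  If not, assume the bridge reported
    the block count instead of the last address, and subtract 1.
    """
    if read_block(reported_last_lba):
        return reported_last_lba + 1
    return reported_last_lba


def simulated_drive(real_blocks: int) -> Callable[[int], bool]:
    """Fake device: reads succeed only for block addresses that exist."""
    return lambda lba: 0 <= lba < real_blocks


drive = simulated_drive(39070080)

# Correct bridge: reports the last block address.
print(usable_block_count(39070079, drive))  # 39070080

# Broken bridge: reports the block count where the last address belongs.
print(usable_block_count(39070080, drive))  # 39070080
```

Both cases end up with the same, correct count, and only a bridge that cannot survive the probe would still need a quirk.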