Re: sd_mod or usb-storage fails to read a single good block (was: ehci_hcd fails to read a single good block)

Norman Diamond <n0diamond@xxxxxxxxxxx> · Wed, 28 Mar 2012 13:14:45 +0900 (JST)

I wrote:
> James Bottomley corrected one of the experts who corrected me yesterday:
>>>> So, the problem is that sd_mod is turning your request for a single block into a request for several blocks.
>> 
>> No, it won't be this.  Everything below block does exactly what block says.  If readahead is the problem, then you need to turn it off in block:
>> echo 0 > /sys/block/<dev>/queue/read_ahead_kb
> 
> Thank you.  But...

But it didn't help.  Today I installed the disk internally in an old notebook, and libata made it /dev/sda.  I ran
echo 0 >/sys/block/sda/queue/read_ahead_kb
but sg_dd still insisted on reading an entire page, so it refused to read a good block that was too close to the bad block.
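The page arithmetic behind this is easy to check.  Below is a minimal sketch that assumes 512-byte sectors and a 4 KiB page.  The helper names are hypothetical.  It computes which sectors a page-sized read covering a requested sector will touch, using the block numbers from the quoted dd test.

```python
# Hypothetical helpers: which 512-byte sectors share a page with a given sector.
SECTOR_SIZE = 512   # bytes per logical block, as in "bs=512"
PAGE_SIZE = 4096    # assumed 4 KiB page, typical on x86


def page_sector_range(sector, sector_size=SECTOR_SIZE, page_size=PAGE_SIZE):
    """Return (first, last) sector numbers of the page containing `sector`."""
    per_page = page_size // sector_size          # 8 sectors per page here
    first = (sector // per_page) * per_page      # round down to a page boundary
    return first, first + per_page - 1


def page_hits_bad_sector(sector, bad_sectors, **kw):
    """True if a page-sized read covering `sector` also touches a bad sector."""
    first, last = page_sector_range(sector, **kw)
    return any(first <= b <= last for b in bad_sectors)


if __name__ == "__main__":
    print(page_sector_range(551563))               # (551560, 551567)
    print(page_hits_bad_sector(551563, [551562]))  # True
    print(page_hits_bad_sector(551568, [551562]))  # False
```

So with whole-page reads, any request for sectors 551560 through 551567 also drags in bad block 551562.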

> James Bottomley corrected me too.
>>> I originally wrote (blaming the wrong component):
>>>>> dd if=/dev/sdb of=/dev/zero bs=512 count=1 skip=551563
>>>>> should succeed because block 551563 has no problem.  But it fails because ehci_hcd insists on reading blocks 551560 through 551567, and block 551562 does have a problem.  (Of course I should have been outputting to /dev/null instead of /dev/zero but that should not matter.)
>>
>> this is not fixable using dd which goes through the page cache (and thus had a minimum read of a page at a time). If you want exact 512 byte sector reads, use sg_dd instead.
> 
> Then writing a 0 to /sys/block/sdb/queue/read_ahead_kb is useless in this particular case, right?  But it might be useful in further testing, because of more oddities.

It was even more useless than expected because sg_dd failed the same way as dd.

It is really, really bad that sg_dd could not overcome this.  Some years ago linux-ide and libata were fixed: when a readahead failed, they recognized that data were unavailable only for the nearby bad block, and the good block could still be read.  I don't know when that fix was lost.  This is a showstopper for forensics.  I don't do forensics myself, but I have to be aware of it, and the inability to read a good block just because a bad block is next door is not acceptable.

It is really, really bad when Windows XP can read the data but Linux can't.  (Details below.)

> The USB-to-IDE bridge is vendor 067b, product 2507, Prolific Technology Inc., and usb-storage matches it for quirk 110.

It does appear to be the bridge's fault that the USB cable needed unplugging and replugging.  When the drive was installed internally and connected to an old Intel ATA controller, nothing needed to be unplugged and replugged.

> Digression:  Someone adjusts the number of blocks from the reported 39070080 (which I think is correct) to 39070079 (which I think is wrong).  I'll have to install the drive in an old notebook to check if the reported number of blocks is really correct.

Further on this digression:  The number of blocks really is 39070080.  Today libata made no adjustment and reported the correct number of blocks.  So some component in the USB path wrongly adjusts the block count down by one.
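A small check makes this comparison repeatable across both attachment paths.  In the minimal sketch below, the helper names are hypothetical.  It reads `/sys/block/<dev>/size`, which the kernel reports in 512-byte units regardless of the device's logical block size, and compares it with the expected count.

```python
from pathlib import Path

SYSFS_SECTOR = 512  # /sys/block/<dev>/size is always in 512-byte units


def sysfs_sectors(dev, root="/sys/block"):
    """Return the capacity of `dev` (e.g. "sda") in 512-byte sectors."""
    return int(Path(root, dev, "size").read_text().strip())


def capacity_report(dev, expected_sectors, root="/sys/block"):
    """Compare the kernel's reported size with an expected sector count."""
    got = sysfs_sectors(dev, root)
    return {
        "reported": got,
        "expected": expected_sectors,
        "difference": got - expected_sectors,  # -1 means one block was lost
        "bytes": got * SYSFS_SECTOR,
    }
```

Calling `capacity_report("sdb", 39070080)` on the USB-attached drive should show a difference of -1 if the block count is still adjusted to 39070079.  The full size is 39070080 × 512 = 20,003,880,960 bytes.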

(Quoting details from yesterday, because of their importance to forensics as mentioned above.)
>
> Today I created a FAT12 partition in approximately 8 megabytes surrounding known bad block number 551562.
> 
> Windows XP could write a bunch of files to fill the partition (0 blocks free).  Windows XP could copy all the files from the partition to a directory on an internal SATA drive.  I guess there is some amount of luck that the bad block is not in use in any of the files, despite there being 0 blocks free.
> 
> Linux (cp -pr) could not copy all the files from the partition to a directory on an internal SATA drive.  Even when it should only copy blocks near the bad block, it stubbornly tried to readahead the bad block and it bombed out.  I had to unplug and replug the USB cable to read the drive again.
> 
> Back to Windows XP.  When I told it to run CHKDSK and try to verify every block, then it bombed out.  I had to unplug and replug the USB cable.  Again, just copying the files, they all copied.
> 
> So the need to unplug and replug seems to be the bridge's fault.
> 
> I think readahead is not a bad thing to do, and I can see why it is too late for Linux to recover after getting the error from trying to read ahead.  But I'm sure I've seen Windows XP do readahead too.  How come Windows XP survived where Linux didn't? 
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html