Re: usb hdd problems with 2.6.27.2

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Mon, 27 Oct 2008 11:25:19 -0400 (EDT)

On Mon, 27 Oct 2008, Douglas Gilbert wrote:

> > This looks exactly like the "infinite retry" problem I warned about 
> > earlier.  Here are the important parts of the log.  For people who 
> > don't know how to interpret these messages, the CDB starts in the 16th 
> > byte of the 31-byte messages.  For example, the first command here 
> > starts with 0x25 and so it is READ CAPACITY:
> > 
> >> f21e7cc0 3570408174 S Bo:1:008:1 -115 31 = 55534243 06000000 08000000 80000a25 00000000 00000000 00000000 000000
> >> f21e7cc0 3570408264 C Bo:1:008:1 0 31 >
> >> f21e72c0 3570408280 S Bi:1:008:2 -115 8 <
> >> f21e72c0 3570408389 C Bi:1:008:2 0 8 = 2e9390b0 00000200
> >> f21e7cc0 3570408400 S Bi:1:008:2 -115 13 <
> >> f21e7cc0 3570408513 C Bi:1:008:2 0 13 = 55534253 06000000 00000000 00
> > 
> > The response is 0x2e9390b0.  In typical broken fashion, that is 
> > undoubtedly the total number of sectors rather than the highest sector 
> > number.
> 
> Since the READ CAPACITY "off by one" error is so common,
> perhaps drivers such as usb-storage could have a hook to
> do a pseudo READ CAPACITY. Then if the capacity value
> looked odd (in both senses) the driver could do an IO to
> the suspect block and if that failed decrement the capacity
> value passed back to the mid level.

We thought of that years ago.  Unfortunately there is no reliable way
of telling when a capacity value is wrong.  There definitely do exist
disks with an odd number of sectors.
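
To make the ambiguity concrete, here is a minimal Python sketch (an
illustration only, not kernel code) that decodes the CBW and the 8-byte
reply from the trace quoted above.  The helper names parse_cbw and
parse_read_capacity10 are made up for this example.

```python
import struct

# CBW payload copied from the usbmon trace quoted above (31 bytes).
CBW_HEX = ("55534243 06000000 08000000 80000a25 "
           "00000000 00000000 00000000 000000")
# Data-in payload the device returned for that command (8 bytes).
DATA_HEX = "2e9390b0 00000200"


def parse_cbw(raw):
    """Split a Bulk-Only Transport CBW into tag, transfer length and CDB."""
    # Signature, tag and transfer length are little-endian; they are
    # followed by one byte each of flags, LUN and CDB length.
    sig, tag, xfer_len, flags, lun, cdb_len = struct.unpack_from("<4sIIBBB", raw, 0)
    if sig != b"USBC":
        raise ValueError("not a CBW")
    cdb = raw[15:15 + cdb_len]  # the CDB begins at offset 15, i.e. the 16th byte
    return tag, xfer_len, cdb


def parse_read_capacity10(data):
    """READ CAPACITY(10) returns two big-endian 32-bit fields."""
    last_lba, block_size = struct.unpack(">II", data)
    return last_lba, block_size


tag, xfer_len, cdb = parse_cbw(bytes.fromhex(CBW_HEX))
last_lba, block_size = parse_read_capacity10(bytes.fromhex(DATA_HEX))

# Two readings of the same reply; nothing in the data says which is right.
sectors_if_compliant = last_lba + 1  # the value really is the highest LBA
sectors_if_broken = last_lba         # the value is actually the sector count

print(f"opcode=0x{cdb[0]:02x} reported=0x{last_lba:08x} block_size={block_size}")
print(f"compliant reading: {sectors_if_compliant} sectors")
print(f"broken reading:    {sectors_if_broken} sectors")
```

The reported value here is even, so a compliant device would have an
odd sector count and a broken one an even count.  Both kinds of disk
exist, so parity alone settles nothing.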

Furthermore, doing I/O to a suspect block is not a good idea.  There
are plenty of devices which simply crash when you try to access a
nonexistent sector.

> Put another way, why don't these defective devices trip up
> another OS?

I imagine they do.  However, Linux has partition code that stores
information in the last sector of a partition (EFI GUID and md, for
example).  Other OSes apparently do not try to access the medium's last
sector under most circumstances.

> BTW a single disk in RAID 0 (seen on a HP E200 controller)
> has a shortened capacity value seen in the midlevel on the
> corresponding logical drive. That missing chunk is probably
> where the RAID controller puts its control information.
> Anyway, playing with the capacity value returned by READ
> CAPACITY certainly has a precedent.

usb-storage isn't in the business of altering the data it gets from a 
device.  It's just a transport.  That's why the sdev->fix_capacity flag 
exists; we tell the upper layer that the data it gets is going to be 
wrong and let the upper layer worry about fixing things up.
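
As a rough model of that division of labor, here is a small Python
sketch.  It is not the kernel's code; the Device class and the
sector_count function are hypothetical, and only the fix_capacity name
comes from the real flag.

```python
from dataclasses import dataclass


@dataclass
class Device:
    # Set by the transport for devices known to misreport their capacity.
    # The transport itself passes the device's data through unchanged.
    fix_capacity: bool = False


def sector_count(reported_last_lba, dev):
    """Upper-layer view: turn the READ CAPACITY value into a sector count."""
    # A compliant device reports the highest LBA, so the count is one more.
    count = reported_last_lba + 1
    if dev.fix_capacity:
        # The device is known to report the count instead of the highest
        # LBA, so undo the extra sector.
        count -= 1
    return count


reported = 0x2e9390b0  # value from the trace quoted earlier
print(sector_count(reported, Device(fix_capacity=False)))
print(sector_count(reported, Device(fix_capacity=True)))
```

The point of the design is that the correction lives in one place, in
the layer that interprets the capacity, and it is applied only to
devices already known to be broken.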

> > Later on the system tries to read the contents of what it thinks is the 
> > last sector:
> 
> I know that happens but it seems strange that upper levels
> are reading a block that has never been written to. Read ahead?

No, partition scanning.  Also maybe /lib/udev/vol_id, which seems to 
read an inordinate number of irrelevant sectors.

Alan Stern
