akpm@xxxxxxxx wrote:
From: Mark Lord <lkml@xxxxxx>

When scsi_get_sense_info_fld() fails (returns 0), it does NOT update the
value of first_err_block. But sd_rw_intr() merrily continues to use that
variable regardless, possibly making incorrect decisions about retries and
the like.

This patch removes the randomness there, by using the first sector of the
request (SCpnt->request->sector) in such cases, instead of first_err_block.

Signed-off-by: Mark Lord <lkml@xxxxxx>
Cc: James Bottomley <James.Bottomley@xxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
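For illustration only, the fallback described in the patch can be sketched in user-space C as below. This is not the kernel code: `struct sense_result`, `get_sense_info()` and `first_bad_sector()` are hypothetical stand-ins for the sense data, scsi_get_sense_info_fld() and the relevant part of sd_rw_intr(). The only behaviour assumed is the one stated above, namely that the lookup returns 0 on failure and leaves its output untouched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for decoded sense data. */
struct sense_result {
    bool     has_info;  /* true if a valid information field is present */
    uint64_t info;      /* block number reported by the device */
};

/*
 * Hypothetical stand-in for scsi_get_sense_info_fld(): returns false
 * on failure and, as in the case described above, does NOT touch *out.
 */
static bool get_sense_info(const struct sense_result *s, uint64_t *out)
{
    if (!s->has_info)
        return false;
    *out = s->info;
    return true;
}

/*
 * Pick the block to treat as the first failing one. If the device
 * gave us nothing usable, fall back to the first sector of the
 * request rather than using an uninitialised value.
 */
static uint64_t first_bad_sector(const struct sense_result *s,
                                 uint64_t request_start)
{
    uint64_t first_err_block;

    if (!get_sense_info(s, &first_err_block))
        first_err_block = request_start;

    return first_err_block;
}
```

With a request starting at sector 100, a sense result carrying no information field yields 100, while one reporting block 150 yields 150.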
By the way, I have a more complete fix for the root issue here now, for an
older SUSE-9 kernel (prepared for a client).

The problem with the current implementation is that when a libata drive
(possibly also a USB2 or Firewire drive) hits a bad sector at block 50 of a
100-block request, SCSI will fail the first 50 blocks of that request, in a
painfully slow fashion (it can literally take *hours* to complete).

Correct behaviour is to fail just the actual bad block.

One method for doing this is to walk over the failed request, issuing each
block one at a time to the LLD, and passing/failing them one at a time.
This avoids failing "good" blocks, while giving near-instant recovery from
errors.

My patch for SUSE-9 does exactly that, with a minimum of fuss.

It would be good to see something like that implemented ASAP upstream, and
I'm sure that James (or Christoph) could code something like this in their
sleep. Or use my patch as a starting point if they're too busy.

Cheers

Mark Lord
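A rough user-space sketch of the block-at-a-time walk described above follows. It is not the SUSE-9 patch and uses no kernel APIs: `retry_blockwise()`, `issue_block_fn` and `fake_issue()` are hypothetical names, and the "LLD" is simulated by a callback that fails exactly one block.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for "issue one block to the LLD". */
typedef bool (*issue_block_fn)(uint64_t block, void *ctx);

/*
 * Walk over a failed request one block at a time. Each block is
 * issued on its own and its result recorded in ok[], so only the
 * blocks that really fail are reported as failed.
 * Returns the number of failed blocks.
 */
static size_t retry_blockwise(uint64_t start, size_t nblocks,
                              issue_block_fn issue, void *ctx, bool *ok)
{
    size_t failed = 0;

    for (size_t i = 0; i < nblocks; i++) {
        ok[i] = issue(start + i, ctx);
        if (!ok[i])
            failed++;
    }
    return failed;
}

/* Simulated device with a single bad block; ctx points at its number. */
static bool fake_issue(uint64_t block, void *ctx)
{
    const uint64_t *bad = ctx;

    return block != *bad;
}
```

Calling `retry_blockwise(0, 100, fake_issue, &bad, ok)` with `bad` set to 50 marks only block 50 as failed and passes the other 99 blocks. A real implementation would also have to complete each block back to the block layer, which this sketch leaves out.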