On Mon, 10 Nov 2008, Brian Kysela wrote:

> I tested with 2.6.27.5 and found that, although the process would hang as often
> as usual, it always recovered instead of needing to reboot.  No kernel bug or
> system freeze, no climbing load avg, etc.  Here is the usbmon output on a
> failed copy:
>
> http://www.kysela.org/pub/4.mon.out
>
> The syslog:
>
> [ 1003.736201] sd 6:0:0:0: [sdb] Assuming drive cache: write through
> [ 1003.738949] sd 6:0:0:0: [sdb] Assuming drive cache: write through
> [ 1003.741886]  sdb1
> [ 1112.311917] end_request: I/O error, dev sdb, sector 667600
> [ 1112.311956] sd 6:0:0:0: rejecting I/O to offline device
> [ 1112.312038] sd 6:0:0:0: rejecting I/O to offline device
> [ 1112.312074] end_request: I/O error, dev sdb, sector 667840
> [ 1112.312121] sd 6:0:0:0: rejecting I/O to offline device
> [ 1112.312131] Buffer I/O error on device sdb1, logical block 512
> [ 1112.312137] lost page write due to I/O error on sdb1
> [ 1112.312159] sd 6:0:0:0: rejecting I/O to offline device
> [ 1112.312168] Buffer I/O error on device sdb1, logical block 576
> [ 1112.312172] lost page write due to I/O error on sdb1
> [ 1112.312181] Buffer I/O error on device sdb1, logical block 577
> [ 1112.312185] lost page write due to I/O error on sdb1
> [ 1112.312193] Buffer I/O error on device sdb1, logical block 578
> [ 1112.312198] lost page write due to I/O error on sdb1
> [ 1112.312211] sd 6:0:0:0: rejecting I/O to offline device
> [ 1112.312234] sd 6:0:0:0: rejecting I/O to offline device
> [ 1112.312247] sd 6:0:0:0: rejecting I/O to offline device
> [ 1112.379235] sd 6:0:0:0: rejecting I/O to offline device
> [ 1112.379640] FAT: unable to read inode block for updating (i_pos 9253)

This is essentially the same failure mechanism as before, but without the
timeout-related kernel bug.  There is a communications error during one of
the reads.  It takes the same form in both logs: A transfer receives only
4051 bytes when it should get 4096.
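A short read like this (4051 of 4096 bytes, i.e. 45 bytes missing) can be
spotted mechanically in a usbmon capture.  The Python sketch below pairs each
bulk-IN submission ("S") with its completion ("C") by URB tag and reports
completions that returned fewer bytes than were requested.  It assumes the
text layout described in the kernel's usbmon documentation: URB tag,
timestamp, event type, an address word such as "Bi:1:002:1", a status word,
then the data length.  The function name and the sample lines are made up
for illustration; they are not taken from the 4.mon.out capture.

```python
"""Flag short bulk-IN transfers in usbmon text output (illustrative sketch)."""
from typing import Dict, Iterable, List, Tuple


def find_short_bulk_in(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (urb_tag, requested, received) for each short bulk-IN completion.

    Assumed field order per line: tag, timestamp, event (S/C/E),
    address (e.g. "Bi:1:002:1"), status, data length, ...
    """
    pending: Dict[str, int] = {}  # URB tag -> length requested at submission
    short: List[Tuple[str, int, int]] = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # not an event line this sketch understands
        tag, _timestamp, event, address, _status, length = fields[:6]
        if not address.startswith("Bi:"):
            continue  # only bulk-IN transfers (data coming from the drive)
        try:
            nbytes = int(length)
        except ValueError:
            continue
        if event == "S":
            pending[tag] = nbytes  # remember how much was asked for
        elif event == "C" and tag in pending:
            requested = pending.pop(tag)
            if nbytes < requested:
                short.append((tag, requested, nbytes))
    return short


if __name__ == "__main__":
    # Synthetic lines in the assumed usbmon text layout (not real capture data).
    sample = [
        "f7a3b200 1112300000 S Bi:1:006:1 -115 4096 <",
        "f7a3b200 1112300900 C Bi:1:006:1 0 4051 = 00000000 00000000",
        "f7a3b300 1112301000 S Bi:1:006:1 -115 13 <",
        "f7a3b300 1112301200 C Bi:1:006:1 0 13 = 55534253",
    ]
    print(find_short_bulk_in(sample))  # [('f7a3b200', 4096, 4051)]
```

The status word is deliberately ignored, since a short read may or may not
be reported as an error depending on how the URB was submitted; comparing
lengths catches it either way.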

Don't ask me why the transfer comes up short; it's some sort of hardware
or firmware failure, either in the drive or in your USB host controller.
The kernel tries to recover, but it looks as though the drive is stuck
trying to send the remaining bytes.  Resets don't help, so the drive is
taken offline.

You _might_ be able to prevent these problems by reducing the drive's
max_sectors value (the largest transfer, in 512-byte sectors, that the
kernel will send to the drive), say to 128.  See

	http://www.linux-usb.org/FAQ.html#i5

No guarantees, though.

Alan Stern
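For reference, a limit of 128 sectors caps each transfer at 128 * 512 =
65536 bytes (64 KiB).  The FAQ entry linked above is the place to check for
the supported procedure; the Python sketch below only illustrates the idea.
It assumes usb-storage exposes the limit as
/sys/block/<dev>/device/max_sectors, counted in 512-byte sectors, and that
you are running as root.  The helper names and the sysfs_root parameter are
hypothetical (the parameter exists only so the code can be tried against a
scratch directory), and "sdb" is simply the device name from the log above.

```python
"""Lower a USB disk's max_sectors limit through sysfs (hypothetical helper).

Assumption: the attribute lives at /sys/block/<dev>/device/max_sectors and
is expressed in 512-byte sectors. Writing it normally requires root.
"""
from pathlib import Path

SECTOR_SIZE = 512  # bytes per sector, as counted by max_sectors


def max_sectors_path(dev: str, sysfs_root: str = "/sys") -> Path:
    """Build the (assumed) sysfs path of the max_sectors attribute."""
    return Path(sysfs_root) / "block" / dev / "device" / "max_sectors"


def set_max_sectors(dev: str, sectors: int = 128, sysfs_root: str = "/sys") -> int:
    """Write a new max_sectors value and return the previous one."""
    if sectors <= 0:
        raise ValueError("max_sectors must be a positive number of sectors")
    path = max_sectors_path(dev, sysfs_root)
    old = int(path.read_text().strip())  # current limit, in sectors
    path.write_text(f"{sectors}\n")      # sysfs accepts a trailing newline
    return old


if __name__ == "__main__":
    # Largest single transfer after the change, in bytes.
    print(128 * SECTOR_SIZE)  # prints 65536
    # On a real system, as root:  set_max_sectors("sdb", 128)
```

Because the attribute belongs to the attached device, the setting likely has
to be reapplied each time the drive is plugged in.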