On Thu, 18 May 2017, Ewan D. Milne wrote: > On Thu, 2017-05-18 at 11:34 -0400, Ewan D. Milne wrote: > > On Thu, 2017-05-18 at 10:35 -0400, Alan Stern wrote: > > > This is in reference to > > > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1351305 > > > > > > The problem is that some program (probably udisks2) periodically sends > > > the following ATA pass-through command to a USB-ATA bridge attached to > > > a Western Digital drive: > > > > > > 85062000 00000000 00000000 0000e500 > > > > According to T10 SAT-4 this is an ATA PASS-THROUGH(16) command, with > > PROTOCOL=3 (non-data), CK_COND=1, COMMAND=E5h (the ATA command). > > > > > > > > I don't know what this command does (some sort of reset?). The command > > > fails and the device returns the following sense data: > > > > > > 72000000 0000000e 090c0000 00ff0000 00000000 0050 > > > > > > I don't know how to decode this -- I don't have copies of the relevant > > > documents. Can anybody decode this for me? > > > > This is a current descriptor format sense data wih a single > > ATA Status Return sense data descriptor (beginning at 8 bytes into > > the sense data buffer 090c...) The relevant fields are COUNT=255 LBA=0 > > and STATUS=50h (which I suspect is what is the interesting part). > > > > > > > > Anyway, the SCSI core treats it as a Hardware Error and puts warning > > > messages in the kernel log: > > > > > > [17244.280612] sd 7:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_SENSE > > > [17244.280614] sd 7:0:0:0: [sdd] tag#0 Sense Key : Hardware Error [current] [descriptor] > > > [17244.280616] sd 7:0:0:0: [sdd] tag#0 Add. Sense: No additional sense information > > > [17244.280618] sd 7:0:0:0: [sdd] tag#0 CDB: ATA command pass through(16) 85 06 20 00 00 00 00 00 00 00 00 00 00 00 e5 00 > > > > > > Is this really the right thing to do? Could it be that we are failing > > > to interpret this sense data correctly? > > > > With the 72000000 0000000e 090c0000 00ff0000 00000000 0050 sense data > > buffer above scsi_sense_key_string() should have returned "No Sense" as > > the array value is 0. Even if we somehow managed to fail to correctly > > interpret descriptor format sense vs. fixed format sense. > > > > > > > > Other commands provoke similar responses from the device, but without > > > obnoxious log messages. For example, the command: > > > > > > 85082e00 00000100 00000000 0000ec00 > > > > > > fails with the following sense data: > > > > > > 72000000 0000000e 090c0000 00000000 00000000 0050 > > > > > > and no output to the log. I don't know why the behavior is different. > > > There are other similar examples, with and without log messages. > > > > It seems more likely that somehow there is a path where the wrong or > > uninitialized sshdr structure is being passed to the logging routine. > > > > -Ewan > > Oh, wait. You said USB-ATA bridge. > > There is this code in drivers/usb/storage/transport.c: > Perhaps this is responsible. It is forcing the sense key to > HARDWARE_ERROR under certain conditions. That would cause the > logging message you see to be output. > > /* > * We often get empty sense data. This could indicate that > * everything worked or that there was an unspecified > * problem. We have to decide which. > */ > if (sshdr.sense_key == 0 && sshdr.asc == 0 && sshdr.ascq == 0 && > fm_ili == 0) { > /* > * If things are really okay, then let's show that. > * Zero out the sense buffer so the higher layers > * won't realize we did an unsolicited auto-sense. > */ > if (result == USB_STOR_TRANSPORT_GOOD) { > srb->result = SAM_STAT_GOOD; > srb->sense_buffer[0] = 0x0; > > /* > * If there was a problem, report an unspecified > * hardware error to prevent the higher layers from > * entering an infinite retry loop. > */ > } else { > srb->result = DID_ERROR << 16; > if ((sshdr.response_code & 0x72) == 0x72) > srb->sense_buffer[1] = HARDWARE_ERROR; > else > srb->sense_buffer[2] = HARDWARE_ERROR; > } > } I had completely forgotten about this code. :-( Looks like you put your finger on the source of the problem. So if the device sends back essentially empty sense data (SK = No Sense, ASC = ASCQ = 0), but the USB transport indicates command failure, how should we inform the SCSI core in a way that won't cause infinite retries or obnoxious log messages? Should we be doing a better job of detecting empty sense data -- that is, do we need to check for non-empty ATA status? Or has the SCSI core improved so that it no longer does infinite retries (see commit f1a0743bc0e7 "USB: storage: When a device returns no sense data, call it a Hardware Error" and Bugzilla entry #14118), meaning that this code can be removed entirely? Alan Stern