Tejun,

On 2019/06/25 5:57, Tejun Heo wrote:
> Hello, Damien.
>
> On Mon, Jun 24, 2019 at 08:27:02PM +0000, Damien Le Moal wrote:
>> For NCQ commands, I believe it is mandatory to request sense data for the failed
>> command to get the device out of error mode. So isn't this approach breaking
>
> Hah, that's a news to me. We never had that code path before ZAC
> support was added, so I'm kinda skeptical that'd be the case.

I checked the ACS specs again, and you are right: REQUEST SENSE DATA EXT is
optional in general, dependent on support of the Sense Data Reporting feature
set.

For NCQ command errors, from ACS:

"If an error occurs while the device is processing an NCQ command, then the
device shall return command aborted for all NCQ commands that are in the queue
and shall return command aborted for any subsequent commands, except a command
from the GPL feature set (see 4.10) that reads the NCQ Command Error log (see
9.13), until the device completes that command without error."

So as long as the NCQ Command Error log page is read, the device queue will get
out of error mode and new commands can be issued. There is no need for REQUEST
SENSE DATA EXT. I got confused by the fact that the Sense Data Reporting feature
set is mandatory for ZAC drives (that is defined in ZAC, not ACS).

>> anything for well behaving drives ? Wouldn't it be better to blacklist the
>> misbehaving SSD you observed the problem with ?
>
> Provided I'm not wrong with the assumption, there's virtually no
> benefit in doing this and that's gonna be a *really* difficult
> blacklist to develop.

You are not wrong :)
I will test your patch on our test rig, which generates (on purpose) a lot of
command failures on ZAC drives. We can also give it a run with generated errors
on regular disks.

Cheers.

>
> Thanks.
>

--
Damien Le Moal
Western Digital Research