On Fri, Sep 2, 2022 at 4:35 AM Damien Le Moal
<damien.lemoal@xxxxxxxxxxxxxxxxxx> wrote:
> Your drive seems to be an exception to my (1) statement and the error it
> returns seems weird enough that the stat_table ends up being used.
> Could you send a dmesg output of a failed command so that we can see the
> err_mask etc info for the failed command ? And it would be good to add a
> print of the drv_stat and drv_err parameters passed to
> ata_to_sense_error() for the failures you are seeing. That would help
> trying to figure out what your drive is attempting to signal.

I don't think the drive wants to "signal" anything, instead it simply
"disappears" at some point. The "original" error is "Emask 0x4
(timeout)". So here's an example from early on when I had not made
many kernel changes yet:

-----CUT-----
...
[ 516.296397] ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 516.296399] ata9.00: failed command: WRITE DMA
[ 516.296402] ata9.00: cmd ca/00:23:51:03:4b/00:00:00:00:00/ed tag 4 dma 17920 out
                       res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 516.296403] ata9.00: status: { DRDY }
...
[ 516.761214] ata9: translated ATA stat/err 0x40/00 to SCSI SK/ASC/ASCQ 0x5/21/04
[ 516.761215] ata9.00: device reported invalid CHS sector 0
[ 516.761220] sd 8:0:0:0: [sdk] tag#4 scsi_eh_8: flush finish cmd
[ 516.761224] sd 8:0:0:0: [sdk] tag#4 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 516.761226] sd 8:0:0:0: [sdk] tag#4 Sense Key : Illegal Request [current]
[ 516.761228] sd 8:0:0:0: [sdk] tag#4 Add. Sense: Unaligned write command
[ 516.761229] sd 8:0:0:0: [sdk] tag#4 CDB: Write(16) 8a 00 00 00 00 00 0d 4b 03 51 00 00 00 23 00 00
...
-----CUT-----

That "translated" line is only output because I changed "if (verbose)"
to "if (1)" in that kernel.
Also note the bizarre "CHS" error, which only happens on some of
these, not all; I had mentioned before that I am trying to track down
how it happens that the LBA bit suddenly disappears (it might have to
do with the hardreset being in progress at this point and this message
racing against the new IDENTIFY?).

Using the trace facility I can *sometimes* see the command being
issued and then 30 seconds later the timeout happening; sometimes I
just get the timeout and I *cannot* find when the command was issued
in the trace, another thing that seems bizarre to me.

Note that I intentionally didn't ask for help with that; I still think
that I am too far away from a proper diagnosis to have a fruitful
conversation about where the timeouts originate and why. We've checked
for power issues and the like, and again, this happens only when the
drive sits behind a SATA controller, not when it's behind a SAS
controller.

> Also, please send the output of "hdparm -I" for that SSD please, so that
> we have information about what standard it is (supposedly) following.

See below, but I don't think the specific drive is relevant. The same
"problem" shows up with a different brand/model as well, again only in
the SATA context, not for SAS.
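
For reference, here is a minimal sketch of how the failed command in the log above lines up with the SCSI CDB. It assumes the libata log layout `command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device` for the `cmd` line, with status and error in the first two slots of the `res` line. The function names are made up for illustration and are not kernel APIs.

```python
# Sketch only: field order of the libata "cmd"/"res" log text is an assumption.

ATA_LBA = 0x40  # bit 6 of the device register selects LBA addressing


def parse_tf(text):
    """Split 'aa/bb:cc:dd:ee:ff/gg:hh:ii:jj:kk/ll' into named byte fields."""
    head, low, hob, device = text.split("/")
    second, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    return {
        "command_or_status": int(head, 16),
        "feature_or_error": second,
        "nsect": nsect,
        "lbal": lbal,
        "lbam": lbam,
        "lbah": lbah,
        "hob": [int(b, 16) for b in hob.split(":")],
        "device": int(device, 16),
    }


def lba28(tf):
    """Assemble a 28-bit LBA from the low nibble of device plus lbah/lbam/lbal."""
    return (
        ((tf["device"] & 0x0F) << 24)
        | (tf["lbah"] << 16)
        | (tf["lbam"] << 8)
        | tf["lbal"]
    )


def write16_lba_and_len(cdb_hex):
    """Return (LBA, block count) from a SCSI WRITE(16) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex)
    if cdb[0] != 0x8A:
        raise ValueError("not a WRITE(16) CDB")
    return int.from_bytes(cdb[2:10], "big"), int.from_bytes(cdb[10:14], "big")


# Values copied from the log excerpt above.
cmd_tf = parse_tf("ca/00:23:51:03:4b/00:00:00:00:00/ed")
res_tf = parse_tf("40/00:00:00:00:00/00:00:00:00:00/00")
cdb_lba, cdb_len = write16_lba_and_len(
    "8a 00 00 00 00 00 0d 4b 03 51 00 00 00 23 00 00"
)

if __name__ == "__main__":
    print("cmd LBA bit set:", bool(cmd_tf["device"] & ATA_LBA))
    print("res LBA bit set:", bool(res_tf["device"] & ATA_LBA))
    print("cmd LBA28: %#x, CDB LBA: %#x" % (lba28(cmd_tf), cdb_lba))
    print("transfer bytes:", cdb_len * 512)
```

Under that assumed layout, the issued taskfile has the LBA bit set and agrees with the CDB on both address and length (35 sectors, 17920 bytes), while the all-zero result taskfile has a device byte of 0x00, i.e. no LBA bit. That would at least be consistent with the "CHS" message being derived from the result taskfile of a command that timed out, but this is a guess, not a diagnosis.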

Cheers,
Peter

-----CUT-----
/dev/sde:

ATA device, with non-removable media
        Model Number:       WDC WDS400T2B0A-00SM50
        Serial Number:      2113CN420743
        Firmware Revision:  415020WD
        Media Serial Num:
        Media Manufacturer:
        Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
        Used: unknown (minor revision code 0x005e)
        Supported: 11 10 9 8 7 6 5
        Likely used: 11
Configuration:
        Logical         max     current
        cylinders       16383   0
        heads           16      0
        sectors/track   63      0
        --
        LBA    user addressable sectors:   268435455
        LBA48  user addressable sectors:  7814037168
        Logical  Sector size:                   512 bytes
        Physical Sector size:                   512 bytes
        Logical Sector-0 offset:                  0 bytes
        device size with M = 1024*1024:     3815447 MBytes
        device size with M = 1000*1000:     4000787 MBytes (4000 GB)
        cache/buffer size  = unknown
        Form Factor: 2.5 inch
        Nominal Media Rotation Rate: Solid State Device
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, no device specific minimum
        R/W multiple sector transfer: Max = 1   Current = 1
        Advanced power management level: disabled
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    DOWNLOAD_MICROCODE
                Advanced Power Management feature set
           *    48-bit Address feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    64-bit World wide name
           *    WRITE_UNCORRECTABLE_EXT command
           *    {READ,WRITE}_DMA_EXT_GPL commands
           *    Segmented DOWNLOAD_MICROCODE
                unknown 119[8]
           *    Gen1 signaling speed (1.5Gb/s)
           *    Gen2 signaling speed (3.0Gb/s)
           *    Gen3 signaling speed (6.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Phy event counters
           *    READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
                DMA Setup Auto-Activate optimization
                Device-initiated interface power management
                Asynchronous notification (eg. media change)
           *    Software settings preservation
                Device Sleep (DEVSLP)
           *    SANITIZE feature set
           *    BLOCK_ERASE_EXT command
           *    DOWNLOAD MICROCODE DMA command
           *    WRITE BUFFER DMA command
           *    READ BUFFER DMA command
           *    Data Set Management TRIM supported (limit 8 blocks)
           *    Deterministic read ZEROs after TRIM
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
                supported: enhanced erase
        2min for SECURITY ERASE UNIT. 2min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5001b444a70c2c64
        NAA             : 5
        IEEE OUI        : 001b44
        Unique ID       : 4a70c2c64
Device Sleep:
        DEVSLP Exit Timeout (DETO): 30 ms (drive)
        Minimum DEVSLP Assertion Time (MDAT): 30 ms (drive)
Checksum: correct
-----CUT-----
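
As a quick sanity check on the listing above, the following sketch recomputes the reported device sizes from the LBA48 sector count and confirms that the failed write's address fits in 28 bits, so a 28-bit command such as WRITE DMA can address it. The variable names are illustrative only. Integer division is an assumption about how the MByte figures were rounded.

```python
# Figures copied from the hdparm -I listing and the Write(16) CDB in the log.
SECTOR_BYTES = 512
LBA48_SECTORS = 7814037168
LBA28_SECTORS = 268435455

size_bytes = LBA48_SECTORS * SECTOR_BYTES
size_mib = size_bytes // (1024 * 1024)   # "M = 1024*1024" line
size_mb = size_bytes // (1000 * 1000)    # "M = 1000*1000" line

# The LBA28 sector count is the largest value a 28-bit field can hold.
lba28_max = 2**28 - 1

# Failed write: start LBA and sector count taken from the CDB.
failed_lba = 0x0D4B0351
failed_len = 0x23
last_lba = failed_lba + failed_len - 1
fits_in_lba28 = last_lba <= lba28_max

if __name__ == "__main__":
    print("size (MiB):", size_mib)
    print("size (MB): ", size_mb)
    print("LBA28 count equals 2**28 - 1:", LBA28_SECTORS == lba28_max)
    print("failed write fits in LBA28:", fits_in_lba28)
```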