On Mon, 2022-08-29 at 08:04 +0200, Peter Fröhlich wrote: > On Mon, Aug 29, 2022 at 1:20 AM Damien Le Moal > <damien.lemoal@xxxxxxxxxxxxxxxxxx> wrote: > > On 2022/08/26 21:00, Peter Fröhlich wrote: > > > In commit 8ae720449fca4b1d0294c0a0204c0c45556a3e61 "libata: > > > whitespace > > > fixes in ata_to_sense_error()" we find, among many actual > > > whitespace > > > changes, another change that adds an entry for 0x40 to the > > > stat_table. > > > ... > > > > See the code below that table definition. You can see this hunk: > > > > if (drv_err) { > > /* Look for drv_err */ > > for (i = 0; sense_table[i][0] != 0xFF; i++) { > > /* Look for best matches first */ > > if ((sense_table[i][0] & drv_err) == > > sense_table[i][0]) { > > *sk = sense_table[i][1]; > > *asc = sense_table[i][2]; > > *ascq = sense_table[i][3]; > > goto translate_done; > > } > > } > > } > > That's the sense_table, I was referring to the stat_table. That table > is consulted when we fail to convert via the sense_table. Right... Reading emails on Monday morning without enough coffee was a bad idea :) So looking at the right code again, this is all very strange. E.g. the ACS specs define bit 5 of the status field as the "device fault" bit, but the code looks at 0x08, so bit 3. For write command, the definition is: STATUS Bit Description 7:6 Transport Dependent – See 6.2.11 5 DEVICE FAULT bit – See 6.2.6 4 N/A 3 Transport Dependent – See 6.2.11 2 N/A 1 SENSE DATA AVAILABLE bit – See 6.2.9 0 ERROR bit – See 6.2.8 And the code is: static const unsigned char stat_table[][4] = { /* Must be first because BUSY means no other bits valid */ {0x80, ABORTED_COMMAND, 0x47, 0x00}, // Busy, fake parity for now {0x40, ILLEGAL_REQUEST, 0x21, 0x04}, // Device ready, unaligned write command {0x20, HARDWARE_ERROR, 0x44, 0x00}, // Device fault, internal target failure {0x08, ABORTED_COMMAND, 0x47, 0x00}, // Timed out in xfer, fake parity for now {0x04, RECOVERED_ERROR, 0x11, 0x00}, // Recovered ECC error Medium error, recovered {0xFF, 0xFF, 0xFF, 0xFF}, // END mark }; So this does not match at all. Something wrong here, or, the "status" field being observed here is not the one I am thinking of. Checking AHCI & SATA-IO specs, I do not see anything matching there either. Hu... Need to dig into this further. Missing something here. > > > Are you using an SMR drive ? Getting an aligned write should not > > happen for a > > regular disk as the kernel ensure alignments of IO to LBAs. But for > > SMR, it is > > possible to send a write command that is not aligned to a zone > > write pointer > > position, resulting in an unaligned write error. > > Which is why I am pretty sure that the "unaligned write" message is > spurious since I am writing to a plain old SSD. It's going to be hard > for a userspace program to generate a write that is no properly > aligned for the SSD. Unless your SSD is really buggy and throws strange errors, which is always a possibility. Do you have a good reproducer of the problem ? > > Cheers, > Peter