On 2/3/22 17:32, Anton Lundin wrote: > On 03 February, 2022 - Damien Le Moal wrote: > >> On 2022/02/02 22:54, Anton Lundin wrote: >>> On 02 February, 2022 - Anton Lundin wrote: >>> >>>> On 02 February, 2022 - Damien Le Moal wrote: >>>> >>>>> On 2022/02/02 21:25, Anton Lundin wrote: >>>>>> On 02 February, 2022 - Damien Le Moal wrote: >>>>>> >>>>>>> On 2/2/22 19:05, Anton Lundin wrote: >>>>>>>> Back in 06f6c4c6c3e8 ("ata: libata: add missing ata_identify_page_supported() calls") >>>>>>>> a read of ATA_LOG_DIRECTORY page was added. This caused the >>>>>>>> SATADOM-ML 3ME to lock up. >>>>>>>> >>>>>>>> In 636f6e2af4fb ("libata: add horkage for missing Identify Device log") >>>>>>>> a flag was added to cache if a device supports this or not. >>>>>>>> >>>>>>>> This adds a blacklist entry which flags that these devices doesn't >>>>>>>> support that call and shouldn't be issued that call. >>>>>>>> >>>>>>>> Cc: stable@xxxxxxxxxxxxxxx # v5.10+ >>>>>>>> Signed-off-by: Anton Lundin <glance@xxxxxxxxxx> >>>>>>>> Depends-on: 636f6e2af4fb ("libata: add horkage for missing Identify Device log") >>>>>>> >>>>>>> I do not think so. See below. >>>>>>> >>>>>>>> --- >>>>>>>> drivers/ata/libata-core.c | 7 +++++++ >>>>>>>> 1 file changed, 7 insertions(+) >>>>>>>> >>>>>>>> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c >>>>>>>> index 87d36b29ca5f..e024af9f33d0 100644 >>>>>>>> --- a/drivers/ata/libata-core.c >>>>>>>> +++ b/drivers/ata/libata-core.c >>>>>>>> @@ -4070,6 +4070,13 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = { >>>>>>>> { "WDC WD3000JD-*", NULL, ATA_HORKAGE_WD_BROKEN_LPM }, >>>>>>>> { "WDC WD3200JD-*", NULL, ATA_HORKAGE_WD_BROKEN_LPM }, >>>>>>>> >>>>>>>> + /* >>>>>>>> + * This sata dom goes on a walkabout when it sees the >>>>>>>> + * ATA_LOG_DIRECTORY read request so ensure we don't issue such a >>>>>>>> + * request to these devices. >>>>>>>> + */ >>>>>>>> + { "SATADOM-ML 3ME", NULL, ATA_HORKAGE_NO_ID_DEV_LOG }, >>>>>>> >>>>>>> This flag only disables trying to access the identify device log page, >>>>>>> it does *not* avoid access to the log directory log page in general. The >>>>>>> log directory will still be consulted for other log pages beside the >>>>>>> identify device log page, from any function that calls >>>>>>> ata_log_supported() (e.g. ata_dev_config_ncq_send_recv() and >>>>>>> ata_dev_config_ncq_non_data()) >>>>>> >>>>>> Non of those code paths are called for this device, probably due to some >>>>>> other flag disqualifying them. >>>>>> >>>>>>> So it will be a lot more solid to define a ATA_HORKAGE_NO_LOG_DIR flag >>>>>>> and test for it in ata_log_supported(), completely preventing any access >>>>>>> to the log directory page for this drive type. >>>>>> >>>>>> That was my first thought but then I found ATA_HORKAGE_NO_ID_DEV_LOG >>>>>> which was in the calling path that actually triggered this issue. >>>>>> >>>>>> But, yes, I totally agree that's a more solid solution preventing this >>>>>> kind of issue to crop up again. >>>>>> >>>>>>>> + >>>>>>>> /* End Marker */ >>>>>>>> { } >>>>>>>> }; >>>>>>> >>>>>>> Note: if you need this fix sent to linux-stable, add "Cc: stable@..." >>>>>>> and a Fixes tag. >>>>>> >>>>>> I'd think it's fitting to send it to linux-stable, because it prevents >>>>>> those DOM's from working in v5.15.5+. >>>>>> >>>>>> Ok. I must have missed that part when I read submitting-patches doc. >>>>>> >>>>>> I'll rework and re-submit the patch. >>>>> >>>>> I sent you a draft patch. Please try it. >>>> >>>> Works like a charm. >>>> >>>>> Also, to confirm if the log directory log page is indeed the page that locks up >>>>> the drive, can you try this command: >>>>> >>>>> sg_sat_read_gplog --dma --log=0 --page=0 --readonly >>>> >>>> # sg_sat_read_gplog --dma --log=0 --page=0 --readonly /dev/sda >>>> ATA PASS-THROUGH (16), bad field in cdb >>>> sg_sat_read_gplog failed: Illegal request >>>> >>>>> and >>>>> >>>>> sg_sat_read_gplog --log=0 --page=0 --readonly >>>> >>>> # sg_sat_read_gplog --log=0 --page=0 --readonly /dev/sda >>>> 00 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 08 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 10 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 18 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 20 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 28 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 30 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 38 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 40 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 48 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 50 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 58 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 60 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 68 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 70 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 78 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 80 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 88 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 90 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> 98 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> a0 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> a8 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> b0 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> b8 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> c0 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> c8 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> d0 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> d8 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> e0 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> e8 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> f0 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> f8 0000 0000 0000 0000 0000 0000 0000 0000 .. .. .. .. .. .. .. .. >>>> >>>> Mind you, this with a patched kernel, if that affects anything. >>> >>> Without a patched kernel, (grabbed os with a 4.19.94 kernel) the command >>> hangs for a while: >>> >>> # time sg_sat_read_gplog --log=0 --page=0 --readonly /dev/sda >>> ATA PASS-THROUGH (16), bad field in cdb >>> sg_sat_read_gplog failed: Illegal request >>> >>> real 0m28.337s >>> user 0m0.000s >>> sys 0m0.001s >>> >>> and logs the following in the kernel log: >>> >>> ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) >>> ata1.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80) >>> ata1.00: revalidation failed (errno=-5) >>> ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) >>> ata1.00: configured for UDMA/133 >>> >>> >>> But after that 30s walkabout and whatever the kernel does the device >>> starts functioning again. >> >> The 30s "hang" is the default command timeout: your drive is not responding to >> the DMA version of READ LOG EXT command. There are some drives out there like >> that. So instead of completely disabling access to the log directory, we should >> simply force the use of READ LOG EXT. And for that, there is the horkage flag >> ATA_HORKAGE_NO_DMA_LOG. >> >> Can you try using that one without the patch I sent ? > > I think I tested that before submitting the patch, but re-tested it now > and that doesn't work: Your original patch used the ATA_HORKAGE_NO_ID_DEV_LOG flag. What ATA_HORKAGE_NO_DMA_LOG does is prevent the use of the READ LOG DMA EXT command. READ LOG EXT command (PIO) will be used. And the sg commands I sent you and that you tried show that it works: sg_sat_read_gplog --log=0 --page=0 --readonly worked perfectly, while: sg_sat_read_gplog --dma --log=0 --page=0 --readonly failed. So by simply preventing the use of READ LOG DMA EXT for your drive, the kernel will be able to safely get the log directory and test for other page support. In your original patch, try replacing ATA_HORKAGE_NO_ID_DEV_LOG with ATA_HORKAGE_NO_DMA_LOG. > > This is on a 5.15 kernel: > > ata1.00: qc timeout (cmd 0x2f) > ata1.00: Read log 0x00 page 0x00 failed, Emask 0x4 > ata1.00: ATA Identify Device Log not supported > ata1.00: failed to set xfermode (err_mask=0x40) > ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > ata1.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80) > ata1.00: revalidation failed (errno=-5) > ata1: limiting SATA link speed to 3.0 Gbps > ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320) > ata1.00: qc timeout (cmd 0x2f) > ata1.00: Read log 0x00 page 0x00 failed, Emask 0x4 > ata1.00: ATA Identify Device Log not supported > ata1.00: failed to set xfermode (err_mask=0x40) > ata1.00: disabled > > On a 5.16+ with the "libata: add horkage for missing Identify Device > log" patch, the machine will boot because it won't re-try to access the > ata log directory after the reset to 3.0 Gbps sata. > > > //Anton -- Damien Le Moal Western Digital Research