Hi Kate,

Some background info for other people following this thread, as discussed here:

https://bugzilla.kernel.org/show_bug.cgi?id=201693
https://bugzilla.kernel.org/show_bug.cgi?id=203475

There are a lot of users who are reporting disk issues (including data corruption) with Samsung 860 and 870 SSDs. Coming up with fixes for this has taken longer than it should because I failed to realize for a long time that there really are 2 separate issues here:

https://bugzilla.kernel.org/show_bug.cgi?id=203475#c34

"""
So after completely re-reading / analyzing both this bug as well as bug 201693 with a fresh pair of eyes (since the last time I did this was a long time ago) I agree.

After careful reading / analysis it seems that there really are 2 different bugs here impacting both the 860 EVO and the 870 EVO:

1. Queued Trim commands are causing issues on Intel + ASmedia + Marvell controllers
2. Things are seriously broken on AMD controllers and only completely disabling NCQ altogether helps there.

I will submit a kernel patch (with a Fixes tag so that it gets backported to stable series) for 1. right away; and I've asked a colleague to start working on a new ATA horkage flag which disables NCQ on AMD SATA controllers only, so that we can add that flag (together with the ATA_HORKAGE_NO_NCQ_TRIM flag which my patch adds) to the 860 EVO and the 870 EVO to also resolve 2.
"""

I asked Kate to write this patch to address 2. Note that this patch is to be applied on top of my "libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs" patch.

Kate, thank you for your patch. There are several issues which need to be addressed before this patch can be accepted, starting with the commit message.

It seems that you used the commit message of my patch as a template, but you forgot to change the Subject (the first line). For the next version please change the subject to something correctly describing this patch.
I also see that you gave this patch a version of 2, but since this patch does not replace my patch (in other words, it is a different patch), you should have just made it v1. Anyway, let's just make the next version v3 to avoid confusion.

The rest of the commit message should have 1 paragraph describing the reason why the patch is necessary + a second paragraph describing what the patch is doing to address this. Your cover-letter would be a good candidate for the second paragraph, resulting in for example something like this as body of the commit message:

"""
Many users are reporting that the Samsung 860 and 870 SSDs are having various issues when combined with AMD SATA controllers and only completely disabling NCQ helps to avoid these issues.

Entirely disabling NCQ for Samsung 860/870 SSDs will cause an I/O performance drop. In this case, a flag ATA_HORKAGE_NONCQ_ON_AMD is introduced to perform an additional check for these SSDs. If it finds that its parent ATA controller is AMD then NCQ will be disabled. Otherwise NCQ is kept enabled.
"""

On 8/27/21 7:33 AM, Kate Hsuan wrote:
> A flag ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL is added to disable NCQ
> on AMD/MAREL/ASMEDIA chipsets.
>
> Samsung 860/870 SSD are disabled to use NCQ trim functions but it will
> lead to performace drop. From the bugzilla, we could realize the issues
> only appears on those chipset mentioned above. So this flag could be
> used to only disable NCQ on specific chipsets.
>
> Fixes: ca6bfcb2f6d9 ("libata: Enable queued TRIM for Samsung SSD 860")
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=203475
> Signed-off-by: Hans de Goede <hdegoede@xxxxxxxxxx>
> Reviewed-by: Martin K.
Petersen <martin.petersen@xxxxxxxxxx>
> Signed-off-by: Kate Hsuan <hpa@xxxxxxxxxx>
> ---
>  drivers/ata/libata-core.c | 37 ++++++++++++++++++++++++++++++++-----
>  include/linux/libata.h | 3 +++
>  2 files changed, 35 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index cc459ce90018..50f635669dd4 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -2119,6 +2119,8 @@ static inline u8 ata_dev_knobble(struct ata_device *dev)
>  static void ata_dev_config_ncq_send_recv(struct ata_device *dev)
>  {
>  	struct ata_port *ap = dev->link->ap;
> +	struct device *parent = NULL;
> +	struct pci_dev *pcidev = NULL;
>  	unsigned int err_mask;
>
>  	if (!ata_log_supported(dev, ATA_LOG_NCQ_SEND_RECV)) {
> @@ -2138,9 +2140,32 @@ static void ata_dev_config_ncq_send_recv(struct ata_device *dev)
>  		memcpy(cmds, ap->sector_buf, ATA_LOG_NCQ_SEND_RECV_SIZE);
>
>  		if (dev->horkage & ATA_HORKAGE_NO_NCQ_TRIM) {
> -			ata_dev_dbg(dev, "disabling queued TRIM support\n");
> -			cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
> -				~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
> +			if (dev->horkage & ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL)
> +			{
> +				// get parent device for the controller vendor ID
> +				for(parent = dev->tdev.parent; parent != NULL; parent = parent->parent)
> +				{
> +					if(dev_is_pci(parent))
> +					{
> +						pcidev = to_pci_dev(parent);
> +						if (pcidev->vendor == PCI_VENDOR_ID_MARVELL ||
> +							pcidev->vendor == PCI_VENDOR_ID_AMD ||
> +							pcidev->vendor == PCI_VENDOR_ID_ASMEDIA )
> +						{
> +							ata_dev_dbg(dev, "Disable NCQ -> vendor ID %x product ID %x\n",
> +								pcidev->vendor, pcidev->device);
> +							cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
> +								~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
> +						}
> +						break;
> +					}
> +				}
> +			}else
> +			{
> +				ata_dev_dbg(dev, "disabling queued TRIM support\n");
> +				cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
> +					~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
> +			}

Please don't nest the handling of the new ATA_HORKAGE_NONCQ_ON_AMD flag with the handling of
other flags. Also, you are only disabling queued trims now, which my patch already does. Instead, the new check should completely disable NCQ. This means moving the check to ata_dev_config_ncq(), adding the new check after this check:

	if (dev->horkage & ATA_HORKAGE_NONCQ) {
		snprintf(desc, desc_sz, "NCQ (not used)");
		return 0;
	}

And then do the same, but only if pcidev->vendor == PCI_VENDOR_ID_AMD.

>  		}
>  	}
>  }
> @@ -3951,9 +3976,11 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
>  	{ "Samsung SSD 850*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
>  				ATA_HORKAGE_ZERO_AFTER_TRIM, },
>  	{ "Samsung SSD 860*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> -				ATA_HORKAGE_ZERO_AFTER_TRIM, },
> +				ATA_HORKAGE_ZERO_AFTER_TRIM |
> +				ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL, },
>  	{ "Samsung SSD 870*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> -				ATA_HORKAGE_ZERO_AFTER_TRIM, },
> +				ATA_HORKAGE_ZERO_AFTER_TRIM |
> +				ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL, },
>  	{ "FCCT*M500*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
>  				ATA_HORKAGE_ZERO_AFTER_TRIM, },
>
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 3fcd24236793..ec17f1f3fbf6 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -422,6 +422,9 @@ enum {
>  	ATA_HORKAGE_NOTRIM = (1 << 24), /* don't use TRIM */
>  	ATA_HORKAGE_MAX_SEC_1024 = (1 << 25), /* Limit max sects to 1024 */
>  	ATA_HORKAGE_MAX_TRIM_128M = (1 << 26), /* Limit max trim size to 128M */
> +	ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL = (1 << 27), /*Disable NCQ only on
> +						ASMeida, AMD, and Marvell
> +						Chipset*/

When we initially discussed this I know I said that we would need to disable NCQ on AMD + ASMEDIA + Marvell hosts, but after carefully reading both bugs again I've come to the conclusion that for ASMedia and Marvell SATA hosts just disabling queued trim, as my patch does, is enough.

So please rename this to: ATA_HORKAGE_NONCQ_ON_AMD (and only check for an AMD vendor-id when doing the check).
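
To illustrate the logic being asked for, here is a minimal userspace sketch. It is a mock, not kernel code and not the final patch: the mock_* struct, helpers and flag values are hypothetical stand-ins for struct device, dev_is_pci() / to_pci_dev() and the horkage bits. The two vendor IDs are the real PCI values. The helper walks up the parent chain to the first PCI device, and NCQ is refused either unconditionally (the existing NONCQ check) or only when that PCI device is from AMD (the new check):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Real PCI vendor IDs; everything else below is a mock for illustration. */
#define MOCK_PCI_VENDOR_ID_AMD		0x1022
#define MOCK_PCI_VENDOR_ID_INTEL	0x8086

/* Stand-ins for the horkage bits (assumed values, not taken from libata.h). */
#define MOCK_HORKAGE_NONCQ		(1u << 2)
#define MOCK_HORKAGE_NONCQ_ON_AMD	(1u << 27)

/* Stand-in for struct device plus the PCI info we care about. */
struct mock_device {
	struct mock_device *parent;
	bool is_pci;		/* models dev_is_pci() */
	unsigned short vendor;	/* models to_pci_dev(dev)->vendor */
};

/* Walk up to the first PCI ancestor and report whether it is an AMD device. */
static bool mock_host_is_amd(const struct mock_device *dev)
{
	const struct mock_device *p;

	assert(dev != NULL);
	for (p = dev->parent; p != NULL; p = p->parent) {
		if (p->is_pci)
			return p->vendor == MOCK_PCI_VENDOR_ID_AMD;
	}
	return false;	/* no PCI ancestor found, keep NCQ */
}

/* Models the two early-return checks: NONCQ first, then NONCQ_ON_AMD. */
static bool mock_ncq_allowed(const struct mock_device *dev, unsigned int horkage)
{
	if (horkage & MOCK_HORKAGE_NONCQ)
		return false;

	if ((horkage & MOCK_HORKAGE_NONCQ_ON_AMD) && mock_host_is_amd(dev))
		return false;

	return true;
}
```

Note that the AMD check is a separate, flat check next to the NONCQ one, and it does not touch the queued-trim handling at all.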
>
>  	/* DMA mask for user DMA control: User visible values; DO NOT
> renumber */
>

Regards,

Hans