Re: [PATCH v2 1/1] libata: libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs

Hi Kate,

Some background info for other people following this thread, as discussed here:
https://bugzilla.kernel.org/show_bug.cgi?id=201693
https://bugzilla.kernel.org/show_bug.cgi?id=203475

Many users are reporting disk issues (including data
corruption) with Samsung 860 and 870 SSDs. Coming up with fixes for this
has taken longer than it should have because I failed to realize for a long
time that there really are 2 separate issues here:

https://bugzilla.kernel.org/show_bug.cgi?id=203475#c34

"""
So after completely re-reading / analyzing both this bug as well as bug 201693 with a fresh pair of eyes (since the last time I did this was a long time ago) I agree. After careful reading / analysis it seems that there really are 2 different bugs here impacting both the 860 EVO and the 870 EVO:

1. Queued Trim commands are causing issues on Intel + ASmedia + Marvell controllers

2. Things are seriously broken on AMD controllers and only completely disabling NCQ altogether helps there.


I will submit a kernel patch (with a Fixes tag so that it gets backported to stable series) for 1. right away; and I've asked a colleague to start working on a new ATA horkage flag which disables NCQ on AMD SATA controllers only, so that we can add that flag (together with the ATA_HORKAGE_NO_NCQ_TRIM flag which my patch adds) to the 860 EVO and the 870 EVO to also resolve 2.
"""

I asked Kate to write this patch to address 2. Note that this patch is to be
applied on top of my "libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and
870 SSDs" patch.

Kate, thank you for your patch. There are several issues which need
to be addressed before this patch can be accepted, starting with the
commit message.

It seems that you used the commit message of my patch as a template, but
you forgot to change the Subject (the first line). For the next version, please
change the subject to something that correctly describes this patch.

I also see that you gave this patch a version of 2, but since this patch
does not replace my patch (in other words, it is a different patch), you
should have just made it v1. Anyway, let's just make the next version v3
to avoid confusion.

The rest of the commit message should have 1 paragraph describing the reason
why the patch is necessary + a second paragraph describing what the patch
is doing to address this. Your cover-letter would be a good candidate for
the second paragraph, resulting in for example something like this as
body of the commit message:

"""
Many users are reporting that the Samsung 860 and 870 SSDs are having
various issues when combined with AMD SATA controllers and only completely
disabling NCQ helps to avoid these issues.

Entirely disabling NCQ for Samsung 860/870 SSDs will cause an I/O performance
drop. So a new ATA_HORKAGE_NONCQ_ON_AMD flag is introduced, which is used to
perform an additional check for these SSDs. If the SSD's parent ATA controller
is an AMD controller then NCQ is disabled, otherwise NCQ is kept enabled.
"""

On 8/27/21 7:33 AM, Kate Hsuan wrote:
> A flag ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL is added to disable NCQ
> on AMD/MAREL/ASMEDIA chipsets.
> 
> Samsung 860/870 SSD are disabled to use NCQ trim functions but it will
> lead to performace drop. From the bugzilla, we could realize the issues
> only appears on those chipset mentioned above. So this flag could be
> used to only disable NCQ on specific chipsets.
> 
> Fixes: ca6bfcb2f6d9 ("libata: Enable queued TRIM for Samsung SSD 860")
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=203475
> Signed-off-by: Hans de Goede <hdegoede@xxxxxxxxxx>
> Reviewed-by: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
> Signed-off-by: Kate Hsuan <hpa@xxxxxxxxxx>
> ---
>  drivers/ata/libata-core.c | 37 ++++++++++++++++++++++++++++++++-----
>  include/linux/libata.h    |  3 +++
>  2 files changed, 35 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index cc459ce90018..50f635669dd4 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -2119,6 +2119,8 @@ static inline u8 ata_dev_knobble(struct ata_device *dev)
>  static void ata_dev_config_ncq_send_recv(struct ata_device *dev)
>  {
>  	struct ata_port *ap = dev->link->ap;
> +	struct device *parent = NULL;
> +	struct pci_dev *pcidev = NULL;
>  	unsigned int err_mask;
>  
>  	if (!ata_log_supported(dev, ATA_LOG_NCQ_SEND_RECV)) {
> @@ -2138,9 +2140,32 @@ static void ata_dev_config_ncq_send_recv(struct ata_device *dev)
>  		memcpy(cmds, ap->sector_buf, ATA_LOG_NCQ_SEND_RECV_SIZE);
>  
>  		if (dev->horkage & ATA_HORKAGE_NO_NCQ_TRIM) {
> -			ata_dev_dbg(dev, "disabling queued TRIM support\n");
> -			cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
> -				~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
> +			if (dev->horkage & ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL)
> +			{
> +				// get parent device for the controller vendor ID
> +				for(parent = dev->tdev.parent; parent != NULL; parent = parent->parent)
> +				{
> +					if(dev_is_pci(parent))
> +					{
> +						pcidev = to_pci_dev(parent);
> +						if (pcidev->vendor == PCI_VENDOR_ID_MARVELL ||
> +							pcidev->vendor == PCI_VENDOR_ID_AMD 	||
> +							pcidev->vendor == PCI_VENDOR_ID_ASMEDIA )
> +						{
> +							ata_dev_dbg(dev, "Disable NCQ -> vendor ID %x product ID %x\n", 
> +												pcidev->vendor, pcidev->device);
> +							cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
> +								~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
> +						}
> +						break;
> +					}
> +				}
> +			}else
> +			{
> +				ata_dev_dbg(dev, "disabling queued TRIM support\n");
> +				cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
> +					~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
> +			}

Please don't nest the handling of the new ATA_HORKAGE_NONCQ_ON_AMD flag inside the handling of other flags.

Also, you are only disabling queued TRIM here, which my patch already does. Instead, the new check should
completely disable NCQ. This means moving the check to ata_dev_config_ncq() and adding it
after this check:

        if (dev->horkage & ATA_HORKAGE_NONCQ) {
                snprintf(desc, desc_sz, "NCQ (not used)");
                return 0;
        }

And then do the same, but only if pcidev->vendor == PCI_VENDOR_ID_AMD.
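
To illustrate the shape of that check, here is a minimal user-space sketch.
It is not kernel code and does not use the real libata or PCI structures:
struct model_dev, model_parent_is_amd() and model_config_ncq() are hypothetical
stand-ins for struct device, the dev_is_pci() / to_pci_dev() parent walk and
ata_dev_config_ncq(), and the flag bit values are illustrative only. The
vendor ID 0x1022 is the value of PCI_VENDOR_ID_AMD.

```c
#include <stddef.h>
#include <stdio.h>

#define MODEL_PCI_VENDOR_ID_AMD 0x1022u	/* same value as PCI_VENDOR_ID_AMD */

/* Illustrative flag values; the real bits live in include/linux/libata.h. */
#define MODEL_HORKAGE_NONCQ		(1u << 2)
#define MODEL_HORKAGE_NONCQ_ON_AMD	(1u << 27)

/* Hypothetical stand-in for struct device plus the PCI bits we need. */
struct model_dev {
	const struct model_dev *parent;
	int is_pci;		/* models dev_is_pci() */
	unsigned int vendor;	/* models to_pci_dev(dev)->vendor */
};

/* Walk up the parent chain to the first PCI device and test its vendor. */
static int model_parent_is_amd(const struct model_dev *dev)
{
	const struct model_dev *p;

	for (p = dev->parent; p != NULL; p = p->parent) {
		if (p->is_pci)
			return p->vendor == MODEL_PCI_VENDOR_ID_AMD;
	}
	return 0;
}

/* Returns 1 if NCQ would stay enabled, 0 if it would be disabled. */
static int model_config_ncq(const struct model_dev *dev, unsigned int horkage,
			    char *desc, size_t desc_sz)
{
	if (horkage & MODEL_HORKAGE_NONCQ) {
		snprintf(desc, desc_sz, "NCQ (not used)");
		return 0;
	}

	/* New, separate check: not nested inside any other flag's handling. */
	if ((horkage & MODEL_HORKAGE_NONCQ_ON_AMD) && model_parent_is_amd(dev)) {
		snprintf(desc, desc_sz, "NCQ (not used)");
		return 0;
	}

	snprintf(desc, desc_sz, "NCQ");
	return 1;
}
```

In this model a disk behind an AMD host with the new flag set ends up with
NCQ disabled, while the same disk behind any other vendor's host keeps NCQ.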



>  		}
>  	}
>  }
> @@ -3951,9 +3976,11 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
>  	{ "Samsung SSD 850*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
>  						ATA_HORKAGE_ZERO_AFTER_TRIM, },
>  	{ "Samsung SSD 860*",           NULL,   ATA_HORKAGE_NO_NCQ_TRIM |
> -						ATA_HORKAGE_ZERO_AFTER_TRIM, },
> +						ATA_HORKAGE_ZERO_AFTER_TRIM |
> +						ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL, },
>  	{ "Samsung SSD 870*",           NULL,   ATA_HORKAGE_NO_NCQ_TRIM |
> -						ATA_HORKAGE_ZERO_AFTER_TRIM, },
> +						ATA_HORKAGE_ZERO_AFTER_TRIM |
> +						ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL, },
>  	{ "FCCT*M500*",			NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
>  						ATA_HORKAGE_ZERO_AFTER_TRIM, },
>  
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 3fcd24236793..ec17f1f3fbf6 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -422,6 +422,9 @@ enum {
>  	ATA_HORKAGE_NOTRIM	= (1 << 24),	/* don't use TRIM */
>  	ATA_HORKAGE_MAX_SEC_1024 = (1 << 25),	/* Limit max sects to 1024 */
>  	ATA_HORKAGE_MAX_TRIM_128M = (1 << 26),	/* Limit max trim size to 128M */
> +	ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL = (1 << 27), /*Disable NCQ only on 
> +							ASMeida, AMD, and Marvell 
> +							Chipset*/

When we initially discussed this I know I said that we would need to disable
NCQ on AMD + ASMedia + Marvell hosts, but after carefully reading both bugs again
I've come to the conclusion that for ASMedia and Marvell SATA hosts just disabling
queued TRIM, as my patch does, is enough. So please rename this to:

ATA_HORKAGE_NONCQ_ON_AMD

(and only check for an AMD vendor-id when doing the check).
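
As a rough illustration of how the renamed flag would sit next to the existing
ones, here is a small user-space model of the flag definition and the blacklist
lookup. This is a sketch under assumptions, not the libata code: the MODEL_*
names, struct model_blacklist_entry and model_lookup_horkage() are hypothetical,
the bit positions other than (1 << 27) from your patch are illustrative, and
POSIX fnmatch() is swapped in for the glob matching which the kernel does itself.

```c
#include <fnmatch.h>
#include <stddef.h>

enum {
	MODEL_HORKAGE_NO_NCQ_TRIM	= (1 << 19),	/* illustrative bit */
	MODEL_HORKAGE_ZERO_AFTER_TRIM	= (1 << 22),	/* illustrative bit */
	MODEL_HORKAGE_NONCQ_ON_AMD	= (1 << 27),	/* disable NCQ on AMD hosts only */
};

struct model_blacklist_entry {
	const char *model_num;	/* glob pattern for the drive model string */
	unsigned long horkage;
};

static const struct model_blacklist_entry model_blacklist[] = {
	{ "Samsung SSD 850*",	MODEL_HORKAGE_NO_NCQ_TRIM |
				MODEL_HORKAGE_ZERO_AFTER_TRIM, },
	{ "Samsung SSD 860*",	MODEL_HORKAGE_NO_NCQ_TRIM |
				MODEL_HORKAGE_ZERO_AFTER_TRIM |
				MODEL_HORKAGE_NONCQ_ON_AMD, },
	{ "Samsung SSD 870*",	MODEL_HORKAGE_NO_NCQ_TRIM |
				MODEL_HORKAGE_ZERO_AFTER_TRIM |
				MODEL_HORKAGE_NONCQ_ON_AMD, },
	{ NULL, 0 }		/* end of table */
};

/* Return the horkage flags of the first matching entry, or 0 if none match. */
static unsigned long model_lookup_horkage(const char *model_num)
{
	const struct model_blacklist_entry *e;

	for (e = model_blacklist; e->model_num != NULL; e++) {
		if (fnmatch(e->model_num, model_num, 0) == 0)
			return e->horkage;
	}
	return 0;
}
```

With this table only the 860 and 870 entries carry the AMD-specific flag; the
850 entry keeps just the queued TRIM related flags.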

>  
>  	 /* DMA mask for user DMA control: User visible values; DO NOT
>  	    renumber */
> 

Regards,

Hans



