RE: [EXTERNAL] Re: [PATCH v2] libata-sata: Check SDB_FIS for completion of DMA transfer before completing the commands.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Damien,
Thanks for the feedback. It a hardware issue. I acknowledge your concerns and agrees for not adding this change to upstream.

Thanks,
~Saurav

> -----Original Message-----
> From: Damien Le Moal <dlemoal@xxxxxxxxxx>
> Sent: Friday, March 15, 2024 1:39 PM
> To: Saurav Kashyap <skashyap@xxxxxxxxxxx>; cassel@xxxxxxxxxx
> Cc: linux-ide@xxxxxxxxxxxxxxx; soochon@xxxxxxxxxx; Manoj Phadtare
> <mphadtare@xxxxxxxxxxx>
> Subject: [EXTERNAL] Re: [PATCH v2] libata-sata: Check SDB_FIS for completion
> of DMA transfer before completing the commands.
> 
> Prioritize security for external emails: Confirm sender and content safety before
> clicking links or opening attachments
> 
> ----------------------------------------------------------------------
> On 3/15/24 14:44, Saurav Kashyap wrote:
> > This issue is seen on Marvell Controller with device ids 0x9215 and 0x9235.
> 
> I do not see 0x9215 listed. Is this one supposed to work OK with generic AHCI ?
> 
> > Its reproduced within a minute with block size of 64K, 100 threads,
> > 100 jobs and  512 iodepth on AMD platform. With decreased work load it
> > takes 8-9 hrs.
> >
> > Sequence leading to an issue as per PCIe trace
> > - PxSact is read with slots 7 and 24 being cleared.
> > - Host starts processing these commands while data is not in system
> >   memory yet.
> 
> This is a serious hardware bug, but is this issue tied to the fact that the host
> is AMD ? Does the same issue happen with different hosts (e.g. Intel, ARM, etc)
> ? And what about devices ? Do you see this error if you change devices too ? Or
> does this happen only with one particular device model/vendor ? (in which case,
> the issue could be with the device and not the adapter).
> 
> > - Last pkt of 512B was sent to host.
> > - SDB.FIS is copied, telling host command slot 24 is done.
> >
> > Reading SDB.FIS confirms the transfer is complete.
> >
> > Cc: Soochon Radee <soochon@xxxxxxxxxx>
> > Tested-by: Manoj Phadtare <mphadtare@xxxxxxxxxxx>
> > Signed-off-by: Saurav Kashyap <skashyap@xxxxxxxxxxx>
> > ---
> > v1->v2:
> > Added workload and platform related details in the description.
> >
> >  drivers/ata/libata-sata.c | 28 +++++++++++++++++++++++++++-
> >  1 file changed, 27 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
> > index 0fb1934875f2..7cdeb0a38c5b 100644
> > --- a/drivers/ata/libata-sata.c
> > +++ b/drivers/ata/libata-sata.c
> > @@ -14,9 +14,11 @@
> >  #include <scsi/scsi_eh.h>
> >  #include <linux/libata.h>
> >  #include <asm/unaligned.h>
> > +#include <linux/pci.h>
> >
> >  #include "libata.h"
> >  #include "libata-transport.h"
> > +#include "ahci.h"
> >
> >  /* debounce timing parameters in msecs { interval, duration, timeout } */
> >  const unsigned int sata_deb_timing_normal[]		= {   5,  100, 2000 };
> > @@ -649,6 +651,7 @@ EXPORT_SYMBOL_GPL(sata_link_hardreset);
> >  int ata_qc_complete_multiple(struct ata_port *ap, u64 qc_active)
> >  {
> >  	u64 done_mask, ap_qc_active = ap->qc_active;
> > +	struct pci_dev *pdev = to_pci_dev(ap->host->dev);
> >  	int nr_done = 0;
> >
> >  	/*
> > @@ -677,7 +680,30 @@ int ata_qc_complete_multiple(struct ata_port *ap,
> u64 qc_active)
> >  		unsigned int tag = __ffs64(done_mask);
> >
> >  		qc = ata_qc_from_tag(ap, tag);
> > -		if (qc) {
> > +		if (pdev->vendor == PCI_VENDOR_ID_MARVELL_EXT &&
> > +		    (pdev->device == 0x9215 || pdev->device == 0x9235)) {
> > +			struct ahci_port_priv *pp = ap->private_data;
> > +			u8 *rx_fis = pp->rx_fis;
> > +
> > +			if (pp->fbs_enabled)
> > +				rx_fis += ap->link.pmp * AHCI_RX_FIS_SZ;
> > +
> > +			if (!qc)
> > +				continue;
> > +
> > +			if (ata_is_ncq(qc->tf.protocol)) {
> > +				u32 *fis = (u32 *)(rx_fis + RX_FIS_SDB);
> > +				u32 fis_active = fis[1];
> 
> It really looks like this should be done in ahci_qc_complete() instead of here
> per qc. And the fact that you need to do this also tend to indicate that the
> *device* is sending incorrect SDB FIS... Are you really sure it is an adapter
> issue ?
> 
> > +
> > +				if ((fis_active & (1 << tag))) {
> > +					ata_qc_complete(qc);
> > +					nr_done++;
> > +				}
> > +			} else {
> > +				ata_qc_complete(qc);
> > +				nr_done++;
> > +			}
> > +		} else if (qc) {
> 
> This is not acceptable as-is because this adds overhead for all well-behave
> AHCI/SATA adapters that do not have this bug. Given the problem at hand, I am
> tempted to suggest that any device attached to these adapters should simply be
> marked with ATA_HORKAGE_NONCQ to disable NCQ. But even then, if the
> adapter
> raises an interrupt before all data is transferred, things will break.
> 
> I am very reluctant to even try to add a workaround such broken adapter, if this
> really turns out to be an adapter issue (as opposed to a device issue).
> 
> Have you looked at sata_mv.c ? Anything relevant to these adapters in there ?
> 
> >  			ata_qc_complete(qc);
> >  			nr_done++;
> >  		}
> 
> --
> Damien Le Moal
> Western Digital Research





[Index of Archives]     [Linux Filesystems]     [Linux SCSI]     [Linux RAID]     [Git]     [Kernel Newbies]     [Linux Newbie]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Samba]     [Device Mapper]

  Powered by Linux