Re: [PATCH v15,RESEND 22/23] PCI: starfive: Offload the NVMe timeout workaround to host drivers.

On Thu, Mar 14, 2024 at 02:18:38AM +0000, Kevin Xie wrote:
> > Re: [PATCH v15,RESEND 22/23] PCI: starfive: Offload the NVMe timeout
> > workaround to host drivers.
> > 
> > On Mon, Mar 04, 2024 at 10:08:06AM -0800, Palmer Dabbelt wrote:
> > > On Thu, 29 Feb 2024 07:08:43 PST (-0800), lpieralisi@xxxxxxxxxx wrote:
> > > > On Tue, Feb 27, 2024 at 06:35:21PM +0800, Minda Chen wrote:
> > > > > From: Kevin Xie <kevin.xie@xxxxxxxxxxxxxxxx>
> > > > >
> > > > > The StarFive JH7110 hardware can't always keep two inbound posted
> > > > > writes in order, for example MSI messages and NVMe completions.
> > > > > If the NVMe completion is updated later than the MSI, an NVMe IRQ
> > > > > handler invocation will be missed.
> > > >
> > > > Please explain what the problem is and what "NVMe completions" means
> > > > given that you are talking about posted writes.
> 
> Sorry, that was an imprecise conclusion.
> It is not that any two inbound posted requests can be reordered on the
> JH7110 SoC; the only case we have found is NVMe completions being
> reordered against their MSI interrupts.
> To be more precise, the race is between the pending status in the
> nvme_completion struct and the nvme_irq() handler in nvme/host/pci.c.
> 
> We posted the original workaround patch before:
> https://lore.kernel.org/lkml/CAJM55Z9HtBSyCq7rDEDFdw644pOWCKJfPqhmi3SD1x6p3g2SLQ@xxxxxxxxxxxxxx/
> We have carried it in our GitHub branch and it has worked fine for a
> long time.
> We are looking forward to better advice from someone familiar with the
> NVMe driver.

So this platform treats strictly ordered writes the same as if relaxed
ordering were enabled? I am not sure we can reasonably work around such
behavior. An arbitrary delay is likely too long for the common case and
still too short for the worst case.
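For context, the check that goes stale is roughly the NVMe phase-tag
test. The sketch below is simplified from nvme/host/pci.c; the helper
name and parameters are made up, not the driver's actual symbols:

#include <linux/compiler.h>	/* READ_ONCE() */
#include <linux/nvme.h>		/* struct nvme_completion */

/* Hypothetical stand-in for the driver's phase-tag check. */
static bool cqe_pending(struct nvme_completion *cqe, u16 phase)
{
	/*
	 * The controller first DMAs the completion entry, then sends
	 * the MSI. If the MSI overtakes the DMA write on this platform,
	 * this read still sees the old phase bit and the interrupt
	 * looks spurious.
	 */
	return (le16_to_cpu(READ_ONCE(cqe->status)) & 1) == phase;
}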

I suppose we could quirk a non-posted transaction in the interrupt
handler to force-flush pending memory updates, but that would noticeably
harm your NVMe performance. Maybe it would be okay if you constrain the
quirk to the spurious IRQ_NONE condition? I don't know.
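A minimal sketch of that idea, assuming a readable device register is
mapped; all names here (my_queue, my_cqe_pending, my_nvme_irq) are
stand-ins, not the real nvme/host/pci.c code:

#include <linux/interrupt.h>	/* irqreturn_t, IRQ_HANDLED, IRQ_NONE */
#include <linux/io.h>		/* readl() */
#include <linux/nvme.h>		/* struct nvme_completion */

struct my_queue {
	void __iomem *csts_reg;		/* any readable device register */
	struct nvme_completion *cqes;	/* CQ ring in host memory */
	u16 head;
	u16 phase;
};

static bool my_cqe_pending(struct my_queue *q)
{
	return (le16_to_cpu(READ_ONCE(q->cqes[q->head].status)) & 1) ==
		q->phase;
}

static irqreturn_t my_nvme_irq(int irq, void *data)
{
	struct my_queue *q = data;

	/* Fast path: no extra cost when the CQ entry already landed. */
	if (my_cqe_pending(q))
		return IRQ_HANDLED;

	/*
	 * Would otherwise return IRQ_NONE: the CQ write may still be
	 * behind the MSI. A non-posted read is not allowed to complete
	 * until earlier inbound posted writes are visible (PCIe
	 * ordering rules), so the retry below sees the CQ entry.
	 */
	readl(q->csts_reg);

	return my_cqe_pending(q) ? IRQ_HANDLED : IRQ_NONE;
}

The non-posted read stays off the fast path, so well-behaved interrupts
pay nothing and only the raced, would-be-spurious case eats the flush.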



