On Thu, Mar 14, 2024 at 02:18:38AM +0000, Kevin Xie wrote:
> > Re: [PATCH v15,RESEND 22/23] PCI: starfive: Offload the NVMe timeout
> > workaround to host drivers.
> >
> > On Mon, Mar 04, 2024 at 10:08:06AM -0800, Palmer Dabbelt wrote:
> > > On Thu, 29 Feb 2024 07:08:43 PST (-0800), lpieralisi@xxxxxxxxxx wrote:
> > > > On Tue, Feb 27, 2024 at 06:35:21PM +0800, Minda Chen wrote:
> > > > > From: Kevin Xie <kevin.xie@xxxxxxxxxxxxxxxx>
> > > > >
> > > > > The StarFive JH7110 hardware cannot always keep two inbound posted
> > > > > writes in order, such as MSI messages and NVMe completions. If the
> > > > > NVMe completion is written after the MSI, the NVMe IRQ handler will
> > > > > miss it.
> > > >
> > > > Please explain what the problem is and what "NVMe completions" means
> > > > given that you are talking about posted writes.
>
> Sorry, that was too loose a conclusion.
> It is not that any two inbound posted requests can be reordered on the
> JH7110 SoC; the only case we have found is NVMe completions with their
> MSI interrupts. To be more precise, the reordering is between the
> completion entry (struct nvme_completion) written by the device and the
> MSI that triggers the nvme_irq() handler in nvme/host/pci.c.
>
> We have posted the original workaround patch before:
> https://lore.kernel.org/lkml/CAJM55Z9HtBSyCq7rDEDFdw644pOWCKJfPqhmi3SD1x6p3g2SLQ@xxxxxxxxxxxxxx/
> We have carried it in our GitHub branch and it has worked fine for a
> long time. We are looking forward to better advice from someone
> familiar with the NVMe driver.

So this platform treats strictly ordered writes the same as if relaxed
ordering were enabled? I am not sure we can reasonably work around such
behavior. An arbitrary delay is likely too long for most cases and too
short for the worst case.

I suppose we could quirk a non-posted transaction in the interrupt
handler to force-flush pending memory updates, but that will noticeably
harm your NVMe performance. Maybe if you constrain such behavior to the
spurious IRQ_NONE condition, then it might be okay? I don't know.
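
Very roughly something like the below, as an untested sketch only: the
NVME_QUIRK_FLUSH_POSTED flag is made up for illustration, and the
nvme_poll_cq() call is only loosely modeled on what pci.c does today.

	static irqreturn_t nvme_irq(int irq, void *data)
	{
		struct nvme_queue *nvmeq = data;
		struct nvme_dev *dev = nvmeq->dev;

		if (nvme_poll_cq(nvmeq, NULL))
			return IRQ_HANDLED;

		/*
		 * Looks spurious: on platforms that may deliver the MSI
		 * before the completion entry, do one non-posted read back
		 * to the device to flush pending inbound posted writes,
		 * then poll the completion queue once more.
		 */
		if (dev->ctrl.quirks & NVME_QUIRK_FLUSH_POSTED) {
			readl(dev->bar + NVME_REG_CSTS);
			if (nvme_poll_cq(nvmeq, NULL))
				return IRQ_HANDLED;
		}

		return IRQ_NONE;
	}

That way the extra read only costs you anything when the race actually
hits, rather than on every interrupt.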