On Fri, Jan 24, 2020 at 6:34 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> Evan,
>
> Evan Green <evgreen@xxxxxxxxxxxx> writes:
> > I did another experiment that I think lends credibility to my torn MSI
> > hypothesis. I have the following change:
> >
> > And indeed, I get a machine check, despite the fact that MSI_DATA is
> > overwritten just after address is updated.
>
> I don't have to understand why a SoC released in 2019 still has
> unmaskable MSI, especially as Inhell's own XHCI spec clearly documents
> and recommends MSI-X.
>
> While your workaround (disabling MSI) works in this particular case, it's
> not really a good option:
>
>  1) Quite some devices have a bug where the legacy INTX disable does not
>     work reliably or is outright broken. That means MSI disable will
>     reroute to INTX.
>
>  2) I dug out old debug data which confirms that some silly devices
>     lose interrupts across MSI disable/reenable if the INTX fallback is
>     disabled.
>
>     And no, it's not a random weird device, it's part of a chipset which
>     was pretty popular a few years ago. I leave it as an exercise for
>     the reader to guess the vendor.
>
> Can you please apply the patch below? It enforces an IPI to the new
> vector/target CPU when the interrupt is MSI w/o masking. It should
> cure the issue. It goes without saying that I'm not proud of it.

I'll feel just as dirty putting a Tested-by on it :)

I don't think this patch is complete, though. As written, it generates
"recovery interrupts" for MSIs that are not maskable, but the
pci_msi_domain_write_msg() path, which is the one I seem to hit, makes
no effort to mask the MSI while changing affinity. So at the very least
it would need a follow-on patch that masks the MSI across the update,
for MSIs that are maskable. __pci_restore_msi_state(), called in the
resume path, does exactly that masking, but for some reason
pci_msi_domain_write_msg() does not (a rough sketch of what I mean
follows the proposal below).

I'm also a bit concerned about all the spurious interrupts we'd be
introducing: not just the retriggering added here, but the fact that we
never dealt with the torn interrupt itself. In my case, XHCI will be
sending an interrupt on the old vector to the new CPU, where that
vector could be registered to anything. I'm worried that not every
driver in the system is hardened against receiving interrupts it's not
prepared for. Perhaps the driver misbehaves, or perhaps it's a "bad"
interrupt like the MCE interrupt that takes the system down. (I realize
the MCE interrupt itself is not in the device vector region, but
imagine some other bad interrupt of that sort.)

Now that you're on board with the torn write theory, what do you think
about my "transit vector" proposal? The idea is this (rough C sketch
after the list):

 - Reserve a single vector number on all CPUs for interrupts in transit
   between CPUs.

 - Interrupts in transit between CPUs are added to some sort of list,
   or maybe hang off the transit vector itself.

 - __pci_write_msi_msg() would, after proper abstractions, essentially
   be doing this:

       pci_write(MSI_DATA, TRANSIT_VECTOR);
       pci_write(MSI_ADDRESS, new_affinity);
       pci_write(MSI_DATA, new_vector);

 - In the rare torn case I've found here, the interrupt will come in on
   <new CPU, transit_vector> or <old CPU, transit_vector>.

 - The ISR for TRANSIT_VECTOR would go through and call the ISR for
   every IRQ in transit across CPUs. This does still result in a couple
   of extra ISR calls, since multiple interrupts might be in transit at
   once, but at least it's very rare.

 - CPU hotplug would keep the same logic it already has, retriggering
   TRANSIT_VECTOR if the interrupt happened to land on
   <old CPU, old vector>.

 - When the interrupt is confirmed on <new CPU, new vector>, remove the
   ISR from the TRANSIT_VECTOR list.
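To make that concrete, here's a rough, uncompiled sketch of what I'm
picturing. TRANSIT_VECTOR, struct transit_irq, transit_list, and the
function names are all invented for illustration, and locking is
omitted; pci_write_config_*(), the PCI_MSI_* offsets, and
generic_handle_irq_desc() are the existing interfaces. This assumes
x86, where the vector is the low 8 bits of MSI_DATA, and a device with
a 32-bit message (a 64-bit one would use PCI_MSI_DATA_64):

#include <linux/irqdesc.h>
#include <linux/list.h>
#include <linux/pci.h>

#define TRANSIT_VECTOR	0xef	/* hypothetical reserved vector */

struct transit_irq {		/* hypothetical bookkeeping entry */
	struct list_head node;
	struct irq_desc *desc;
};

static LIST_HEAD(transit_list);	/* IRQs currently migrating */

static void msi_write_msg_via_transit(struct pci_dev *dev, int pos,
				      struct msi_msg *old_msg,
				      struct msi_msg *new_msg)
{
	/* 1) Park the interrupt on the reserved transit vector,
	 *    still aimed at the old CPU. */
	pci_write_config_word(dev, pos + PCI_MSI_DATA_32,
			      (old_msg->data & ~0xff) | TRANSIT_VECTOR);

	/* 2) Retarget the address. A message torn between 1) and 3)
	 *    arrives as <old CPU, transit> or <new CPU, transit>. */
	pci_write_config_dword(dev, pos + PCI_MSI_ADDRESS_LO,
			       new_msg->address_lo);

	/* 3) Install the real vector on the new CPU. */
	pci_write_config_word(dev, pos + PCI_MSI_DATA_32, new_msg->data);
}

/* TRANSIT_VECTOR's handler can't know which migrating interrupt
 * fired, so it replays every IRQ currently in transit. */
static void transit_vector_handler(void)
{
	struct transit_irq *t;

	list_for_each_entry(t, &transit_list, node)
		generic_handle_irq_desc(t->desc);
}

The invariant is that the device's (address, data) pair always decodes
to a vector somebody owns a handler for, so a torn delivery costs a few
replayed ISRs instead of landing on a random vector.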
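Separately, the follow-on masking fix I mentioned above could look
roughly like this against the current MSI internals. The function name
is invented; irq_data_get_msi_desc(), __pci_msi_desc_mask_irq(), and
__pci_write_msi_msg() are the existing helpers, and the mask dance is
the same one __pci_restore_msi_state() effectively does today:

#include <linux/irq.h>
#include <linux/msi.h>

static void pci_msi_write_msg_masked(struct irq_data *irqd,
				     struct msi_msg *msg)
{
	struct msi_desc *desc = irq_data_get_msi_desc(irqd);
	u32 mask = 1 << desc->msi_attrib.entry_nr; /* per-vector mask bit */

	if (desc->msi_attrib.maskbit) {
		/* Maskable MSI: hide the update behind the mask bit. */
		__pci_msi_desc_mask_irq(desc, mask, mask);
		__pci_write_msi_msg(desc, msg);
		__pci_msi_desc_mask_irq(desc, mask, 0);
	} else {
		/* Unmaskable MSI: still needs the transit trick above. */
		__pci_write_msi_msg(desc, msg);
	}
}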
If you think it's a worthwhile idea I can try to code it up.

I've been running your patch for about 30 minutes with no repro case.

-Evan