Re: [PATCH] PCI: Avoid FLR for AMD Matisse HD Audio and USB Controllers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 18 May 2020 at 20:26, Marcos Scriven <marcos@xxxxxxxxxxx> wrote:
>
> On Mon, 18 May 2020 at 17:17, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> >
> > [+cc Alex]
> >
> > On Sat, May 16, 2020 at 02:37:23PM +0100, Marcos Scriven wrote:
> > > This patch fixes an FLR bug on the following two devices:
> > >
> > > AMD Matisse HD Audio Controller 0x1487
> > > AMD Matisse USB 3.0 Host Controller 0x149c
> > >
> > > As there was already such a quirk for an Intel network device, I have
> > > renamed that method and updated the comments, trying to make it
> > > clearer what the specific original devices that were affected are
> > > (based on the commit message this was original done:
> > > https://git.thm.de/mhnn55/eco32-linux-ba/commit/f65fd1aa4f9881d5540192d11f7b8ed2fec936db).
> > >
> > > I have ordered them by hex product ID.
> > >
> > > I have verified this works on a X570 I AORUS PRO WIFI (rev. 1.0) motherboard.
> >
> > If we avoid FLR, is there another method used to reset these devices
> > between attachments to different VMs?  Does the lack of FLR mean we
> > can leak information between VMs?
> >
> > Would additional delay after the FLR work around this, e.g., something
> > like 51ba09452d11 ("PCI: Delay after FLR of Intel DC P3700 NVMe")?
> >
>
> Thanks for looking at this patch Bjorn.
>
> To take your three points:
>
> 1. Certainly I can see those devices able to be passed back and forth
> between host and guest multiple times, once this patch is applied.
>
> 2. I don't know the answer to that question; would appreciate guidance
> on how to determine this. Do you mean perhaps some buffered data in
> the USB controller, for instance?
>
> 3. I have not tried an additional delay. This is the logs I see when
> the error is occurring:
>
> [ 2423.556570] vfio-pci 0000:0c:00.3: not ready 1023ms after FLR; waiting
> [ 2425.604526] vfio-pci 0000:0c:00.3: not ready 2047ms after FLR; waiting
> [ 2428.804509] vfio-pci 0000:0c:00.3: not ready 4095ms after FLR; waiting
> [ 2433.924409] vfio-pci 0000:0c:00.3: not ready 8191ms after FLR; waiting
> [ 2443.140721] vfio-pci 0000:0c:00.3: not ready 16383ms after FLR; waiting
> [ 2461.571944] vfio-pci 0000:0c:00.3: not ready 32767ms after FLR; waiting
> [ 2496.387544] vfio-pci 0000:0c:00.3: not ready 65535ms after FLR; giving up
>
> What makes this bug especially bad is the host never recovers, and
> eventually hangs or crashes.
>
> For reference, the delay example you're talking about is:
>
> static int delay_250ms_after_flr(struct pci_dev *dev, int probe)
> {
> if (!pcie_has_flr(dev))
> return -ENOTTY;
>
> if (probe)
> return 0;
>
> pcie_flr(dev);
>
> msleep(250);
>
> return 0;
> }
>
> I don't know if it would work, but I will try it out and report back.
>
> Marcos
>
>

Bjorn/Alex

I have just tried the alternate approach of adding a 250ms delay to
the function level reset - this unfortunately results in the same
broken behaviour, with the host itself never recovering.

[   76.905410] vfio-pci 0000:0d:00.3: not ready 1023ms after FLR; waiting
[   79.018014] vfio-pci 0000:0d:00.3: not ready 2047ms after FLR; waiting
[   82.089390] vfio-pci 0000:0d:00.3: not ready 4095ms after FLR; waiting
[   87.209416] vfio-pci 0000:0d:00.3: not ready 8191ms after FLR; waiting
[   96.425440] vfio-pci 0000:0d:00.3: not ready 16383ms after FLR; waiting
[  114.615491] vfio-pci 0000:0d:00.3: not ready 32767ms after FLR; waiting
[  149.417712] vfio-pci 0000:0d:00.3: not ready 65535ms after FLR; giving up

I also tried a full second, to no avail.

What would be the next step in proceeding with the original patch please?

How can I allay your concerns on data leaking between VMs?

Thanks for your help with the patch.

Marcos

> > > From 651176ab164ae51e37d5bb86f5948da558744930 Mon Sep 17 00:00:00 2001
> > > From: Marcos Scriven <marcos@xxxxxxxxxxx>
> > > Date: Sat, 16 May 2020 14:23:26 +0100
> > > Subject: [PATCH] PCI: Avoid FLR for:
> > >
> > >     AMD Matisse HD Audio Controller 0x1487
> > >     AMD Matisse USB 3.0 Host Controller 0x149c
> > >
> > > These devices advertise a Function Level Reset (FLR) capability, but hang
> > > when an FLR is triggered.
> > >
> > > To reproduce the problem, attach the device to a VM, then detach and try to
> > > attach again.
> > >
> > > Add a quirk to prevent the use of FLR on these devices.
> > >
> > > Signed-off-by: Marcos Scriven <marcos@xxxxxxxxxxx>
> > > ---
> > >  drivers/pci/quirks.c | 18 ++++++++++++++----
> > >  1 file changed, 14 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 28c9a2409c50..ff310f0cac22 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -5129,13 +5129,23 @@ static void quirk_intel_qat_vf_cap(struct pci_dev *pdev)
> > >  }
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
> > >
> > > -/* FLR may cause some 82579 devices to hang */
> > > -static void quirk_intel_no_flr(struct pci_dev *dev)
> > > +/*
> > > + * FLR may cause the following to devices to hang:
> > > + *
> > > + * AMD Starship/Matisse HD Audio Controller 0x1487
> > > + * AMD Matisse USB 3.0 Host Controller 0x149c
> > > + * Intel 82579LM Gigabit Ethernet Controller 0x1502
> > > + * Intel 82579V Gigabit Ethernet Controller 0x1503
> > > + *
> > > + */
> > > +static void quirk_no_flr(struct pci_dev *dev)
> > >  {
> > >      dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET;
> > >  }
> > > -DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_intel_no_flr);
> > > -DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_intel_no_flr);
> > > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr);
> > > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, quirk_no_flr);
> > > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_no_flr);
> > > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_no_flr);
> > >
> > >  static void quirk_no_ext_tags(struct pci_dev *pdev)
> > >  {
> > > --
> > > 2.25.1



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux