Re: [PATCH] PCI: Avoid FLR for AMD Matisse HD Audio and USB Controllers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 21 May 2020 at 00:29, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Wed, May 20, 2020 at 10:41:03AM +0100, Marcos Scriven wrote:
> > On Mon, 18 May 2020 at 20:26, Marcos Scriven <marcos@xxxxxxxxxxx> wrote:
> > >
> > > On Mon, 18 May 2020 at 17:17, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > >
> > > > [+cc Alex]
> > > >
> > > > On Sat, May 16, 2020 at 02:37:23PM +0100, Marcos Scriven wrote:
> > > > > This patch fixes an FLR bug on the following two devices:
> > > > >
> > > > > AMD Matisse HD Audio Controller 0x1487
> > > > > AMD Matisse USB 3.0 Host Controller 0x149c
> > > > >
> > > > > As there was already such a quirk for an Intel network device, I have
> > > > > renamed that method and updated the comments, trying to make it
> > > > > clearer what the specific original devices that were affected are
> > > > > (based on the commit message this was original done:
> > > > > https://git.thm.de/mhnn55/eco32-linux-ba/commit/f65fd1aa4f9881d5540192d11f7b8ed2fec936db).
> > > > >
> > > > > I have ordered them by hex product ID.
> > > > >
> > > > > I have verified this works on a X570 I AORUS PRO WIFI (rev. 1.0) motherboard.
> > > >
> > > > If we avoid FLR, is there another method used to reset these devices
> > > > between attachments to different VMs?  Does the lack of FLR mean we
> > > > can leak information between VMs?
> > > >
> > > > Would additional delay after the FLR work around this, e.g., something
> > > > like 51ba09452d11 ("PCI: Delay after FLR of Intel DC P3700 NVMe")?
> > > >
> > >
> > > Thanks for looking at this patch Bjorn.
> > >
> > > To take your three points:
> > >
> > > 1. Certainly I can see those devices able to be passed back and forth
> > > between host and guest multiple times, once this patch is applied.
> > >
> > > 2. I don't know the answer to that question; would appreciate guidance
> > > on how to determine this. Do you mean perhaps some buffered data in
> > > the USB controller, for instance?
> > >
> > > 3. I have not tried an additional delay. This is the logs I see when
> > > the error is occurring:
> > >
> > > [ 2423.556570] vfio-pci 0000:0c:00.3: not ready 1023ms after FLR; waiting
> > > [ 2425.604526] vfio-pci 0000:0c:00.3: not ready 2047ms after FLR; waiting
> > > [ 2428.804509] vfio-pci 0000:0c:00.3: not ready 4095ms after FLR; waiting
> > > [ 2433.924409] vfio-pci 0000:0c:00.3: not ready 8191ms after FLR; waiting
> > > [ 2443.140721] vfio-pci 0000:0c:00.3: not ready 16383ms after FLR; waiting
> > > [ 2461.571944] vfio-pci 0000:0c:00.3: not ready 32767ms after FLR; waiting
> > > [ 2496.387544] vfio-pci 0000:0c:00.3: not ready 65535ms after FLR; giving up
> > >
> > > What makes this bug especially bad is the host never recovers, and
> > > eventually hangs or crashes.
> > >
> > > For reference, the delay example you're talking about is:
> > >
> > > static int delay_250ms_after_flr(struct pci_dev *dev, int probe)
> > > {
> > > if (!pcie_has_flr(dev))
> > > return -ENOTTY;
> > >
> > > if (probe)
> > > return 0;
> > >
> > > pcie_flr(dev);
> > >
> > > msleep(250);
> > >
> > > return 0;
> > > }
> > >
> > > I don't know if it would work, but I will try it out and report back.
> > >
> > > Marcos
> > >
> > >
> >
> > Bjorn/Alex
> >
> > I have just tried the alternate approach of adding a 250ms delay to
> > the function level reset - this unfortunately results in the same
> > broken behaviour, with the host itself never recovering.
> >
> > [   76.905410] vfio-pci 0000:0d:00.3: not ready 1023ms after FLR; waiting
> > [   79.018014] vfio-pci 0000:0d:00.3: not ready 2047ms after FLR; waiting
> > [   82.089390] vfio-pci 0000:0d:00.3: not ready 4095ms after FLR; waiting
> > [   87.209416] vfio-pci 0000:0d:00.3: not ready 8191ms after FLR; waiting
> > [   96.425440] vfio-pci 0000:0d:00.3: not ready 16383ms after FLR; waiting
> > [  114.615491] vfio-pci 0000:0d:00.3: not ready 32767ms after FLR; waiting
> > [  149.417712] vfio-pci 0000:0d:00.3: not ready 65535ms after FLR; giving up
> >
> > I also tried a full second, to no avail.
> >
> > What would be the next step in proceeding with the original patch please?
>
> Implementation of FLR is "strongly recommended" by the spec but is
> optional.  So I don't see a problem with just avoiding it via your
> patch.
>
> I applied it to pci/virtualization for v5.8, thanks!
>
> Bjorn

Hi Bjorn

Wonderful news! Thanks again for your help.

Marcos



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux