Re: Kernel 5.9 IOMMU groups regression/change

On Wed, Oct 28, 2020 at 3:07 PM Rajat Jain <rajatja@xxxxxxxxxx> wrote:
>
> On Wed, Oct 28, 2020 at 2:52 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> >
> > [+cc Rajat, LKML]
> >
>
> Thanks for copying me. (I don't look at mailing lists actively - so
> missed this). Taking a look at this now.
>
> Thanks,
>
> Rajat
>
>
> > On Tue, Oct 27, 2020 at 08:31:09PM +0100, Boris V. wrote:
> > > On 25/10/2020 20:45, Boris V. wrote:
> > > > After upgrading to kernel 5.9 my VMs stopped working, because some
> > > > devices can't be passed through.
> > > > This is caused by changed IOMMU groups: devices that used to be in
> > > > separate groups are now in the same group.
> > > >
> > > > For example, with kernel 5.8 these are the IOMMU groups:
> > > > IOMMU Group 40:
> > > >         08:01.0 PCI bridge [0604]: ASMedia Technology Inc. Device
> > > > [1b21:118f]
> > > >         09:00.0 Ethernet controller [0200]: Intel Corporation I211
> > > > Gigabit Network Connection [8086:1539] (rev 03)
> > > > IOMMU Group 43:
> > > >         0c:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062
> > > > Serial ATA Controller [1b21:0612] (rev 02)
> > > > IOMMU Group 44:
> > > >         0d:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1042A
> > > > USB 3.0 Host Controller [1b21:1142]
> > > >
> > > > The Ethernet, SATA and USB controllers are each in their own group.
> > > >
> > > > And with 5.9, everything is in one group:
> > > > IOMMU Group 29:
> > > >         00:1c.0 PCI bridge [0604]: Intel Corporation C610/X99 series
> > > > chipset PCI Express Root Port #1 [8086:8d10] (rev d5)
> > > >         00:1c.3 PCI bridge [0604]: Intel Corporation C610/X99 series
> > > > chipset PCI Express Root Port #4 [8086:8d16] (rev d5)
> > > >         00:1c.4 PCI bridge [0604]: Intel Corporation C610/X99 series
> > > > chipset PCI Express Root Port #5 [8086:8d18] (rev d5)
> > > >         00:1c.6 PCI bridge [0604]: Intel Corporation C610/X99 series
> > > > chipset PCI Express Root Port #7 [8086:8d1c] (rev d5)
> > > >         07:00.0 PCI bridge [0604]: ASMedia Technology Inc. Device
> > > > [1b21:118f]
> > > >         08:01.0 PCI bridge [0604]: ASMedia Technology Inc. Device
> > > > [1b21:118f]
> > > >         08:03.0 PCI bridge [0604]: ASMedia Technology Inc. Device
> > > > [1b21:118f]
> > > >         08:04.0 PCI bridge [0604]: ASMedia Technology Inc. Device
> > > > [1b21:118f]
> > > >         09:00.0 Ethernet controller [0200]: Intel Corporation I211
> > > > Gigabit Network Connection [8086:1539] (rev 03)
> > > >         0c:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062
> > > > Serial ATA Controller [1b21:0612] (rev 02)
> > > >         0d:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1042A
> > > > USB 3.0 Host Controller [1b21:1142]
> > > >
> > > >
> > > > This seems to be caused by commit
> > > > 52fbf5bdeeef415b28b8e6cdade1e48927927f60.
> > > > commit 52fbf5bdeeef415b28b8e6cdade1e48927927f60
> > > > Author: Rajat Jain <rajatja@xxxxxxxxxx>
> > > > Date:   Tue Jul 7 15:46:02 2020 -0700
> > > >
> > > >     PCI: Cache ACS capability offset in device
> > > >
> > > >     Currently the ACS capability is being looked up at a number of
> > > > places. Read
> > > >     and store it once at enumeration so that it can be used by all
> > > > later.  No
> > > >     functional change intended.
> > > >
> > > >     Link:
> > > > https://lore.kernel.org/r/20200707224604.3737893-2-rajatja@xxxxxxxxxx
> > > >     Signed-off-by: Rajat Jain <rajatja@xxxxxxxxxx>
> > > >     Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > > >
> > > >  drivers/pci/p2pdma.c |  2 +-
> > > >  drivers/pci/pci.c    | 20 ++++++++++++++++----
> > > >  drivers/pci/pci.h    |  2 +-
> > > >  drivers/pci/probe.c  |  2 +-
> > > >  drivers/pci/quirks.c |  8 ++++----
> > > >  include/linux/pci.h  |  1 +
> > > >  6 files changed, 24 insertions(+), 11 deletions(-)
> > > >
> > > >
> > > > If I revert this commit, I get the old groups back.
> > > >
> > > > The commit log says 'No functional change intended', but there is a
> > > > functional change.
> > > >
> > > > This is Intel Core i7-5930K CPU and X99 chipset. But I see the same
> > > > thing on other Intel systems (didn't test on AMD).
> > >
> > > Some more info.
> > > The problem seems to be that pci_dev_specific_enable_acs() is no longer
> > > called.
> > > Before, pci_enable_acs() was called from pci_init_capabilities(), and
> > > pci_enable_acs() in turn called pci_dev_specific_enable_acs().
> > > I don't know anything about PCI and this stuff, but I'm guessing that this
> > > function enables ACS for some Intel devices.
> > > After this commit, pci_acs_init() is called from pci_init_capabilities(),
> > > and if pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS) returns 0,
> > > pci_enable_acs() and pci_dev_specific_enable_acs() are no longer called.
> > > If I apply, for example, the patch below, the groups are right again and
> > > everything works as before.
> >
> > Thanks very much for the report and the debugging.  Maybe we can get
> > this sorted and fixed for v5.10-rc2 or -rc3.

Thanks, Boris, for reporting and debugging! The problem is that I
overlooked the fact that some root ports (the ones quirked with the
*_intel_pch_acs_* functions in this case) may not expose a standard
ACS capability structure, but instead depend on quirks to enable ACS
for them using non-standard registers. Your platform is in this
category. Can you please send lspci -vvvv and lspci -xxxx output for
one of your root ports to confirm?
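
As a rough illustration of what "does not expose a standard ACS
capability" means, here is a minimal userspace sketch (not kernel code)
that walks the PCIe extended capability list in a raw config-space
dump, roughly the way pci_find_ext_capability() does. The function and
macro names are made up for this example, and the buffer is assumed to
hold the bytes shown by lspci -xxxx in little-endian order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_SPACE_SIZE     256   /* extended capabilities start at 0x100 */
#define CFG_SPACE_EXP_SIZE 4096  /* size of PCIe extended config space */
#define EXT_CAP_ID_ACS     0x0D  /* Access Control Services */

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: return the offset of extended capability
 * cap_id in the config-space dump cfg, or 0 if it is not present.
 */
int find_ext_cap(const uint8_t *cfg, size_t len, uint16_t cap_id)
{
	size_t pos = CFG_SPACE_SIZE;
	/* Each capability is at least 8 bytes, which bounds the walk. */
	int ttl = (CFG_SPACE_EXP_SIZE - CFG_SPACE_SIZE) / 8;

	assert(cfg);

	while (ttl-- > 0 && pos >= CFG_SPACE_SIZE && pos + 4 <= len) {
		uint32_t hdr = read_le32(cfg + pos);

		/* 0 means no capability here; all-ones means a failed read. */
		if (hdr == 0 || hdr == 0xffffffffu)
			break;
		if ((hdr & 0xffff) == cap_id)   /* bits 15:0 = capability ID */
			return (int)pos;
		pos = (hdr >> 20) & 0xffc;      /* bits 31:20 = next offset */
	}
	return 0;
}
```

On a root port like the ones described above, find_ext_cap(cfg, len,
EXT_CAP_ID_ACS) would be expected to return 0, which is exactly the
case where only the device-specific quirk can enable ACS.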

> >
> > > diff -ur linux-5.9.1.orig/drivers/pci/pci.c linux-5.9.1/drivers/pci/pci.c
> > > --- linux-5.9.1.orig/drivers/pci/pci.c  2020-10-17 08:31:22.000000000 +0200
> > > +++ linux-5.9.1/drivers/pci/pci.c       2020-10-27 19:01:32.650010803 +0100
> > > @@ -3502,9 +3502,7 @@
> > >  void pci_acs_init(struct pci_dev *dev)
> > >  {
> > >         dev->acs_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> > > -
> > > -       if (dev->acs_cap)
> > > -               pci_enable_acs(dev);
> > > +       pci_enable_acs(dev);
> > >  }
> > >
> > >  /**
> > >

Ack, yes, this is what needs to be done, and I just sent a patch at
https://lkml.org/lkml/2020/10/28/693.
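
To make the control-flow difference concrete, here is a small
userspace model (again not kernel code, just a sketch). struct
model_dev and the model_* functions are simplified stand-ins invented
for this example; the real pci_enable_acs() does more than this, and
the model only assumes that the quirk path is tried before the
standard ACS path.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the relevant bits of struct pci_dev. */
struct model_dev {
	int  acs_cap;        /* offset of standard ACS capability, 0 if none */
	bool has_acs_quirk;  /* a device-specific quirk can enable ACS */
	bool acs_enabled;    /* result: was ACS enabled for this device? */
};

/* Stand-in for pci_enable_acs(): try the quirk, then standard ACS. */
static void model_enable_acs(struct model_dev *dev)
{
	if (dev->has_acs_quirk) {
		dev->acs_enabled = true;  /* quirk uses non-standard registers */
		return;
	}
	if (dev->acs_cap)
		dev->acs_enabled = true;  /* standard ACS capability present */
}

/* Behaviour in 5.9: enabling is skipped when no standard cap is found. */
void model_acs_init_5_9(struct model_dev *dev, int found_cap)
{
	assert(dev);
	dev->acs_cap = found_cap;
	if (dev->acs_cap)
		model_enable_acs(dev);
}

/* Behaviour with the patch above: always give the quirks a chance. */
void model_acs_init_fixed(struct model_dev *dev, int found_cap)
{
	assert(dev);
	dev->acs_cap = found_cap;
	model_enable_acs(dev);
}
```

For a device with has_acs_quirk set and no standard capability
(found_cap == 0), the 5.9 variant leaves acs_enabled false while the
fixed variant sets it, which matches the grouping change Boris saw.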

Thanks,

Rajat

> > >


