Re: [PATCH V4 1/2] PCI/portdrv: Fix MSI/MSI-X bug for PCIe port service drivers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Bjorn

在 2017/10/19 6:33, Bjorn Helgaas 写道:
On Wed, Oct 18, 2017 at 06:50:05PM +0800, Dongdong Liu wrote:
Hi Bjorn
在 2017/10/18 4:25, Bjorn Helgaas 写道:
On Wed, Oct 11, 2017 at 06:52:57PM +0800, Dongdong Liu wrote:
After removing and adding back the PCI root port device,
we see the PCIe port service drivers request irq failed.
pcie_pme: probe of 0000:00:00.0:pcie001 failed with error -22
aer: probe of 0000:00:00.0:pcie002 failed with error -22
pciehp 0000:00:00.0:pcie004: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd-
PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
pciehp 0000:00:00.0:pcie004: Cannot get irq 20 for the hotplug controller
pciehp 0000:00:00.0:pcie004: Notification initialization failed (-1)
dpc 0000:00:00.0:pcie010: request IRQ22 failed: -22
dpc: probe of 0000:00:00.0:pcie010 failed with error -22

The current code basically does this:

 - allocate 32 vectors
 - figure out vector used by PME and hotplug
 - figure out vector used by AER
 - figure out vector used by DPC
 - free the 32 vectors we allocated
 - allocate only as many vectors as we need

but it is broken as calling pci_free_irq_vectors()
invalidates the IRQ numbers returned before by pci_irq_vectors();

The hardware works:
 - PME and hotplug use the Interrupt Message Number from the PCIe
   Capability register.

 - AER uses the AER Interrupt Message Number from the AER Root Error
   Status register.

 - DPC uses the DPC Interrupt Message Number from the DPC Capability
   register.

 - FRS (not supported by Linux yet) uses the FRS Interrupt Message
   Number from the FRS Queuing Capability register.

 - That's a total of 4 possible MSI/MSI-X vectors used for PME,
   hotplug, AER, DPC, and FRS, so there's no point in trying to
   allocate more than 4 vectors (we currently start with 32).

 - All these Interrupt Message Numbers are read-only to software but
   are updated by hardware when software writes the Multiple Message
   Enable field of the MSI Message Control register.  Thanks for
   pointing this out; I didn't realize this before.

 - Therefore, we must read the Interrupt Message Numbers *after* we
   write Multiple Message Enable.

Since we can use at most 4 vectors, this patch is to optimize the
interrupt vector allocation strategy in pcie_port_enable_irq_vec().
First figure out how many vectors we need,allocate them, and then fill
in the slots in the irqs[] table.

Cc: <stable@xxxxxxxxxxxxxxx>
Fixes: 3674cc49da9a ("PCI/portdrv: Use pci_irq_alloc_vectors()")
Suggested-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
Signed-off-by: Dongdong Liu <liudongdong3@xxxxxxxxxx>
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@xxxxxxxxxx>

Thanks a lot for all your work on this.  This was a confusing issue
with code to match.  I tried to simplify the changelog as below;
please correct anything I messed up.

A few hopefully last questions:

 - Your subject suggested that this was a bug for both MSI and MSI-X,
   but I think it only affected MSI, because the spec says MSI-X
   Interrupt Message Numbers must remain constant.  What do you
   think?

Yes, for a given MSI-X implementation, the entry must remain constant.
But in fact, the bug PCIe port service drivers request irq failed does
not relate with this. As we said in PATCH V2,
The current code does:

  pci_alloc_irq_vectors(dev, ...)
  irqs[x] = pci_irq_vector(dev, entry)
  pci_free_irq_vectors(dev)
  pci_alloc_irq_vectors(dev, ...)

The problem is that the second pci_alloc_irq_vectors() call gets different
vectors than the first one. For MSI-X, It maybe also have such bug.

OK, I'm not sure I understand yet how this works with MSI-X, but I
suspect the current patch still breaks MSI-X.  Let me see if I can
make sense of what happens when the root port supports MSI-X:

  - The MSI-X Table Size (in Message Control) is fixed; this field is
    read-only in hardware.

  - When MSI-X is enabled, the Interrupt Message Numbers for PME/HP,
    AER, DPC, etc., are fixed, contain an MSI-X Table index, and must
    be less than 32.

  - pci_alloc_irq_vectors(3) will allocate 3 MSI-X vectors (0-2).
    There's no reason to assume the Interrupt Message Numbers (which
    can be 0-31) will be in that range.

Here's the flow through this path for MSI-X with the current patch
applied, with all services enabled.  You can see that in
msix_setup_entries(), we'll allocate 3 entries and they will use
indexes 0-2 in the MSI-X table.  The Interrupt Message Numbers index
that table, and they can be in the range 0-31.

  pcie_init_service_irqs
    pcie_port_enable_irq_vec
      nvec = 3            # PME/HP, AER, DPC
      pci_alloc_irq_vectors(..., 1, 3, PCI_IRQ_MSIX | PCI_IRQ_MSI)
        pci_alloc_irq_vectors_affinity(..., 1, 3, PCI_IRQ_MSIX | PCI_IRQ_MSI)
          __pci_enable_msix_range(entries==NULL, 1, 3)
            __pci_enable_msix(entries==NULL, nvec=3)
              msix_capability_init(entries==NULL, nvec=3)
                base = msix_map_region(dev, msix_table_size(control))
                msix_setup_entries(base, entries==NULL, nvec=3)
                  for (i = 0; i < nvec; i++)
                    entry = alloc_msi_entry()
                    entry->msi_attrib.entry_nr = i  # <-- MSI-X table index
                    entry->mask_base = base         # MSI-X table base
                msix_program_entries
                  for_each_pci_msi_entry()
                    writel(PCI_MSIX_ENTRY_CTRL_MASKBIT,
                           pci_msix_desc_addr(entry) +
                           PCI_MSIX_ENTRY_VECTOR_CTRL)

        nr_entries = 3    # returned from pci_alloc_irq_vectors()
        # read message numbers:
        entry = <PME/HP, AER, or DPC message number>
        # assume entry == 4
        if (entry > nr_entries)
          goto out_free_irqs   # we should take this error path

Yes, the above analysis is correct.

I would expect pcie_port_enable_irq_vec() to fail on many root ports
with MSI-X, because one of PME/HP, AER, DPC may use an Interrupt
Message Number greater than 3.

The previous code (before this patch) first allocates 32 MSI-X
vectors, then reads all the Interrupt Message Numbers, then allocates
just enough MSI-X vectors so all the message numbers are valid table
indexes.

In the current patch, we don't know the message numbers until after
we've already enabled MSI-X with some number of vectors.

So for MSI, we have the constraint that we have to enable MSI before
reading the Message Numbers.  For MSI-X, we can't know how many
vectors to request until after we read the Message Numbers.  I wonder
if we need to split these two paths?

If we worry about current patch will break MSI-X.
How about [PATCH V3] PCI/portdrv: Fix MSI/MSI-X bug for PCIe port service drivers.
https://patchwork.kernel.org/patch/9992827/

I still think patch v3 will be ok for MSI/MSI-X.
----------------------------------------------
The patch v3 below contains this sequence:

  pci_alloc_irq_vectors(32)
    ...
     write Multiple Message Enable   <-- Interrupt Message Numbers change
  idx[x] = PME/hotplug Interrupt Message Number
  idx[y] = AER Interrupt Message Number
  idx[z] = DPC Interrupt Message Number
  pci_free_irq_vectors()
  pci_alloc_irq_vectors()
    ...
     write Multiple Message Enable   <-- Interrupt Message Numbers change
  for (i = 0; ...)
    irqs[i] = pci_irq_vector(idx[i])

This incorrectly assumes the Interrupt Message Numbers remain the same
after we write Multiple Message Enable.
--------------------------------------------
We just worry about Interrupt Message Numbers maybe are not the same.
pci_alloc_irq_vectors(32)
 write Multiple Message Enable=5(32 vectors)   <-- Interrupt Message Numbers change
 idx[x] = PME/hotplug Interrupt Message Number, assume 0
 idx[y] = AER Interrupt Message Number,         assume 2
 idx[z] = DPC Interrupt Message Number,         assume 6
 pci_free_irq_vectors()
 pci_alloc_irq_vectors(7)
    ...
     write Multiple Message Enable=3(8 vectors)
  for (i = 0; ...)
    irqs[i] = pci_irq_vector(idx[i])

As Multiple Message Enable is 8, Hardware must remain the same as before
(HiSilicon hip08 design as this).Test on HiSilicon hip08 platform, it works ok.

Hardware is required to update this field so that it is correct if the number of
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
MSI Messages assigned to the Function changes when software writes to the Multiple
Message Enable field in the MSI Message Control register

The PCIe spec is not very clear about this make us confused. We worry about the
hardwares are not consistent.

Split these two paths means msi/msi-x uses different algorithm.
Currently, it maybe a good choice.

Thanks,
Dongdong

 - I think the problem was probably exposed by a1d5f18cafe6
   ("PCI/portdrv: Support multiple interrupts for MSI as well as
   MSI-X"), not 3674cc49da9a, because prior to a1d5f18cafe6, we only
   used a single MSI vector, and we didn't do the "allocate vectors,
   see which ones were used, reallocate only the required number of
   vectors" dance.

Yes, 3674cc49da9a, for MSI, it should be work ok as it just use used a
single MSI vector(not get the irq and free and alloc irq agian as MSI-X).
But for MSI-X, I think it maybe still have a bug.

   It looks like we only allocated a single vector, so all the
   Interrupt Message Numbers should have been zero.  Or did you
   actually bisect it to 3674cc49da9a?

I think 3674cc49da9a is beeter as a1d5f18cafe6 maybe still have the bug for MSI-X.
a1d5f18cafe6 PCI/portdrv: Support multiple interrupts for MSI as well as MSI-X
3674cc49da9a PCI/portdrv: Use pci_irq_alloc_vectors().


Bjorn


commit e402a536cf7122cda6b3bef420de36c22d2907e6
Author: Dongdong Liu <liudongdong3@xxxxxxxxxx>
Date:   Wed Oct 11 18:52:57 2017 +0800

   PCI/portdrv: Fix MSI multiple interrupt setup

   When removing and adding back a PCIe Root Port device that uses MSI (not
   MSI-X), the port service driver IRQ request fails:

     pcie_pme: probe of 0000:00:00.0:pcie001 failed with error -22
     aer: probe of 0000:00:00.0:pcie002 failed with error -22
     pciehp 0000:00:00.0:pcie004: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
     pciehp 0000:00:00.0:pcie004: Cannot get irq 20 for the hotplug controller
     pciehp 0000:00:00.0:pcie004: Notification initialization failed (-1)
     dpc 0000:00:00.0:pcie010: request IRQ22 failed: -22
     dpc: probe of 0000:00:00.0:pcie010 failed with error -22

   The previous code basically did this:

     - Write MSI Multiple Message Enable to enable 32 vectors
     - Read Interrupt Message Numbers for PME & hotplug, AER, and DPC
     - Rewrite MSI Multiple Message Enable to enable only as many vectors as
       we think we need
     - Assume the Interrupt Message Numbers are still valid

   The problem is that when we change the Multiple Message Enable field to
   what we think is the minimal number of vectors required, the hardware may
   change which vectors it uses.  See the "Interrupt Message Number"
   discussion in PCIe r3.1, sec 7.8.2 (PME & hotplug), 7.10.10 (AER), 7.31.2
   (DPC).

   Read the Interrupt Message Numbers *after* we write the Multiple Message
   Enable field so we use the correct IRQ vectors.

   Note that we can't predict how hardware will select Interrupt Message
   Numbers, so even if we allocate as many vectors as there are interrupt
   sources, the hardware may still select Message Numbers that result in
   sharing a vector.  E.g., on HiSilicon hip08, if we have 3 sources
   (PME/hotplug, AER, DPC) and 4 MSI vectors, hardware selects Message
   Numbers 0 for PME/hotplug and 2 for AER and DPC.  That leaves vectors 1
   and 3 unused, while AER and DPC share vector 2.

   Even though MSI-X doesn't have the issue with Interrupt Message Numbers
   changing, this patch may result in more IRQ sharing because we no longer
   read those Message Numbers and use them to determine how many MSI-X vectors
   to request.

   Fixes: a1d5f18cafe6 ("PCI/portdrv: Support multiple interrupts for MSI as well as MSI-X")
   Suggested-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
   Signed-off-by: Dongdong Liu <liudongdong3@xxxxxxxxxx>
   Signed-off-by: Gabriele Paoloni <gabriele.paoloni@xxxxxxxxxx>
   [bhelgaas: changelog]
   Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
   Cc: stable@xxxxxxxxxxxxxxx      # v4.13+


The commit message looks good to me.
I think Fixes: 3674cc49da9a ("PCI/portdrv: Use pci_irq_alloc_vectors()") is better as
it still has a bug for MSI-X.

Thanks,
Dongdong
.



.





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]