On Wed, 8 May 2024 16:47:41 +0300 Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> wrote: > This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove > bandwidth notification"). An upcoming commit extends this driver > building PCIe bandwidth controller on top of it. > > The PCIe bandwidth notification were first added in the commit > e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth > notification") but later had to be removed. The significant changes > compared with the old bandwidth notification driver include: > > 1) Don't print the notifications into kernel log, just keep the Link > Speed cached into the struct pci_bus updated. While somewhat > unfortunate, the log spam was the source of complaints that > eventually lead to the removal of the bandwidth notifications driver > (see the links below for further information). > > 2) Besides the Link Bandwidth Management Interrupt, enable also Link > Autonomous Bandwidth Interrupt to cover the other source of > bandwidth changes. > > 3) Use threaded IRQ with IRQF_ONESHOT to handle Bandwidth Notification > Interrupts to address the problem fixed in the commit 3e82a7f9031f > ("PCI/LINK: Supply IRQ handler so level-triggered IRQs are acked")). > > 4) Handle Link Speed updates robustly. Refresh the cached Link Speed > when enabling Bandwidth Notification Interrupts, and solve the race > between Link Speed read and LBMS/LABS update in > pcie_bwnotif_irq_thread(). > > 5) Use concurrency safe LNKCTL RMW operations. > > 6) The driver is now called PCIe bwctrl (bandwidth controller) instead > of just bandwidth notifications because of increased scope and > functionality within the driver. > > 7) Coexist with the Target Link Speed quirk in > pcie_failed_link_retrain(). Provide LBMS counting API for it. > > 8) Tweaks to variable/functions names for consistency and length > reasons. > > Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus > to keep track PCIe Link Speed changes. > > Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@xxxxxxxxxx/ > Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@xxxxxxxxx/ > Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@xxxxxxxxxx/ > Suggested-by: Lukas Wunner <lukas@xxxxxxxxx> # Building bwctrl on top of bwnotif > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> A few trivial things inline. Either way LGTM Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> > diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile > index 6461aa93fe76..6357bc219632 100644 > --- a/drivers/pci/pcie/Makefile > +++ b/drivers/pci/pcie/Makefile > @@ -12,4 +12,5 @@ obj-$(CONFIG_PCIEAER_INJECT) += aer_inject.o > obj-$(CONFIG_PCIE_PME) += pme.o > obj-$(CONFIG_PCIE_DPC) += dpc.o > obj-$(CONFIG_PCIE_PTM) += ptm.o > +obj-$(CONFIG_PCIE_BWCTRL) += bwctrl.o > obj-$(CONFIG_PCIE_EDR) += edr.o > diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c > new file mode 100644 > index 000000000000..5afc533dd0a9 > --- /dev/null > +++ b/drivers/pci/pcie/bwctrl.c > @@ -0,0 +1,185 @@ > + > +static irqreturn_t pcie_bwnotif_irq_thread(int irq, void *context) > +{ > + struct pcie_device *srv = context; > + struct pcie_bwctrl_data *data = get_service_data(srv); > + struct pci_dev *port = srv->port; > + u16 link_status, events; > + int ret; > + > + ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status); > + events = link_status & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_LABS); > + > + if (ret != PCIBIOS_SUCCESSFUL || !events) > + return IRQ_NONE; Trivial, but nicer to not use link_status if it is garbage (even briefly) Only a couple of lines more to keep it clean. ret = pcie... if (ret != PCI_BIOS_SUCCESSFUL) return IRQ_NONE; events = ... if (!events) return IRQ_NONE; > + > + if (events & PCI_EXP_LNKSTA_LBMS) > + atomic_inc(&data->lbms_count); > + > + pcie_capability_write_word(port, PCI_EXP_LNKSTA, events); > + > + /* > + * Interrupts will not be triggered from any further Link Speed > + * change until LBMS is cleared by the write. Therefore, re-read the > + * speed (inside pcie_update_link_speed()) after LBMS has been > + * cleared to avoid missing link speed changes. > + */ > + pcie_update_link_speed(port->subordinate); > + > + return IRQ_HANDLED; > +} > + > +static int pcie_bwnotif_probe(struct pcie_device *srv) > +{ > + struct pci_dev *port = srv->port; > + int ret; > + > + struct pcie_bwctrl_data *data __free(kfree) = > + kzalloc(sizeof(*data), GFP_KERNEL); > + if (!data) > + return -ENOMEM; > + > + set_service_data(srv, data); > + > + ret = request_threaded_irq(srv->irq, NULL, pcie_bwnotif_irq_thread, > + IRQF_SHARED | IRQF_ONESHOT, "PCIe bwctrl", srv); > + if (ret) > + return ret; > + > + port->link_bwctrl = no_free_ptr(data); > + pcie_bwnotif_enable(srv); > + pci_info(port, "enabled with IRQ %d\n", srv->irq); Rather noisy given this is easy enough to establish via other paths. pci_dbg() maybe? > + > + return 0; > +} > + > +static void pcie_bwnotif_remove(struct pcie_device *srv) > +{ > + struct pcie_bwctrl_data *data = get_service_data(srv); > + > + scoped_guard(rwsem_write, &pcie_bwctrl_remove_rwsem) > + srv->port->link_bwctrl = NULL; > + > + pcie_bwnotif_disable(srv->port); Trivial but I'd like a comment to say why this needs to be done after the link_bwctrl = NULL above (or if not, move it before that. That puts the tear down slightly out of order vs set up. > + free_irq(srv->irq, srv); > + kfree(data); > +} > +