On Sat, 30 Dec 2023, Lukas Wunner wrote:

> On Fri, Sep 29, 2023 at 02:57:20PM +0300, Ilpo Järvinen wrote:
> > This mostly reverts b4c7d2076b4e ("PCI/LINK: Remove bandwidth
> > notification"), however, there are small tweaks:
> >
> > 1) Call it PCIe bwctrl (bandwidth controller) instead of just
> >    bandwidth notifications.
> > 2) Don't print the notifications into kernel log, just keep the current
> >    link speed updated.
> > 3) Use concurrency safe LNKCTL RMW operations.
> > 4) Read link speed after enabling the notification to ensure the
> >    current link speed is correct from the start.
> > 5) Add local variable in probe for srv->port.
> > 6) Handle link speed read and LBMS write race in
> >    pcie_bw_notification_irq().
> >
> > The reason for 1) is to indicate the increased scope of the driver. A
> > subsequent commit extends the driver to allow controlling PCIe
> > bandwidths from user space upon crossing thermal thresholds.
> >
> > While 2) is somewhat unfortunate, the log spam was the source of
> > complaints that eventually led to the removal of the bandwidth
> > notifications driver (see the links below for further information).
> > After re-adding this driver back the userspace can, if it wishes to,
> > observe the link speed changes using the current bus speed files under
> > sysfs.
>
> Good commit message.
>
> > --- /dev/null
> > +++ b/drivers/pci/pcie/bwctrl.c
> >
> > +static void pcie_enable_link_bandwidth_notification(struct pci_dev *dev)
> > +{
> > +	u16 link_status;
> > +	int ret;
> > +
> > +	pcie_capability_write_word(dev, PCI_EXP_LNKSTA, PCI_EXP_LNKSTA_LBMS);
> > +	pcie_capability_set_word(dev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_LBMIE);
>
> I'm wondering why we're not enabling LABIE as well?
> (And clear LABS.)
>
> Can't it happen that we miss bandwidth changes unless we enable that
> as well?

Thanks. Reading the spec, it sounds like both are necessary to not miss
changes.
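
To make the intent concrete, here is a small userspace model of what
enabling both notifications would do to the two registers. It is only a
sketch, not the driver code: struct fake_port and the fake_*() helpers
are made up for illustration, and the model assumes that LBMS and LABS
are the only writable (RW1C) bits in LNKSTA. The bit values are the
ones from include/uapi/linux/pci_regs.h. In the driver the two steps
would presumably map to pcie_capability_write_word() on PCI_EXP_LNKSTA
and pcie_capability_set_word() on PCI_EXP_LNKCTL.

```c
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h */
#define PCI_EXP_LNKCTL_LBMIE	0x0400	/* Link Bandwidth Management Interrupt Enable */
#define PCI_EXP_LNKCTL_LABIE	0x0800	/* Link Autonomous Bandwidth Interrupt Enable */
#define PCI_EXP_LNKSTA_LBMS	0x4000	/* Link Bandwidth Management Status */
#define PCI_EXP_LNKSTA_LABS	0x8000	/* Link Autonomous Bandwidth Status */

/* Hypothetical stand-in for a port's Link Control and Link Status registers. */
struct fake_port {
	uint16_t lnkctl;
	uint16_t lnksta;
};

/* Model a LNKSTA write: writing 1 to LBMS/LABS clears them (RW1C). */
static void fake_lnksta_write(struct fake_port *p, uint16_t val)
{
	p->lnksta &= ~(val & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_LABS));
}

/* Model a read-modify-write that only sets bits in LNKCTL. */
static void fake_lnkctl_set(struct fake_port *p, uint16_t bits)
{
	p->lnkctl |= bits;
}

/* Clear both stale status bits first, then enable both interrupts. */
static void fake_enable_bw_notification(struct fake_port *p)
{
	fake_lnksta_write(p, PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_LABS);
	fake_lnkctl_set(p, PCI_EXP_LNKCTL_LBMIE | PCI_EXP_LNKCTL_LABIE);
}
```

Clearing the status bits before setting the enable bits mirrors the
order in the quoted hunk, so that a stale status bit does not trigger
an interrupt the moment the enable bits are set.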

> > +static int pcie_bandwidth_notification_probe(struct pcie_device *srv)
> > +{
> > +	struct pci_dev *port = srv->port;
> > +	int ret;
> > +
> > +	/* Single-width or single-speed ports do not have to support this. */
> > +	if (!pcie_link_bandwidth_notification_supported(port))
> > +		return -ENODEV;
>
> I'm wondering if this should be checked in get_port_device_capability()
> instead?

I can move the check there.

> > +	ret = request_irq(srv->irq, pcie_bw_notification_irq,
> > +			  IRQF_SHARED, "PCIe BW ctrl", srv);
>
> Is there a reason to run the IRQ handler in hardirq context
> or would it work to run it in an IRQ thread?  Usually on systems
> that enable PREEMPT_RT, a threaded IRQ handler is preferred,
> so unless hardirq context is necessary, I'd recommend using
> an IRQ thread.

Can I somehow postpone the decision between IRQ_NONE / IRQ_HANDLED
straight into the thread_fn? One LNKSTA read is necessary to decide
that.

I suppose the other write + reread of LNKSTA could be moved into
thread_fn even if the first read would not be movable.

-- 
 i.
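
For illustration, the split described in the last paragraph could be
modelled in userspace roughly as below. This is a sketch under
assumptions, not the driver: struct fake_link, the FAKE_IRQ_* values
and both functions are hypothetical stand-ins for the kernel's
irqreturn_t values and for a handler/thread_fn pair as it would be
passed to request_threaded_irq(). Only LBMS is checked here to keep it
short. The PCI_EXP_LNKSTA_* values are the ones from
include/uapi/linux/pci_regs.h.

```c
#include <stdint.h>

#define PCI_EXP_LNKSTA_CLS	0x000f	/* Current Link Speed */
#define PCI_EXP_LNKSTA_LBMS	0x4000	/* Link Bandwidth Management Status */

/* Hypothetical mirror of the kernel's irqreturn_t values. */
enum fake_irqreturn {
	FAKE_IRQ_NONE,
	FAKE_IRQ_HANDLED,
	FAKE_IRQ_WAKE_THREAD,
};

/* Hypothetical stand-in for the port state the handlers touch. */
struct fake_link {
	uint16_t lnksta;	/* modelled Link Status register */
	uint16_t cur_speed;	/* cached Current Link Speed field */
};

/* Hardirq part: a single LNKSTA read decides whether the IRQ is ours. */
static enum fake_irqreturn fake_bw_irq(struct fake_link *l)
{
	if (!(l->lnksta & PCI_EXP_LNKSTA_LBMS))
		return FAKE_IRQ_NONE;

	return FAKE_IRQ_WAKE_THREAD;
}

/* Thread part: clear LBMS (modelled RW1C write), then reread the speed. */
static enum fake_irqreturn fake_bw_irq_thread(struct fake_link *l)
{
	l->lnksta &= ~PCI_EXP_LNKSTA_LBMS;
	l->cur_speed = l->lnksta & PCI_EXP_LNKSTA_CLS;

	return FAKE_IRQ_HANDLED;
}
```

One thing the model does not capture: if the status bit is only cleared
in the thread, a level-triggered INTx line would stay asserted until
the thread has run, so whether the clear can really move out of hardirq
context likely depends on how the shared IRQ is set up.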