On Thu, 17 Oct 2024, Jonathan Cameron wrote: > On Wed, 9 Oct 2024 12:52:21 +0300 > Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> wrote: > > > Currently, PCIe Link Speeds are adjusted by custom code rather than in > > a common function provided in PCI core. PCIe bandwidth controller > > (bwctrl) introduces an in-kernel API to set PCIe Link Speed. > > > > Convert Target Speed quirk to use the new API. The Target Speed quirk > > runs very early when bwctrl is not yet probed for a Port and can also > > run later when bwctrl is already setup for the Port, which requires the > > per port mutex (set_speed_mutex) to be only taken if the bwctrl setup > > is already complete. > > > > The new API is also intended to be used in an upcoming commit that adds > > a thermal cooling device to throttle PCIe bandwidth when thermal > > thresholds are reached. > > > > The PCIe bandwidth control procedure is as follows. The highest speed > > supported by the Port and the PCIe device which is not higher than the > > requested speed is selected and written into the Target Link Speed in > > the Link Control 2 Register. Then bandwidth controller retrains the > > PCIe Link. > > > > Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus > > to keep track PCIe Link Speed changes. While Bandwidth Notifications > > should also be generated when bandwidth controller alters the PCIe Link > > Speed, a few platforms do not deliver LMBS interrupt after Link > > Training as expected. Thus, after changing the Link Speed, bandwidth > > controller makes additional read for the Link Status Register to ensure > > cur_bus_speed is consistent with the new PCIe Link Speed. > > > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> > > Trivial stuff inline. The mutex_destroy discussion is a just a consistency > thing given that call is rarely bothered with but here it might help with > debug. > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> > > Jonathan > > > --- > > drivers/pci/pci.h | 20 +++++ > > drivers/pci/pcie/bwctrl.c | 161 +++++++++++++++++++++++++++++++++++++- > > drivers/pci/quirks.c | 17 +--- > > include/linux/pci.h | 10 +++ > > 4 files changed, 193 insertions(+), 15 deletions(-) > > > > diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c > > index 1b11b5da79d4..1d3680ea8e06 100644 > > --- a/drivers/pci/pcie/bwctrl.c > > +++ b/drivers/pci/pcie/bwctrl.c > > @@ -7,6 +7,11 @@ > > > static void pcie_bwnotif_enable(struct pcie_device *srv) > > { > > struct pcie_bwctrl_data *data = get_service_data(srv); > > @@ -135,6 +288,7 @@ static int pcie_bwnotif_probe(struct pcie_device *srv) > > if (!data) > > return -ENOMEM; > > > > + mutex_init(&data->set_speed_mutex); > > set_service_data(srv, data); > > > > ret = request_threaded_irq(srv->irq, NULL, pcie_bwnotif_irq_thread, > > @@ -142,8 +296,10 @@ static int pcie_bwnotif_probe(struct pcie_device *srv) > > if (ret) > > return ret; > > > > - port->link_bwctrl = no_free_ptr(data); > > - pcie_bwnotif_enable(srv); > > + scoped_guard(rwsem_write, &pcie_bwctrl_remove_rwsem) { > > Calling it remove_rwsem and using it to protect against not yet > present seems odd. Maybe rename, pcie_bwctrl_bound_rswem or something like that? Okay but there's one related issue I need to address: I came across recursive locking splat from pcie_bwctrl_remove_rwsem (seems I had lockdep off for some odd reason in my config so it didn't trigger during my testing until now). The splat is because of: pcie_set_target_speed() scoped_guard(rwsem_read, &pcie_bwctrl_remove_rwsem) -> pcie_bwctrl_change_speed -> pcie_retrain_link() -> pcie_reset_lbms_count() guard(rwsem_read)(&pcie_bwctrl_remove_rwsem); I wouldn't want to special case pcie_retrain_link() just for BW control's sake, I plan to solve that by splitting pcie_bwctrl_remove_rwsem to have one for LBMS API and another for set speed API so split it into: - pcie_bwctrl_setspeed_rwsem - pcie_bwctrl_lbms_rwsem It requires taking both in .probe and .remove which is not as elegant I'd want but definitely seems more elegant compared with differentiating pcie_retrain_link(). If you, for some reason would instead prefer I differentiated pcie_retrain_link() for cases when holding pcie_bwctrl_remove_rwsem already (or pcie_bwctrl_bound_rwsem as suggested by you), please let me know. > > + port->link_bwctrl = no_free_ptr(data); > > + pcie_bwnotif_enable(srv); > > + } > > > > pci_dbg(port, "enabled with IRQ %d\n", srv->irq); > > > > @@ -159,6 +315,7 @@ static void pcie_bwnotif_remove(struct pcie_device *srv) > > srv->port->link_bwctrl = NULL; > > > > free_irq(srv->irq, srv); > > + mutex_destroy(&data->set_speed_mutex); > > Probably not worth doing. Also you don't do error handling for this above. I'm not sure what error handling you're referring to? > Ideal is use devm_ for data and then devm_mutex_init() I've now converted alloc, irq, and mutex to devm. I had them at one point but then Lukas said it's the wrong direction consider what lies in future but since the future seeming has now changed I've no problem of making our lives easier with devm. :-) I also got rid of the service data use for your convinience. -- i.