On Tue, Aug 25, 2020 at 11:31:31PM +0530, Puranjay Mohan wrote: > Add a new function pci_ltr_init() which will be called from > pci_init_capabilities() to initialize every PCIe device's LTR values. > Add code in probe.c to evaluate LTR _DSM and save the latencies in pci_dev. > > Signed-off-by: Puranjay Mohan <puranjay12@xxxxxxxxx> > --- > v2 - add an #ifdefin pci-acpi.c to fix the bug reported by test bot. > v1 - based the patch on v5.9-rc1 so it applies correctly. > --- > drivers/pci/pci-acpi.c | 30 ++++++++++++++++++++++++++++++ > drivers/pci/pci.c | 27 +++++++++++++++++++++++++++ > drivers/pci/pci.h | 5 +++++ > drivers/pci/probe.c | 6 ++++++ > include/linux/pci-acpi.h | 1 + > include/linux/pci.h | 2 ++ > 6 files changed, 71 insertions(+) > > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c > index d5869a03f748..af8297040c38 100644 > --- a/drivers/pci/pci-acpi.c > +++ b/drivers/pci/pci-acpi.c > @@ -1213,6 +1213,36 @@ static void pci_acpi_optimize_delay(struct pci_dev *pdev, > ACPI_FREE(obj); > } > > +/* pci_acpi_evaluate_ltr_latency > + * > + * @dev - the pci_dev to evaluate and save latencies > + */ > +void pci_acpi_evaluate_ltr_latency(struct pci_dev *dev) > +{ > +#ifdef CONFIG_PCIASPM > + union acpi_object *obj, *elements; > + struct acpi_device *handle; Nit: sort declarations in order of use: handle, obj, etc. > + > + handle = ACPI_HANDLE(&dev->dev); > + if (!handle) > + return; > + > + obj = acpi_evaluate_dsm(handle, &pci_acpi_dsm_guid, 0x2, > + DSM_PCI_LTR_MAX_LATENCY, NULL); > + if (!obj) > + return; Sorry for the delay in responding. Coincidentally, part of the reason is that I've been consumed with some PCI Firmware spec updates related to _DSM, which actually happens to affect this code. _DSM is defined in the public ACPI spec; see ACPI v6.3, sec 9.1.1. One problem with this interface is that it does not define any standard return type or value, so there is no way to tell from the return value whether the function is implemented. That means I don't think it's safe to rely on "!obj" meaning "DSM_PCI_LTR_MAX_LATENCY isn't implemented". Similarly, I don't think it's safe to assume "obj" means it *is* implemented. We have lots of places in the kernel that do this wrong. One that does it *right* is acpi_dpc_port_get(). Short story: we need to call acpi_check_dsm() first before calling acpi_evaluate_dsm(). There's an ongoing discussion about what *revision* we should use. For now I think the right thing is to use 2 as you did, since that's the first revision where DSM_PCI_LTR_MAX_LATENCY is defined. I would use "2" (decimal) like the other uses in the file, though. > + if (obj->type == ACPI_TYPE_PACKAGE && obj->package.count == 4) { > + elements = obj->package.elements; > + dev->max_snoop_latency = (u16)elements[1].integer.value | > + ((u16)elements[0].integer.value << PCI_LTR_SCALE_SHIFT); > + dev->max_nosnoop_latency = (u16)elements[3].integer.value | > + ((u16)elements[2].integer.value << PCI_LTR_SCALE_SHIFT); > + } > + ACPI_FREE(obj); > +#endif > +} > + > static void pci_acpi_set_external_facing(struct pci_dev *dev) > { > u8 val; > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index a458c46d7e39..b5531272b865 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -3056,6 +3056,33 @@ void pci_pm_init(struct pci_dev *dev) > dev->imm_ready = 1; > } > > +/** > + * pci_ltr_init - Initialize Latency Tolerance Information of given PCI device > + * @dev: PCI device to handle. > + */ > +void pci_ltr_init(struct pci_dev *dev) > +{ > +#ifdef CONFIG_PCIASPM > + int ltr; > + struct pci_dev *endpoint_dev = dev; Nit: sort the declarations in order of use, i.e., endpoint_dev, ltr, ... > + u16 max_snoop_sum = 0; > + u16 max_nosnoop_sum = 0; > + > + ltr = pci_find_ext_capability(endpoint_dev, PCI_EXT_CAP_ID_LTR); > + if (!ltr) > + return; > + > + dev = pci_upstream_bridge(dev); > + while (dev) { > + max_snoop_sum += dev->max_snoop_latency; > + max_nosnoop_sum += dev->max_nosnoop_latency; dev->max_snoop_latency and dev->max_nosnoop_latency are not simple scalars, are they? Aren't they 3 bits of scale and 10 bits of value? I don't think adding these is as easy as "+=" except in the simple case when the scale is "000", i.e., "use the 10 bits of value as-is". I think we have to decode scale * latency for each device in the path, add all those up, then re-encode using the appropriate scale for the config write below. > + dev = pci_upstream_bridge(dev); > + } > + pci_write_config_word(endpoint_dev, ltr + PCI_LTR_MAX_SNOOP_LAT, max_snoop_sum); > + pci_write_config_word(endpoint_dev, ltr + PCI_LTR_MAX_NOSNOOP_LAT, max_nosnoop_sum); I think we definitely need to do this, but we need to be careful about updating things here, since I suspect mistakes will cause hard-to-debug problems. PCIe r5.0, sec 6.18, says: Setting the value and scale fields to all 0’s indicates that the device will be impacted by any delay and that the best possible service is requested. If I understand that correctly, larger values for these registers indicate that the device can tolerate more latency, so we want to err on the side of setting these fields too low rather than too high. So I think maybe we should read the current values first. If the value we computed is the same, we can just skip the write. What should we do if the value we computed is *larger* than the current value? Hot-added devices will power up with zeroes here, so I think we probably should log a note (pci_info()) and do the write. If we don't do the write and we enable ASPM, LTR, and L1 Substates, the zeroes mean the device will be requesting the best possible service, so we won't use the L1.2 substate as much as we should, resulting in more power consumption than necessary. If we read something non-zero, it probably means firmware has already configured this. If we computed something larger, we should probably still log a message, but maybe skip the write. My thinking is that this may be a firmware bug, and fixing it could reduce power usage. If we do the write, we risk breaking something that used to work. If the value we computed is *smaller* than the current value, I think we should log a note just to show that we're changing something, and we *should* do the write, because writing a smaller value should always be safe. The log messages should include both the current value and the new value we computed (decoded so they're easy to read and compare). > +#endif > +} > + > static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop) > { > unsigned long flags = IORESOURCE_PCI_FIXED | IORESOURCE_PCI_EA_BEI; > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index fa12f7cbc1a0..ef3d22b82200 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -110,6 +110,7 @@ void pci_free_cap_save_buffers(struct pci_dev *dev); > bool pci_bridge_d3_possible(struct pci_dev *dev); > void pci_bridge_d3_update(struct pci_dev *dev); > void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev); > +void pci_ltr_init(struct pci_dev *dev); > > static inline void pci_wakeup_event(struct pci_dev *dev) > { > @@ -680,11 +681,15 @@ static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return -EINVAL > > #ifdef CONFIG_ACPI > int pci_acpi_program_hp_params(struct pci_dev *dev); > +void pci_acpi_evaluate_ltr_latency(struct pci_dev *dev); > #else > static inline int pci_acpi_program_hp_params(struct pci_dev *dev) > { > return -ENODEV; > } > +static inline void pci_acpi_evaluate_ltr_latency(struct pci_dev *dev) > +{ > +} Nit: all on one line as in rest of this file: static inline void pci_acpi_evaluate_ltr_latency(struct pci_dev *dev) { } > #endif > > #ifdef CONFIG_PCIEASPM > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index 03d37128a24f..0257aa615665 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -2140,6 +2140,11 @@ static void pci_configure_ltr(struct pci_dev *dev) > dev->ltr_path = 1; > } > #endif > + > + /* > + * Read latency values from _DSM and save in pci_dev > + */ > + pci_acpi_evaluate_ltr_latency(dev); > } > > static void pci_configure_eetlp_prefix(struct pci_dev *dev) > @@ -2400,6 +2405,7 @@ static void pci_init_capabilities(struct pci_dev *dev) > pci_ptm_init(dev); /* Precision Time Measurement */ > pci_aer_init(dev); /* Advanced Error Reporting */ > pci_dpc_init(dev); /* Downstream Port Containment */ > + pci_ltr_init(dev); /* Latency Tolerance Reporting */ > > pcie_report_downtraining(dev); > > diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h > index 5ba475ca9078..e23236a4ff66 100644 > --- a/include/linux/pci-acpi.h > +++ b/include/linux/pci-acpi.h > @@ -110,6 +110,7 @@ extern const guid_t pci_acpi_dsm_guid; > > /* _DSM Definitions for PCI */ > #define DSM_PCI_PRESERVE_BOOT_CONFIG 0x05 > +#define DSM_PCI_LTR_MAX_LATENCY 0x06 > #define DSM_PCI_DEVICE_NAME 0x07 > #define DSM_PCI_POWER_ON_RESET_DELAY 0x08 > #define DSM_PCI_DEVICE_READINESS_DURATIONS 0x09 > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 835530605c0d..9de6b290ed81 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -380,6 +380,8 @@ struct pci_dev { > struct pcie_link_state *link_state; /* ASPM link state */ > unsigned int ltr_path:1; /* Latency Tolerance Reporting > supported from root to here */ > + u16 max_snoop_latency; /* LTR Max Snoop latency */ > + u16 max_nosnoop_latency; /* LTR Max No Snoop latency */ Nit: "Max No-Snoop" to match spec usage. > #endif > unsigned int eetlp_prefix_path:1; /* End-to-End TLP Prefix */ > > -- > 2.27.0 >