Jonathan Cameron wrote: > On Tue, 10 Dec 2024 13:08:21 +0900 > "Kobayashi,Daisuke" <kobayashi.da-06@xxxxxxxxxxx> wrote: > > > Add support for PBEC (Power Budgeting Extended Capability) output to > > the PCIe driver. PBEC is defined in the PCIe specification(PCIe r6.0, > > sec 7.8.1) and is a standard method for obtaining device power > > consumption information. > > I'm curious why you chose to drive the interface from sysfs as opposed to just > querying how many there are boot and providing separate files for each data > register, potentially a set of files for each one for better readability. > > Slightly irritating is that there is no 'count' register so you'd have to walk up or > bisect just to find out how many there actually readable. > Then use is_visible() to hide the remainder. > > Look like it is an 8 bit data select register, so maybe 256 max? > That would give you things like > power_budget0_power > power_budget0_pm_state > etc > Or maybe drop the leading power given where we are so budget0_power etc > > Mailboxes via sysfs are always a bit messy as there isn't really any locking etc. > > Sorry I'm coming in rather late with this query given you are already at v5! > > Jonathan > Thank you for your valuable feedback and for taking the time to review my patch. Regarding your proposed file structure: budget0_power budget0_pm_state ... budget1_power budget1_pm_state ... budget2_xxx ... Is this the correct format? In my understanding, implementing this would require 256 attributes corresponding to the DataSelectRegister, which seems overly redundant. Additionally, I was concerned about the mechanism of changing the DataSelectRegister each time a file is read. Therefore, I proposed this patch implementation, even though it might not be ideal. If you have a suitable implementation to achieve your proposed structure, I would greatly appreciate it if you could share it. Upon further reflection, the issue of needing to change the DataSelectRegister could potentially be avoided by initially retrieving and storing the register values for all indices. Thank you for pointing out the coding style issue, I'll fix it in the next patch. > > > > Signed-off-by: "Kobayashi,Daisuke" <kobayashi.da-06@xxxxxxxxxxx> > > --- > > Documentation/ABI/testing/sysfs-bus-pci | 62 ++++++++ > > drivers/pci/pci-sysfs.c | 179 > ++++++++++++++++++++++++ > > 2 files changed, 241 insertions(+) > > > > diff --git a/Documentation/ABI/testing/sysfs-bus-pci > > b/Documentation/ABI/testing/sysfs-bus-pci > > index 7f63c7e97773..ec417ae20bc1 100644 > > --- a/Documentation/ABI/testing/sysfs-bus-pci > > +++ b/Documentation/ABI/testing/sysfs-bus-pci > > @@ -572,3 +572,65 @@ Description: > > enclosure-specific indications "specific0" to "specific7", > > hence the corresponding led class devices are unavailable if > > the DSM interface is used. > > + > > +What: /sys/bus/pci/devices/.../power_budget > > +Date: December 2024 > > +Contact: Kobayashi Daisuke <kobayashi.da-06@xxxxxxxxxxx> > > +Description: > > + This file provides information about the PCIe power budget > > + for the device. It is a read-only file that outputs the values > > + of the Power Budgeting Data Register for each power state as > a > > + series of 32-bit hexadecimal values. Each line represents a > > + single Power Budgeting Data register entry, containing the > > + power budget for a specific power state. > > + > > + The specific interpretation of these values depends on the > > + device and the PCIe specification. Refer to the PCIe > > + specification for detailed information about the Power > > + Budgeting Data register, including the encoding of power > > + states and the interpretation of Base Power and Data Scale. > > + > > +What: > /sys/bus/pci/devices/.../power_budget/power_budget_data_select > > +Date: December 2024 > > +Contact: Kobayashi Daisuke <kobayashi.da-06@xxxxxxxxxxx> > > +Description: > > + This is an 8-bit read/write register that selects the DWORD of > > + power budgeting data that will be displayed in the > > + Power Budgeting Data. The value starts at zero and > incrementing > > + the index value selects the next DWORD. > > + > > +What: > /sys/bus/pci/devices/.../power_budget/power_budget_power > > +Date: December 2024 > > +Contact: Kobayashi Daisuke <kobayashi.da-06@xxxxxxxxxxx> > > +Description: > > + This file provides the power consumption calculated by > > + multiplying the base power by the data scale. > > + > > + > > +What: > /sys/bus/pci/devices/.../power_budget/power_budget_pm_state > > +Date: December 2024 > > +Contact: Kobayashi Daisuke <kobayashi.da-06@xxxxxxxxxxx> > > +Description: > > + This file specifies the power management state of the > operating > > + condition. > > + > > +What: > /sys/bus/pci/devices/.../power_budget/power_budget_pm_substat > e > > +Date: December 2024 > > +Contact: Kobayashi Daisuke <kobayashi.da-06@xxxxxxxxxxx> > > +Description: > > + This file specifies the power management sub state of the > > + operating condition. > > + > > +What: > /sys/bus/pci/devices/.../power_budget/power_budget_type > > +Date: December 2024 > > +Contact: Kobayashi Daisuke <kobayashi.da-06@xxxxxxxxxxx> > > +Description: > > + This file specifies the type of the operating condition. > > + > > + > > +What: > /sys/bus/pci/devices/.../power_budget/power_budget_rail > > +Date: December 2024 > > +Contact: Kobayashi Daisuke <kobayashi.da-06@xxxxxxxxxxx> > > +Description: > > + This file Specifies the thermal load or power rail of the > > + operating condition. > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index > > 5d0f4db1cab7..89909633ad02 100644 > > --- a/drivers/pci/pci-sysfs.c > > +++ b/drivers/pci/pci-sysfs.c > > @@ -238,6 +238,155 @@ static ssize_t current_link_width_show(struct > > device *dev, } static DEVICE_ATTR_RO(current_link_width); > > > + > > +static ssize_t power_budget_rail_show(struct device *dev, > > + struct device_attribute *attr, char > *buf) { > > + struct pci_dev *pci_dev = to_pci_dev(dev); > > + int pos, err; > > + u32 data; > > + > > + pos = pci_find_ext_capability(pci_dev, PCI_EXT_CAP_ID_PWR); > > IIRC, this is not a cheap operation. I'd suggest storing it like we do for aer_cap > etc. > > > + if (!pos) > > + return -EINVAL; > > + > > + err = pci_read_config_dword(pci_dev, pos + PCI_PWR_DATA, &data); > > + if (err) > > + return -EINVAL; > > + > > + return sysfs_emit(buf, "%s\n", > > + > pci_power_budget_rail_string(PCI_PWR_DATA_RAIL(data))); > > Unusual indenting. Align it below b > > > +} > > +static DEVICE_ATTR_RO(power_budget_rail); > > + > > static ssize_t secondary_bus_number_show(struct device *dev, > > struct device_attribute *attr, > > char *buf) > > @@ -636,6 +785,16 @@ static struct attribute *pcie_dev_attrs[] = { > > NULL, > > }; > > > > +static struct attribute *pcie_pbec_attrs[] = { > > + &dev_attr_power_budget_data_select.attr, > > + &dev_attr_power_budget_power.attr, > > + &dev_attr_power_budget_pm_state.attr, > > + &dev_attr_power_budget_pm_substate.attr, > > + &dev_attr_power_budget_rail.attr, > > + &dev_attr_power_budget_type.attr, > > + NULL, > No need for comma on terminating entries as we will never add anything after > them. > > > +};