Re: [PATCH 6/8] PCI/AER: Add AER sysfs attributes for ratelimits

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 14 Jan 2025 23:42:58 -0800
Jon Pan-Doh <pandoh@xxxxxxxxxx> wrote:

> Allow userspace to read/write ratelimits per device. Create aer/ sysfs
> directory to store them and any future aer configs.
> 
> Tested using aer-inject[1] tool. Configured correctable log/irq ratelimits
> to 5/10 respectively. Sent 12 AER errors. Observed 5 errors logged while
> AER stats (cat /sys/bus/pci/devices/<dev>/aer_dev_correctable) shows 11
> (1 masked).
> 
> Before: CEMsk: BadTLP-
> After:  CEMsk: BadTLP+.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git
> 
> Signed-off-by: Jon Pan-Doh <pandoh@xxxxxxxxxx>
> ---
>  .../testing/sysfs-bus-pci-devices-aer_stats   | 32 +++++++++++
>  Documentation/PCI/pcieaer-howto.rst           |  4 +-
>  drivers/pci/pci-sysfs.c                       |  1 +
>  drivers/pci/pci.h                             |  1 +
>  drivers/pci/pcie/aer.c                        | 56 +++++++++++++++++++
>  5 files changed, 93 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
> index d1f67bb81d5d..c680a53af0f4 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
> +++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
> @@ -117,3 +117,35 @@ Date:		July 2018
>  KernelVersion:	4.19.0
>  Contact:	linux-pci@xxxxxxxxxxxxxxx, rajatja@xxxxxxxxxx
>  Description:	Total number of ERR_NONFATAL messages reported to rootport.
> +
> +PCIe AER ratelimits
> +----------------------------

Purely from an rst formatting point of view (this gets picked up in the docs
build) --- should extend I think only under the text.

> +
> +These attributes show up under all the devices that are AER capable. Provides
> +configurable ratelimits of logs/irq per error type. Writing a nonzero value
> +changes the number of errors (burst) allowed per 5 second window before
> +ratelimiting. Reading gets the current ratelimits.
> +
> +What:		/sys/bus/pci/devices/<dev>/aer/ratelimit_cor_irq
> +Date:		Jan 2025
> +KernelVersion:	6.14.0
> +Contact:	linux-pci@xxxxxxxxxxxxxxx, pandoh@xxxxxxxxxx
> +Description:	IRQ ratelimit for correctable errors.

I would add enough info that we can understand what values to write and what
to read to each description.  This doc didn't lead me to the comment below
and it should have done...

> +
> +What:		/sys/bus/pci/devices/<dev>/aer/ratelimit_uncor_irq
> +Date:		Jan 2025
> +KernelVersion:	6.14.0
> +Contact:	linux-pci@xxxxxxxxxxxxxxx, pandoh@xxxxxxxxxx
> +Description:	IRQ ratelimit for uncorrectable errors.
> +
> +What:		/sys/bus/pci/devices/<dev>/aer/ratelimit_cor_log
> +Date:		Jan 2025
> +KernelVersion:	6.14.0
> +Contact:	linux-pci@xxxxxxxxxxxxxxx, pandoh@xxxxxxxxxx
> +Description:	Log ratelimit for correctable errors.
> +
> +What:		/sys/bus/pci/devices/<dev>/aer/ratelimit_uncor_log
> +Date:		Jan 2025
> +KernelVersion:	6.14.0
> +Contact:	linux-pci@xxxxxxxxxxxxxxx, pandoh@xxxxxxxxxx
> +Description:	Log ratelimit for uncorrectable errors.
> diff --git a/Documentation/PCI/pcieaer-howto.rst b/Documentation/PCI/pcieaer-howto.rst
> index d41272504b18..4d5b0638f120 100644
> --- a/Documentation/PCI/pcieaer-howto.rst
> +++ b/Documentation/PCI/pcieaer-howto.rst
> @@ -89,7 +89,9 @@ AER Ratelimits
>  -------------------------
>  
>  Errors, both at log and IRQ level, are ratelimited per device and error type.
> -This prevents spammy devices from stalling execution.
> +This prevents spammy devices from stalling execution. Ratelimits are exposed
> +in the form of sysfs attributes and configurable. See
> +Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
>  
>  AER Statistics / Counters
>  -------------------------
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 7679d75d71e5..41acb6713e2d 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -1693,6 +1693,7 @@ const struct attribute_group *pci_dev_attr_groups[] = {
>  	&pcie_dev_attr_group,
>  #ifdef CONFIG_PCIEAER
>  	&aer_stats_attr_group,
> +	&aer_attr_group,
>  #endif
>  #ifdef CONFIG_PCIEASPM
>  	&aspm_ctrl_attr_group,
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 2e40fc63ba31..9d0272a890ef 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -881,6 +881,7 @@ void pci_no_aer(void);
>  void pci_aer_init(struct pci_dev *dev);
>  void pci_aer_exit(struct pci_dev *dev);
>  extern const struct attribute_group aer_stats_attr_group;
> +extern const struct attribute_group aer_attr_group;
>  void pci_aer_clear_fatal_status(struct pci_dev *dev);
>  int pci_aer_clear_status(struct pci_dev *dev);
>  int pci_aer_raw_clear_status(struct pci_dev *dev);
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 1db70ae87f52..e48e2951baae 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -630,6 +630,62 @@ const struct attribute_group aer_stats_attr_group = {
>  	.is_visible = aer_stats_attrs_are_visible,
>  };
>  
> +#define aer_ratelimit_attr(name, ratelimit)				\
> +	static ssize_t							\
> +	name##_show(struct device *dev, struct device_attribute *attr,	\
> +		     char *buf)						\
> +{									\
> +	struct pci_dev *pdev = to_pci_dev(dev);				\
> +	return sysfs_emit(buf, "%u errors every %u seconds\n",	\
> +			  pdev->aer_info->ratelimit.burst,		\
> +			  pdev->aer_info->ratelimit.interval / HZ);	\

Usual rule of thumb is you need a very strong reason to read something
different from what was written to a sysfs file.


I think your interval is fixed? If so name the files so that is apparent
and just have a count returned here.  Or if you want to future proof
add a read only ratelimit_period_secs file that prints 5


ratelimit_in_5secs_uncor_log etc.


> +}									\
> +									\
> +	static ssize_t							\
> +	name##_store(struct device *dev, struct device_attribute *attr,	\
> +		     const char *buf, size_t count)				\
> +{									\
> +	struct pci_dev *pdev = to_pci_dev(dev);				\
> +	int burst;							\
> +									\
> +	if (kstrtoint(buf, 0, &burst) < 0)				\
> +		return -EINVAL;						\
> +									\
> +	pdev->aer_info->ratelimit.burst = burst;			\
> +	return count;							\
> +}									\
> +static DEVICE_ATTR_RW(name)





[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux