On Wed, Mar 05, 2025 at 05:32:45PM -0800, Jon Pan-Doh wrote:
> > On Tue, Mar 04, 2025 at 05:04:21PM -0800, Jon Pan-Doh wrote:
> > > Would a log suffice in that case (i.e. when aer_get_device_error()
> > > returns 0)? Something along the lines of "{device} is not accessible
> > > while processing (un)correctable error"
>
> What are your thoughts on this? It adds the pcie port log in the
> edge case described (with no loss of info) and doesn't require
> changes to current ratelimit logic. Something like this (with more
> fields filled in of course):
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 21cdf590b25e..bdfc7e8d6f0f 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1253,6 +1253,8 @@ static inline void
> aer_process_err_devices(struct aer_err_info *e_info)
>         for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
>                 if (aer_get_device_error_info(e_info->dev[i], e_info))
>                         aer_print_error(e_info->dev[i], e_info);
> +               else
> +                       pci_error(e_info->dev[i], "{device} is not
> accessible while processing (un)correctable error");
>         }
>         for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
>                 if (aer_get_device_error_info(e_info->dev[i], e_info))

Maybe, although I think consistency is very important, and we'll always
have Root Port info but won't always have Endpoint info. So dropping
the Root Port message seems possibly the wrong way around when it's the
Endpoint part that's "optional".

One thing I do like about the current messages is that they associate
information with the device that is the source of the information. I
remember finding this very confusing when I first looked at how AER
works. E.g., the "pcieport ... Correctable error" message means the
Root Port received an ERR_COR and generated an interrupt, and the error
class and error source came from the Root Port AER Capability.
Similarly, the "e1000e ... error status" message contains information
read from the Endpoint AER Capability.

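To make that association concrete, here is a rough userspace model of
what I mean. This is not kernel code: struct model_dev and both
format_*() helpers are made up for illustration, and the message text
only approximates what aer.c prints. The point is just that each line
is prefixed by the device whose AER Capability supplied the data, and
that the Root Port line exists whether or not the Endpoint can be read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of a PCI device; not a kernel structure. */
struct model_dev {
	const char *driver;	/* e.g. "pcieport" or "e1000e" */
	const char *bdf;	/* e.g. "0000:00:1c.0" */
	bool accessible;	/* false models a failed read of its AER Capability */
	unsigned int status;	/* stand-in for the device's own AER status */
};

/*
 * Root Port line: the error class and the source ID are data the
 * Root Port holds itself, so this can always be produced.
 * Returns true if the message fit in buf.
 */
static bool format_root_port_msg(char *buf, size_t len,
				 const struct model_dev *rp,
				 bool correctable, const char *source_bdf)
{
	int n;

	assert(buf && len > 0);
	n = snprintf(buf, len, "%s %s: %s error message received from %s",
		     rp->driver, rp->bdf,
		     correctable ? "Correctable" : "Uncorrectable",
		     source_bdf);
	return n >= 0 && (size_t)n < len;
}

/*
 * Source device line: the status comes from the device's own AER
 * Capability, so it is "optional". If the device is not accessible,
 * say so instead of printing stale or missing data.
 * Returns true if the message fit in buf.
 */
static bool format_source_msg(char *buf, size_t len,
			      const struct model_dev *dev, bool correctable)
{
	int n;

	assert(buf && len > 0);
	if (!dev->accessible)
		n = snprintf(buf, len,
			     "%s %s: not accessible while processing %s error",
			     dev->driver, dev->bdf,
			     correctable ? "correctable" : "uncorrectable");
	else
		n = snprintf(buf, len, "%s %s: error status 0x%08x",
			     dev->driver, dev->bdf, dev->status);
	return n >= 0 && (size_t)n < len;
}
```

In this model the Root Port line never depends on the Endpoint, which
is the consistency argument above; only the second line changes when
the Endpoint can't be read.
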
I do think the existing messages are WAY too verbose. I would love to
make them more concise, and I think the important endpoint info could
probably be squeezed into a single line, although obviously TLP header
logs would be too much for that.

Bjorn