On Tue, Jan 30, 2024 at 06:26:39PM -0800, Kuppuswamy Sathyanarayanan wrote: > > On 1/24/24 10:27 PM, Wang, Qingshun wrote: > > When Advisory Non-Fatal errors are raised, both correctable and > > Maybe you can start with same info about what Advisory Non-FataL > errors are and the specification reference. I know that you included > it in cover letter. But it is good to include it in commit log. Good idea, thanks! > > > uncorrectable error statuses will be set. The current kernel code cannot > > store both statuses at the same time, thus failing to handle ANFE properly. > > In addition, to avoid clearing UEs that are not ANFE by accident, UE > > Please add some details about the impact of not clearing them. Makes sense, will do. > > severity and Device Status also need to be recorded: any fatal UE cannot > > be ANFE, and if Fatal/Non-Fatal Error Detected is set in Device Status, do > > not take any assumption and let UE handler to clear UE status. > > > > Store status and mask of both correctable and uncorrectable errors in > > aer_err_info. The severity of UEs and the values of the Device Status > > register are also recorded, which will be used to determine UEs that should > > be handled by the ANFE handler. Refactor the rest of the code to use > > cor/uncor_status and cor/uncor_mask fields instead of status and mask > > fields. > > > > Signed-off-by: "Wang, Qingshun" <qingshun.wang@xxxxxxxxxxxxxxx> > > --- > > drivers/acpi/apei/ghes.c | 10 ++++- > > drivers/cxl/core/pci.c | 6 ++- > > drivers/pci/pci.h | 8 +++- > > drivers/pci/pcie/aer.c | 93 ++++++++++++++++++++++++++-------------- > > include/linux/aer.h | 4 +- > > include/linux/pci.h | 27 ++++++++++++ > > 6 files changed, 111 insertions(+), 37 deletions(-) > > > > ...... > > > > @@ -1213,38 +1233,49 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info) > > int temp; > > > > /* Must reset in this function */ > > - info->status = 0; > > + info->cor_status = 0; > > + info->uncor_status = 0; > > + info->uncor_severity = 0; > > info->tlp_header_valid = 0; > > > > /* The device might not support AER */ > > if (!aer) > > return 0; > > > > - if (info->severity == AER_CORRECTABLE) { > > + if (info->severity == AER_CORRECTABLE || > > + info->severity == AER_NONFATAL || > > + type == PCI_EXP_TYPE_ROOT_PORT || > > + type == PCI_EXP_TYPE_RC_EC || > > + type == PCI_EXP_TYPE_DOWNSTREAM) { > > > It looks like you are reading both uncorrectable and correctable status > by default for both NONFATAL and CORRECTABLE errors. Why not do > it conditionally only for ANFE errors? > > My initial purpose was the value will be used in aer_event trace in PATCH 4 under both conditions, but I can also add checks here to reduce unnecessary IO and remove the checks in PATCH 4. > > + /* Link is healthy for IO reads */ > > pci_read_config_dword(dev, aer + PCI_ERR_COR_STATUS, > > - &info->status); > > + &info->cor_status); > > pci_read_config_dword(dev, aer + PCI_ERR_COR_MASK, > > - &info->mask); > > > > ...... > > > > diff --git a/include/linux/pci.h b/include/linux/pci.h > > index add9368e6314..259812620d4d 100644 > > --- a/include/linux/pci.h > > +++ b/include/linux/pci.h > > @@ -318,6 +318,33 @@ struct pci_sriov; > > struct pci_p2pdma; > > struct rcec_ea; > > > > +struct pcie_capability_regs { > > + u8 pcie_cap_id; > > + u8 next_cap_ptr; > > + u16 pcie_caps; > > + u32 device_caps; > > + u16 device_control; > > + u16 device_status; > > + u32 link_caps; > > + u16 link_control; > > + u16 link_status; > > + u32 slot_caps; > > + u16 slot_control; > > + u16 slot_status; > > + u16 root_control; > > + u16 root_caps; > > + u32 root_status; > > + u32 device_caps_2; > > + u16 device_control_2; > > + u16 device_status_2; > > + u32 link_caps_2; > > + u16 link_control_2; > > + u16 link_status_2; > > + u32 slot_caps_2; > > + u16 slot_control_2; > > + u16 slot_status_2; > > +}; > > + > IIUC, this struct is only used drivers/acpi/apei/ghes.c . Why not define it in that file? You are right. Whenever we need it elsewhere, we can move it to the header file anyway. > > /* The pci_dev structure describes PCI devices */ > > struct pci_dev { > > struct list_head bus_list; /* Node in per-bus list */ > > -- > Sathyanarayanan Kuppuswamy > Linux Kernel Developer > -- Best regards, Wang, Qingshun