Bjorn Helgaas wrote: > On Fri, Feb 10, 2023 at 10:04:03AM -0700, Dave Jiang wrote: > > By default the CXL RAS mask registers bits are defaulted to 1's and > > suppress all error reporting. If the kernel has negotiated ownership > > of error handling for CXL then unmask the mask registers by writing 0s. > > > > PCI_EXP_AER_FLAGS moved to linux/pci.h header to expose to driver. It > > allows exposure of system enabled PCI error flags for the driver to decide > > which error bits to toggle. Bjorn suggested that the error enabling should > > be controlled from the system policy rather than a driver level choice[1]. > > > > CXL RAS CE and UE masks are checked against PCI_EXP_AER_FLAGS before > > unmasking. > > > > [1]: https://lore.kernel.org/linux-cxl/20230210122952.00006999@xxxxxxxxxx/T/#me8c7f39d43029c64ccff5c950b78a2cee8e885af > > > +static int cxl_pci_ras_unmask(struct pci_dev *pdev) > > +{ > > + struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus); > > + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev); > > + void __iomem *addr; > > + u32 orig_val, val, mask; > > + > > + if (!cxlds->regs.ras) > > + return -ENODEV; > > + > > + /* BIOS has CXL error control */ > > + if (!host_bridge->native_cxl_error) > > + return -EOPNOTSUPP; > > + > > + if (PCI_EXP_AER_FLAGS & PCI_EXP_DEVCTL_URRE) { > > 1) I don't really want to expose PCI_EXP_AER_FLAGS in linux/pci.h. > It's basically a convenience part of the AER implementation. > > 2) I think your intent here is to configure the CXL RAS masking based > on what PCIe error reporting is enabled, but doing it by looking at > PCI_EXP_AER_FLAGS doesn't seem right. This expression is a > compile-time constant that is always true, but we can't rely on > devices always being configured that way. Oh, I had asked Dave to do that to try to satisfy your request for a system wide policy. So that if someone wanted to modify what errors get unmasked globally just look at that value rather than re-read the register, but it seems I over-intepreted what you and Jonathan were talking about here when you mention "system policy": https://lore.kernel.org/linux-cxl/20221229172731.GA611562@bhelgaas/ > We call pci_aer_init() for every device during enumeration, but we > only write PCI_EXP_AER_FLAGS if pci_aer_available() and if > pcie_aer_is_native(). And there are a bunch of drivers that call > pci_disable_pcie_error_reporting(), which *clears* those flags. I'm > not sure those drivers *should* be doing that, but they do today. > > I'm not sure why this needs to be conditional at all, but if it does, > maybe you want to read PCI_EXP_DEVCTL and base it on that? Re-reading the register works too. Some of this may want to move to a more core location once/if CXL devices beyond the generic Memory Expander Class Code start arriving, but that can be saved for later.