On Fri, Feb 10, 2023 at 03:38:53PM -0800, Dan Williams wrote: > Bjorn Helgaas wrote: > > On Fri, Feb 10, 2023 at 10:04:03AM -0700, Dave Jiang wrote: > > > By default the CXL RAS mask registers bits are defaulted to 1's and > > > suppress all error reporting. If the kernel has negotiated ownership > > > of error handling for CXL then unmask the mask registers by writing 0s. > > > > > > PCI_EXP_AER_FLAGS moved to linux/pci.h header to expose to driver. It > > > allows exposure of system enabled PCI error flags for the driver to decide > > > which error bits to toggle. Bjorn suggested that the error enabling should > > > be controlled from the system policy rather than a driver level choice[1]. > > > > > > CXL RAS CE and UE masks are checked against PCI_EXP_AER_FLAGS before > > > unmasking. > > > > > > [1]: https://lore.kernel.org/linux-cxl/20230210122952.00006999@xxxxxxxxxx/T/#me8c7f39d43029c64ccff5c950b78a2cee8e885af > > > > > +static int cxl_pci_ras_unmask(struct pci_dev *pdev) > > > +{ > > > + struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus); > > > + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev); > > > + void __iomem *addr; > > > + u32 orig_val, val, mask; > > > + > > > + if (!cxlds->regs.ras) > > > + return -ENODEV; > > > + > > > + /* BIOS has CXL error control */ > > > + if (!host_bridge->native_cxl_error) > > > + return -EOPNOTSUPP; > > > + > > > + if (PCI_EXP_AER_FLAGS & PCI_EXP_DEVCTL_URRE) { > > > > 1) I don't really want to expose PCI_EXP_AER_FLAGS in linux/pci.h. > > It's basically a convenience part of the AER implementation. > > > > 2) I think your intent here is to configure the CXL RAS masking based > > on what PCIe error reporting is enabled, but doing it by looking at > > PCI_EXP_AER_FLAGS doesn't seem right. This expression is a > > compile-time constant that is always true, but we can't rely on > > devices always being configured that way. > > Oh, I had asked Dave to do that to try to satisfy your request for a > system wide policy. So that if someone wanted to modify what errors get > unmasked globally just look at that value rather than re-read the > register, but it seems I over-intepreted what you and Jonathan were > talking about here when you mention "system policy": > > https://lore.kernel.org/linux-cxl/20221229172731.GA611562@bhelgaas/ Yeah, I should have worded that more like "I think the PCI core, not individual drivers, should be responsible for AER configuration." Even if the PCI core does it all, that doesn't mean AER configuration is known at compile-time because it depends on _OSC negotiation and some kernel parameter special cases. Bjorn