On 9/26/23 4:36 PM, Dan Williams wrote:
> Ben Cheatham wrote:
>> Add support for CXL EINJ error types for CXL 1.1 hosts added in ACPI
>> v6.5. Because these error types target memory-mapped CXL 1.1 compliant
>> downstream ports and not physical (normal/persistent) memory, these
>> error types are not currently allowed through the memory range
>> validation done by the EINJ driver.
>>
>> The MMIO address of a CXL 1.1 downstream port can be found in the
>> cxl_rcrb_addr file in the corresponding dport directory under
>> /sys/bus/cxl/devices/portX. CXL 1.1 error types follow the same
>> procedure as a memory error type, but with param1 set to the
>> downstream port MMIO address.
>>
>> Example usage:
>> $ cd /sys/kernel/debug/apei/einj
>> $ cat available_error_type
>> 0x00000008 Memory Correctable
>> 0x00000010 Memory Uncorrectable non-fatal
>> 0x00000020 Memory Uncorrectable fatal
>> 0x00000040 PCI Express Correctable
>> 0x00000080 PCI Express Uncorrectable non-fatal
>> 0x00000100 PCI Express Uncorrectable fatal
>> 0x00008000 CXL.mem Protocol Correctable
>> 0x00020000 CXL.mem Protocol Uncorrectable fatal
>> $ echo 0x8000 > error_type
>> $ echo 0xfffffffffffff000 > param2
>> $ echo 0x2 > flags
>> $ cat /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr
>> 0xb2f00000
>> $ echo 0xb2f00000 > param1
>> $ echo 1 > error_inject
>
> I have the same reaction to this as I did before:
>
> http://lore.kernel.org/r/647817212bcf1_e067a2945@xxxxxxxxxxxxxxxxxxxxxxxxx.notmuch
>
> Why is per-port error injection being driven from this legacy global
> interface where userspace needs to take information from sysfs and walk
> it over to this other interface? Especially since "rcrb" is an
> implementation detail that will be invalidated with CXL VH topologies?
>

I get what you're saying, I did it this way because I saw this as
primarily an EINJ feature and wanted to keep it in that module. I agree
with you though, it's clunky and would be better inside the cxl
module/debugfs.

> What I would like to see, since this is a new capability with no need to
> be beholden to legacy is to disaggregate the interface to be per-port.
>
> For example:
>
> /sys/kernel/debug/cxl/$mem/{inject,clear}_poison is already established
> for memory device poison injection. Why not add something like:
>
> /sys/kernel/debug/cxl/$port/einj_{type,inject}
>
> For triggering errors by the CXL subsystem device name, and unburden
> userspace from needing to deal in magic numbers.

I'll go ahead and move everything over to the cxl debugfs and do what
you're suggesting. Thanks for taking the time to take a look!

Thanks,
Ben