On Thu, 27 Feb 2025 22:38:14 +0000 <shiju.jose@xxxxxxxxxx> wrote:

> From: Shiju Jose <shiju.jose@xxxxxxxxxx>
>
> Post Package Repair (PPR) maintenance operations may be supported by CXL
> devices that implement the CXL.mem protocol. A PPR maintenance operation
> requests the CXL device to perform a repair operation on its media.
> For example, a CXL device with DRAM components that support PPR features
> may implement PPR maintenance operations. DRAM components may support two
> types of PPR: hard PPR (hPPR), for a permanent row repair, and soft PPR
> (sPPR), for a temporary row repair. Soft PPR is much faster than hPPR,
> but the repair is lost with a power cycle.
>
> During the execution of a PPR maintenance operation, a CXL memory device:
> - May or may not retain data
> - May or may not be able to process CXL.mem requests correctly, including
>   the ones that target the DPA involved in the repair.
> These CXL memory device capabilities are specified by Restriction Flags
> in the sPPR Feature and hPPR Feature.
>
> A soft PPR maintenance operation may be executed at runtime if data is
> retained and CXL.mem requests are correctly processed. For CXL devices with
> DRAM components, an hPPR maintenance operation may be executed only at boot,
> because data is typically not retained during an hPPR maintenance operation.
>
> When a CXL device identifies an error on a memory component, the device
> may inform the host about the need for a PPR maintenance operation by using
> an Event Record in which the Maintenance Needed flag is set. The Event Record
> specifies the DPA that should be repaired. A CXL device may not keep track
> of the requests that have already been sent, and the information on which
> DPA should be repaired may be lost upon a power cycle.
> The userspace tool requests a maintenance operation if the number of
> corrected errors reported on CXL.mem media exceeds an error threshold.
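The runtime-versus-boot rule and the userspace threshold trigger described above can be sketched in C roughly as follows. This is a minimal illustration, not code from the patch: `struct ppr_restrictions`, `ppr_runtime_safe()`, `ppr_may_run()` and `repair_needed()` are hypothetical names; the two booleans stand in for the decoded Restriction Flags (their actual bit layout is defined by the CXL 3.2 spec and is not reproduced here); and treating any PPR type as acceptable at boot is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical decoded view of a PPR feature's Restriction Flags.
 * The real encoding is defined by CXL 3.2 section 8.2.10.7.2.x.
 */
struct ppr_restrictions {
	bool data_retained;   /* media contents survive the repair */
	bool mem_requests_ok; /* CXL.mem requests are processed correctly */
};

/*
 * A repair is only safe at runtime when the device both retains data
 * and keeps servicing CXL.mem requests during the operation.
 */
static bool ppr_runtime_safe(const struct ppr_restrictions *r)
{
	assert(r != NULL);
	return r->data_retained && r->mem_requests_ok;
}

/*
 * Assumed policy: at boot (no live workload) a repair may proceed
 * regardless of the flags; at runtime the flags must allow it.
 */
static bool ppr_may_run(const struct ppr_restrictions *r, bool at_boot)
{
	if (at_boot)
		return true;
	return ppr_runtime_safe(r);
}

/*
 * Userspace trigger: request a repair only once the corrected error
 * count strictly exceeds the configured threshold.
 */
static bool repair_needed(unsigned int corrected, unsigned int threshold)
{
	return corrected > threshold;
}
```

With flags decoded as `{ .data_retained = true, .mem_requests_ok = true }`, `ppr_may_run()` returns true even at runtime. A DRAM hPPR that does not retain data would only pass with `at_boot == true`.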
>
> CXL spec 3.2 section 8.2.10.7.1.2 describes the device's sPPR (soft PPR)
> maintenance operation and section 8.2.10.7.1.3 describes the device's
> hPPR (hard PPR) maintenance operation feature.
>
> CXL spec 3.2 section 8.2.10.7.2.1 describes the sPPR feature discovery and
> configuration.
>
> CXL spec 3.2 section 8.2.10.7.2.2 describes the hPPR feature discovery and
> configuration.
>
> Add support for controlling the CXL memory device soft PPR (sPPR) feature.
> Register with the EDAC driver, which gets the memory repair attr descriptors
> from the EDAC memory repair driver and exposes sysfs repair control
> attributes for PPR to userspace. For example, CXL PPR control for the
> CXL mem0 device is exposed in /sys/bus/edac/devices/cxl_mem0/mem_repairX/
>
> Add checks to ensure the memory to be repaired is offline and originates
> from a CXL DRAM or CXL gen_media error record reported in the current boot,
> before requesting a PPR operation on the device.
>
> Tested with the QEMU patch for the CXL PPR feature.
> https://lore.kernel.org/all/20240730045722.71482-1-dave@xxxxxxxxxxxx/
>
> Reviewed-by: Dave Jiang <dave.jiang@xxxxxxxxx>
> Signed-off-by: Shiju Jose <shiju.jose@xxxxxxxxxx>

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
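The pre-repair checks in the patch description (memory offline, and the DPA taken from a DRAM or gen_media error record reported during the current boot) could look roughly like the C sketch below. It is hypothetical and not the patch's implementation: `struct err_rec`, `enum rec_kind`, `ppr_check_request()`, the `boot_id` field used to stand for "current boot", exact DPA matching, and the -EBUSY/-EINVAL return codes are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record kinds; only DRAM and gen_media are accepted. */
enum rec_kind { REC_GEN_MEDIA, REC_DRAM, REC_OTHER };

/*
 * Hypothetical cached error record. boot_id stands in for
 * "reported during the current boot".
 */
struct err_rec {
	enum rec_kind kind;
	uint64_t dpa;
	uint64_t boot_id;
};

/* Returns 0 if a PPR request for @dpa may be sent, else a negative errno. */
static int ppr_check_request(uint64_t dpa, bool mem_offline,
			     const struct err_rec *recs, size_t nrecs,
			     uint64_t cur_boot)
{
	size_t i;

	assert(recs != NULL || nrecs == 0);

	/* Refuse while the host may still be using the memory. */
	if (!mem_offline)
		return -EBUSY;

	for (i = 0; i < nrecs; i++) {
		const struct err_rec *r = &recs[i];

		/* Skip stale records from an earlier boot. */
		if (r->boot_id != cur_boot)
			continue;
		/* Only DRAM and gen_media records justify a repair. */
		if (r->kind != REC_GEN_MEDIA && r->kind != REC_DRAM)
			continue;
		if (r->dpa == dpa)
			return 0;
	}

	/* No matching record reported during this boot. */
	return -EINVAL;
}
```

Restricting matches to the current boot reflects the point above that a device may lose track of which DPA needs repair across a power cycle. A record left over from an earlier boot is therefore not treated as evidence for a repair now.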