On Mon, Apr 4, 2022 at 9:00 AM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote: > > On Sun, Apr 3, 2022 at 2:58 AM Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > > > > The CXL specification claims S3 support at a hardware level, but at a > > system software level there are some missing pieces. Section 9.4 rightly > > claims that "CXL mem adapters may need aux power to retain memory > > context across S3", but there is no enumeration mechanism for the OS to > > determine if a given adapter has that support. Moreover the save state > > and resume image for the system may inadvertantly end up in a CXL device > > that needs to be restored before the save state is recoverable. I.e. a > > circular dependency that is not resolvable without a third party > > save-area. > > > > Arrange for the cxl_mem driver to fail S3 attempts. This still nominaly > > allows for suspend, but requires unbinding all CXL memory devices before > > the suspend to ensure the typical DRAM flow is taken. The cxl_mem unbind > > flow is intended to also tear down all CXL memory regions associated > > with a given cxl_memdev. > > > > It is reasonable to assume that any device participating in a System RAM > > range published in the EFI memory map is covered by aux power and > > save-area outside the device itself. So this restriction can be > > minimized in the future once pre-existing region enumeration support > > arrives, and perhaps a spec update to clarify if the EFI memory is > > sufficent for determining the range of devices managed by > > platform-firmware for S3 support. > > > > Cc: "Rafael J. Wysocki" <rafael@xxxxxxxxxx> > > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> > > A few thoughts: > > 1. I don't think it is necessary to fail suspend-to-idle too (which > the driver will do after the patch AFAICS). Ah true, I missed that this would also disable suspend to idle. > 2. Should hibernation fail too? From the description above it looks > like that should be the case. Yes, any CXL address range that was provisioned by the OS would need some off-device save area for the device-state which seems difficult to support in the general case. > 3. If "deep"suspend is going to fail every time, it may be better to > prevent "deep" from being written to /sys/power/mem_sleep instead of > failing suspend in progress, especially after freezing user space. Yeah, that sounds much better, let me explore that option. > > > --- > > drivers/cxl/core/memdev.c | 1 - > > drivers/cxl/mem.c | 26 ++++++++++++++++++++++++++ > > 2 files changed, 26 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c > > index 1f76b28f9826..efe4d2e9bfef 100644 > > --- a/drivers/cxl/core/memdev.c > > +++ b/drivers/cxl/core/memdev.c > > @@ -251,7 +251,6 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds, > > dev->bus = &cxl_bus_type; > > dev->devt = MKDEV(cxl_mem_major, cxlmd->id); > > dev->type = &cxl_memdev_type; > > - device_set_pm_not_required(dev); > > INIT_WORK(&cxlmd->detach_work, detach_memdev); > > > > cdev = &cxlmd->cdev; > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c > > index 49a4b1c47299..0660bb1488cb 100644 > > --- a/drivers/cxl/mem.c > > +++ b/drivers/cxl/mem.c > > @@ -3,6 +3,7 @@ > > #include <linux/device.h> > > #include <linux/module.h> > > #include <linux/pci.h> > > +#include <linux/pm.h> > > > > #include "cxlmem.h" > > #include "cxlpci.h" > > @@ -210,10 +211,35 @@ static int cxl_mem_probe(struct device *dev) > > return rc; > > } > > > > +static int cxl_mem_suspend(struct device *dev) > > +{ > > + /* > > + * The kernel may be operating out of CXL memory on this device, > > + * there is no spec defined way to determine whether this device > > + * preserves contents over suspend, and there is no simple way > > + * to arrange for the suspend image to avoid CXL memory which > > + * would setup a circular dependency between PCI resume and save > > + * state restoration. > > + */ > > + dev_err(dev, "CXL memory suspend not supported\n"); > > + return -EBUSY; > > +} > > + > > +static int cxl_mem_resume(struct device *dev) > > +{ > > + /* nothing to do since suspend is prevented */ > > + return 0; > > +} > > This is not needed AFAICS. Ok, I should have checked, but I'll circle back with sleep state disabling rather than failing suspend. Thanks, Rafael.