On Tue, Apr 12, 2022 at 10:02 AM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote: > > On Tue, Apr 5, 2022 at 10:00 PM Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > > > > The CXL specification claims S3 support at a hardware level, but at a > > system software level there are some missing pieces. Section 9.4 (CXL > > 2.0) rightly claims that "CXL mem adapters may need aux power to retain > > memory context across S3", but there is no enumeration mechanism for the > > OS to determine if a given adapter has that support. Moreover the save > > state and resume image for the system may inadvertantly end up in a CXL > > device that needs to be restored before the save state is recoverable. > > I.e. a circular dependency that is not resolvable without a third party > > save-area. > > > > Arrange for the cxl_mem driver to fail S3 attempts. This still nominaly > > allows for suspend, but requires unbinding all CXL memory devices before > > the suspend to ensure the typical DRAM flow is taken. The cxl_mem unbind > > flow is intended to also tear down all CXL memory regions associated > > with a given cxl_memdev. > > > > It is reasonable to assume that any device participating in a System RAM > > range published in the EFI memory map is covered by aux power and > > save-area outside the device itself. So this restriction can be > > minimized in the future once pre-existing region enumeration support > > arrives, and perhaps a spec update to clarify if the EFI memory map is > > sufficent for determining the range of devices managed by > > platform-firmware for S3 support. > > > > Per Rafael, if the CXL configuration prevents suspend then it should > > fail early before tasks are frozen, and mem_sleep should stop showing > > 'mem' as an option [1]. Effectively CXL augments the platform suspend > > ->valid() op since, for example, the ACPI ops are not aware of the CXL / > > PCI dependencies. Given the split role of platform firmware vs OS > > provisioned CXL memory it is up to the cxl_mem driver to determine if > > the CXL configuration has elements that platform firmware may not be > > prepared to restore. > > > > Link: https://lore.kernel.org/r/CAJZ5v0hGVN_=3iU8OLpHY3Ak35T5+JcBM-qs8SbojKrpd0VXsA@xxxxxxxxxxxxxx [1] > > Cc: "Rafael J. Wysocki" <rafael@xxxxxxxxxx> > > Cc: Pavel Machek <pavel@xxxxxx> > > Cc: Len Brown <len.brown@xxxxxxxxx> > > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> > > The majority of the changes are in the CXL code, so > > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx> > > or please let me know if you want me to take this. Thanks Rafael, I'll do one more update to return the call to device_set_pm_not_required() since with this new scheme the CXL dev_pm_ops are not used. I'll take this through cxl.git. > > Thanks! > > > --- > > drivers/Makefile | 2 +- > > drivers/cxl/Kconfig | 4 ++++ > > drivers/cxl/Makefile | 2 +- > > drivers/cxl/core/Makefile | 1 + > > drivers/cxl/core/memdev.c | 1 - > > drivers/cxl/core/suspend.c | 23 +++++++++++++++++++++++ > > drivers/cxl/cxlmem.h | 11 +++++++++++ > > drivers/cxl/mem.c | 22 +++++++++++++++++++++- > > include/linux/pm.h | 9 +++++++++ > > kernel/power/hibernate.c | 2 +- > > kernel/power/main.c | 5 ++++- > > kernel/power/suspend.c | 3 ++- > > 12 files changed, 78 insertions(+), 7 deletions(-) > > create mode 100644 drivers/cxl/core/suspend.c > > > > diff --git a/drivers/Makefile b/drivers/Makefile > > index 020780b6b4d2..f735c4955143 100644 > > --- a/drivers/Makefile > > +++ b/drivers/Makefile > > @@ -72,9 +72,9 @@ obj-$(CONFIG_PARPORT) += parport/ > > obj-y += base/ block/ misc/ mfd/ nfc/ > > obj-$(CONFIG_LIBNVDIMM) += nvdimm/ > > obj-$(CONFIG_DAX) += dax/ > > -obj-$(CONFIG_CXL_BUS) += cxl/ > > obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf/ > > obj-$(CONFIG_NUBUS) += nubus/ > > +obj-y += cxl/ > > obj-y += macintosh/ > > obj-y += scsi/ > > obj-y += nvme/ > > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig > > index b88ab956bb7c..f64e3984689f 100644 > > --- a/drivers/cxl/Kconfig > > +++ b/drivers/cxl/Kconfig > > @@ -98,4 +98,8 @@ config CXL_PORT > > default CXL_BUS > > tristate > > > > +config CXL_SUSPEND > > + def_bool y > > + depends on SUSPEND && CXL_MEM > > + > > endif > > diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile > > index ce267ef11d93..a78270794150 100644 > > --- a/drivers/cxl/Makefile > > +++ b/drivers/cxl/Makefile > > @@ -1,5 +1,5 @@ > > # SPDX-License-Identifier: GPL-2.0 > > -obj-$(CONFIG_CXL_BUS) += core/ > > +obj-y += core/ > > obj-$(CONFIG_CXL_PCI) += cxl_pci.o > > obj-$(CONFIG_CXL_MEM) += cxl_mem.o > > obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o > > diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile > > index 6d37cd78b151..9d35085d25af 100644 > > --- a/drivers/cxl/core/Makefile > > +++ b/drivers/cxl/core/Makefile > > @@ -1,5 +1,6 @@ > > # SPDX-License-Identifier: GPL-2.0 > > obj-$(CONFIG_CXL_BUS) += cxl_core.o > > +obj-$(CONFIG_CXL_SUSPEND) += suspend.o > > > > ccflags-y += -I$(srctree)/drivers/cxl > > cxl_core-y := port.o > > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c > > index 1f76b28f9826..efe4d2e9bfef 100644 > > --- a/drivers/cxl/core/memdev.c > > +++ b/drivers/cxl/core/memdev.c > > @@ -251,7 +251,6 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds, > > dev->bus = &cxl_bus_type; > > dev->devt = MKDEV(cxl_mem_major, cxlmd->id); > > dev->type = &cxl_memdev_type; > > - device_set_pm_not_required(dev); > > INIT_WORK(&cxlmd->detach_work, detach_memdev); > > > > cdev = &cxlmd->cdev; > > diff --git a/drivers/cxl/core/suspend.c b/drivers/cxl/core/suspend.c > > new file mode 100644 > > index 000000000000..88bdbe30a1df > > --- /dev/null > > +++ b/drivers/cxl/core/suspend.c > > @@ -0,0 +1,23 @@ > > +// SPDX-License-Identifier: GPL-2.0-only > > +/* Copyright(c) 2022 Intel Corporation. All rights reserved. */ > > +#include <linux/atomic.h> > > +#include <linux/export.h> > > + > > +static atomic_t mem_active; > > + > > +bool cxl_mem_active(void) > > +{ > > + return atomic_read(&mem_active) != 0; > > +} > > + > > +void cxl_mem_active_inc(void) > > +{ > > + atomic_inc(&mem_active); > > +} > > +EXPORT_SYMBOL_NS_GPL(cxl_mem_active_inc, CXL); > > + > > +void cxl_mem_active_dec(void) > > +{ > > + atomic_dec(&mem_active); > > +} > > +EXPORT_SYMBOL_NS_GPL(cxl_mem_active_dec, CXL); > > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h > > index 243dd86a8b46..7235d2f976e5 100644 > > --- a/drivers/cxl/cxlmem.h > > +++ b/drivers/cxl/cxlmem.h > > @@ -353,6 +353,17 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds); > > struct cxl_dev_state *cxl_dev_state_create(struct device *dev); > > void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds); > > void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds); > > +#ifdef CONFIG_CXL_SUSPEND > > +void cxl_mem_active_inc(void); > > +void cxl_mem_active_dec(void); > > +#else > > +static inline void cxl_mem_active_inc(void) > > +{ > > +} > > +static inline void cxl_mem_active_dec(void) > > +{ > > +} > > +#endif > > > > struct cxl_hdm { > > struct cxl_component_regs regs; > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c > > index 49a4b1c47299..0576d2d3df07 100644 > > --- a/drivers/cxl/mem.c > > +++ b/drivers/cxl/mem.c > > @@ -129,6 +129,11 @@ __mock bool cxl_dvsec_decode_init(struct cxl_dev_state *cxlds) > > return do_hdm_init; > > } > > > > +static void enable_suspend(void *data) > > +{ > > + cxl_mem_active_dec(); > > +} > > + > > static int cxl_mem_probe(struct device *dev) > > { > > struct cxl_memdev *cxlmd = to_cxl_memdev(dev); > > @@ -207,7 +212,22 @@ static int cxl_mem_probe(struct device *dev) > > out: > > cxl_device_unlock(&parent_port->dev); > > put_device(&parent_port->dev); > > - return rc; > > + > > + /* > > + * The kernel may be operating out of CXL memory on this device, > > + * there is no spec defined way to determine whether this device > > + * preserves contents over suspend, and there is no simple way > > + * to arrange for the suspend image to avoid CXL memory which > > + * would setup a circular dependency between PCI resume and save > > + * state restoration. > > + * > > + * TODO: support suspend when all the regions this device is > > + * hosting are locked and covered by the system address map, > > + * i.e. platform firmware owns restoring the HDM configuration > > + * that it locked. > > + */ > > + cxl_mem_active_inc(); > > + return devm_add_action_or_reset(dev, enable_suspend, NULL); > > } > > > > static struct cxl_driver cxl_mem_driver = { > > diff --git a/include/linux/pm.h b/include/linux/pm.h > > index e65b3ab28377..7911c4c9a7be 100644 > > --- a/include/linux/pm.h > > +++ b/include/linux/pm.h > > @@ -36,6 +36,15 @@ static inline void pm_vt_switch_unregister(struct device *dev) > > } > > #endif /* CONFIG_VT_CONSOLE_SLEEP */ > > > > +#ifdef CONFIG_CXL_SUSPEND > > +bool cxl_mem_active(void); > > +#else > > +static inline bool cxl_mem_active(void) > > +{ > > + return false; > > +} > > +#endif > > + > > /* > > * Device power management > > */ > > diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c > > index 938d5c78b421..20a66bf9f465 100644 > > --- a/kernel/power/hibernate.c > > +++ b/kernel/power/hibernate.c > > @@ -83,7 +83,7 @@ bool hibernation_available(void) > > { > > return nohibernate == 0 && > > !security_locked_down(LOCKDOWN_HIBERNATION) && > > - !secretmem_active(); > > + !secretmem_active() && !cxl_mem_active(); > > } > > > > /** > > diff --git a/kernel/power/main.c b/kernel/power/main.c > > index 7e646079fbeb..3e6be1c33e0b 100644 > > --- a/kernel/power/main.c > > +++ b/kernel/power/main.c > > @@ -127,7 +127,9 @@ static ssize_t mem_sleep_show(struct kobject *kobj, struct kobj_attribute *attr, > > char *s = buf; > > suspend_state_t i; > > > > - for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++) > > + for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++) { > > + if (i >= PM_SUSPEND_MEM && cxl_mem_active()) > > + continue; > > if (mem_sleep_states[i]) { > > const char *label = mem_sleep_states[i]; > > > > @@ -136,6 +138,7 @@ static ssize_t mem_sleep_show(struct kobject *kobj, struct kobj_attribute *attr, > > else > > s += sprintf(s, "%s ", label); > > } > > + } > > > > /* Convert the last space to a newline if needed. */ > > if (s != buf) > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > > index 6fcdee7e87a5..827075944d28 100644 > > --- a/kernel/power/suspend.c > > +++ b/kernel/power/suspend.c > > @@ -236,7 +236,8 @@ EXPORT_SYMBOL_GPL(suspend_valid_only_mem); > > > > static bool sleep_state_supported(suspend_state_t state) > > { > > - return state == PM_SUSPEND_TO_IDLE || valid_state(state); > > + return state == PM_SUSPEND_TO_IDLE || > > + (valid_state(state) && !cxl_mem_active()); > > } > > > > static int platform_suspend_prepare(suspend_state_t state) > >