Jonathan Cameron wrote: > On Thu, 23 Jun 2022 21:19:38 -0700 > Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > > > From: Ben Widawsky <bwidawsk@xxxxxxxxxx> > > > > CXL 2.0 allows for dynamic provisioning of new memory regions (system > > physical address resources like "System RAM" and "Persistent Memory"). > > Whereas DDR and PMEM resources are conveyed statically at boot, CXL > > allows for assembling and instantiating new regions from the available > > capacity of CXL memory expanders in the system. > > > > Sysfs with an "echo $region_name > $create_region_attribute" interface > > is chosen as the mechanism to initiate the provisioning process. This > > was chosen over ioctl() and netlink() to keep the configuration > > interface entirely in a pseudo-fs interface, and it was chosen over > > configfs since, aside from this one creation event, the interface is > > read-mostly. I.e. configfs supports cases where an object is designed to > > be provisioned each boot, like an iSCSI storage target, and CXL region > > creation is mostly for PMEM regions which are created usually once > > per-lifetime of a server instance. > > > > Recall that the major change that CXL brings over previous > > persistent memory architectures is the ability to dynamically define new > > regions. Compare that to drivers like 'nfit' where the region > > configuration is statically defined by platform firmware. > > > > Regions are created as a child of a root decoder that encompasses an > > address space with constraints. When created through sysfs, the root > > decoder is explicit. When created from an LSA's region structure a root > > decoder will possibly need to be inferred by the driver. > > > > Upon region creation through sysfs, a vacant region is created with a > > unique name. Regions have a number of attributes that must be configured > > before the region can be bound to the driver where HDM decoder program > > is completed. > > > > An example of creating a new region: > > > > - Allocate a new region name: > > region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) > > > > - Create a new region by name: > > while > > region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) > > Perhaps it is worth calling out the region ID allocator is shared > with nvdimms and other usecases. I'm not really sure what the advantage > in doing that is, but it doesn't do any real harm. The rationale is that there are several producers of memory regions nvdimm, device-dax (hmem), and now cxl. Of those cases cxl can pass regoins to nvdimm and nvdimm can pass regions to device-dax (pmem). If each of those cases allocated their own region-id it would just complicate debug for no benefit. I can add this a note to remind why memregion_alloc() was introduced in the first instance. > > > ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region > > do true; done I recall you also asked to clarify the rationale of this complexity. It is related to the potential proliferation of disaparate region ids, but also a lesson learned from nvdimm which itself learned lessons from md-raid. The lesson from md-raid in short is do not use ioctl for object creation. After "not ioctl" the choice is configfs or a small bit of sysfs hackery. Configfs is overkill when there is already a sysfs hierarchy that just needs one new object injected. Namespace creation in nvdimm pre-created "seed" devices which let the kernel control the naming, but confused end users that wondered about vestigial devices. This "read to learn next object name" + "write to atomically claim and instantiate that id" cleans up that vestigial device problem while also constraining object naming to follow memregion id expectations. > > > > - Region now exists in sysfs: > > stat -t /sys/bus/cxl/devices/decoder0.0/$region > > > > - Delete the region, and name: > > echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region > > > > Signed-off-by: Ben Widawsky <bwidawsk@xxxxxxxxxx> > > [djbw: simplify locking, reword changelog] > > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> > > > --- > > Documentation/ABI/testing/sysfs-bus-cxl | 25 +++ > > .../driver-api/cxl/memory-devices.rst | 11 + > > drivers/cxl/Kconfig | 5 + > > drivers/cxl/core/Makefile | 1 + > > drivers/cxl/core/core.h | 12 ++ > > drivers/cxl/core/port.c | 39 +++- > > drivers/cxl/core/region.c | 199 ++++++++++++++++++ > > drivers/cxl/cxl.h | 18 ++ > > tools/testing/cxl/Kbuild | 1 + > > 9 files changed, 308 insertions(+), 3 deletions(-) > > create mode 100644 drivers/cxl/core/region.c > > > > ... > > > > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h > > index 472ec9cb1018..ebe6197fb9b8 100644 > > --- a/drivers/cxl/core/core.h > > +++ b/drivers/cxl/core/core.h > > @@ -9,6 +9,18 @@ extern const struct device_type cxl_nvdimm_type; > > > > extern struct attribute_group cxl_base_attribute_group; > > > > +#ifdef CONFIG_CXL_REGION > > +extern struct device_attribute dev_attr_create_pmem_region; > > +extern struct device_attribute dev_attr_delete_region; > > +/* > > + * Note must be used at the end of an attribute list, since it > > + * terminates the list in the CONFIG_CXL_REGION=n case. > > That's rather ugly. Maybe just push the ifdef down into the c file > where we will be shortening the list and it should be obvious what is > going on without needing the comment? Much as I don't like ifdef > magic in the c files, it sometimes ends up cleaner. No, I think ifdef in C is definitely uglier, but I also notice that helpers like SET_SYSTEM_SLEEP_PM_OPS() are defined to be used in any place in the list. So, I'll just duplicate that approach. > > > + */ > > +#define CXL_REGION_ATTR(x) (&dev_attr_##x.attr) > > +#else > > +#define CXL_REGION_ATTR(x) NULL > > +#endif > > + > > struct cxl_send_command; > > struct cxl_mem_query_commands; > > int cxl_query_cmd(struct cxl_memdev *cxlmd, > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c > > index 2e56903399c2..c9207ebc3f32 100644 > > --- a/drivers/cxl/core/port.c > > +++ b/drivers/cxl/core/port.c > > @@ -1,6 +1,7 @@ > > // SPDX-License-Identifier: GPL-2.0-only > > /* Copyright(c) 2020 Intel Corporation. All rights reserved. */ > > #include <linux/io-64-nonatomic-lo-hi.h> > > +#include <linux/memregion.h> > > #include <linux/workqueue.h> > > #include <linux/debugfs.h> > > #include <linux/device.h> > > @@ -300,11 +301,35 @@ static struct attribute *cxl_decoder_root_attrs[] = { > > &dev_attr_cap_type2.attr, > > &dev_attr_cap_type3.attr, > > &dev_attr_target_list.attr, > > + CXL_REGION_ATTR(create_pmem_region), > > + CXL_REGION_ATTR(delete_region), > > NULL, > > }; > > > > > static const struct attribute_group *cxl_decoder_root_attribute_groups[] = { > > @@ -387,6 +412,7 @@ static void cxl_root_decoder_release(struct device *dev) > > { > > struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev); > > > > + memregion_free(atomic_read(&cxlrd->region_id)); > > __cxl_decoder_release(&cxlrd->cxlsd.cxld); > > kfree(cxlrd); > > } > > @@ -1415,6 +1441,7 @@ static struct lock_class_key cxl_decoder_key; > > static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port, > > unsigned int nr_targets) > > { > > + struct cxl_root_decoder *cxlrd = NULL; > > struct cxl_decoder *cxld; > > struct device *dev; > > void *alloc; > > @@ -1425,16 +1452,20 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port, > > > > if (nr_targets) { > > struct cxl_switch_decoder *cxlsd; > > - struct cxl_root_decoder *cxlrd; > > > > if (is_cxl_root(port)) { > > alloc = kzalloc(struct_size(cxlrd, cxlsd.target, > > nr_targets), > > GFP_KERNEL); > > cxlrd = alloc; > > - if (cxlrd) > > + if (cxlrd) { > > cxlsd = &cxlrd->cxlsd; > > - else > > + atomic_set(&cxlrd->region_id, -1); > > + rc = memregion_alloc(GFP_KERNEL); > > + if (rc < 0) > > + goto err; > > Leaving region_id set to -1 seems interesting for ever > recovering from this error. Perhaps a comment on how the magic > value is used. Comment added. > > > + atomic_set(&cxlrd->region_id, rc); > > + } else > > cxlsd = NULL; > > } else { > > alloc = kzalloc(struct_size(cxlsd, target, nr_targets), > > @@ -1490,6 +1521,8 @@ static struct cxl_decoder *cxl_decoder_alloc(struct cxl_port *port, > > > > return cxld; > > err: > > + if (cxlrd && atomic_read(&cxlrd->region_id) >= 0) > > + memregion_free(atomic_read(&cxlrd->region_id)); > > kfree(alloc); > > return ERR_PTR(rc); > > } > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c > > new file mode 100644 > > index 000000000000..f2a0ead20ca7 > > --- /dev/null > > +++ b/drivers/cxl/core/region.c > > @@ -0,0 +1,199 @@ > > +// SPDX-License-Identifier: GPL-2.0-only > > +/* Copyright(c) 2022 Intel Corporation. All rights reserved. */ > > +#include <linux/memregion.h> > > +#include <linux/genalloc.h> > > +#include <linux/device.h> > > +#include <linux/module.h> > > +#include <linux/slab.h> > > +#include <linux/idr.h> > > +#include <cxl.h> > > +#include "core.h" > > + > > +/** > > + * DOC: cxl core region > > + * > > + * CXL Regions represent mapped memory capacity in system physical address > > + * space. Whereas the CXL Root Decoders identify the bounds of potential CXL > > + * Memory ranges, Regions represent the active mapped capacity by the HDM > > + * Decoder Capability structures throughout the Host Bridges, Switches, and > > + * Endpoints in the topology. > > + */ > > + > > +static struct cxl_region *to_cxl_region(struct device *dev); > > + > > +static void cxl_region_release(struct device *dev) > > +{ > > + struct cxl_region *cxlr = to_cxl_region(dev); > > + > > + memregion_free(cxlr->id); > > + kfree(cxlr); > > +} > > + > > +static const struct device_type cxl_region_type = { > > + .name = "cxl_region", > > + .release = cxl_region_release, > > +}; > > + > > +bool is_cxl_region(struct device *dev) > > +{ > > + return dev->type == &cxl_region_type; > > +} > > +EXPORT_SYMBOL_NS_GPL(is_cxl_region, CXL); > > + > > +static struct cxl_region *to_cxl_region(struct device *dev) > > +{ > > + if (dev_WARN_ONCE(dev, dev->type != &cxl_region_type, > > + "not a cxl_region device\n")) > > + return NULL; > > + > > + return container_of(dev, struct cxl_region, dev); > > +} > > + > > +static void unregister_region(void *dev) > > +{ > > + device_unregister(dev); > > +} > > + > > +static struct lock_class_key cxl_region_key; > > + > > +static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int id) > > +{ > > + struct cxl_region *cxlr; > > + struct device *dev; > > + > > + cxlr = kzalloc(sizeof(*cxlr), GFP_KERNEL); > > + if (!cxlr) { > > + memregion_free(id); > > That's a bit nasty as it gives the function side effects. Perhaps some > comments in the callers of this to highlight that memregion will either be freed > in here or handled over to the device. Added to the devm_cxl_add_region() kdoc that the memregion id is freed on failure. > > > + return ERR_PTR(-ENOMEM); > > + } > > + > > + dev = &cxlr->dev; > > + device_initialize(dev); > > + lockdep_set_class(&dev->mutex, &cxl_region_key); > > + dev->parent = &cxlrd->cxlsd.cxld.dev; > > + device_set_pm_not_required(dev); > > + dev->bus = &cxl_bus_type; > > + dev->type = &cxl_region_type; > > + cxlr->id = id; > > + > > + return cxlr; > > +} > > + > > +/** > > + * devm_cxl_add_region - Adds a region to a decoder > > + * @cxlrd: root decoder > > + * @id: memregion id to create > > + * @mode: mode for the endpoint decoders of this region > > Missing docs for type Added. > > > + * > > + * This is the second step of region initialization. Regions exist within an > > + * address space which is mapped by a @cxlrd. > > + * > > + * Return: 0 if the region was added to the @cxlrd, else returns negative error > > + * code. The region will be named "regionZ" where Z is the unique region number. > > + */ > > +static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd, > > + int id, > > + enum cxl_decoder_mode mode, > > + enum cxl_decoder_type type) > > +{ > > + struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent); > > + struct cxl_region *cxlr; > > + struct device *dev; > > + int rc; > > + > > + cxlr = cxl_region_alloc(cxlrd, id); > > + if (IS_ERR(cxlr)) > > + return cxlr; > > + cxlr->mode = mode; > > + cxlr->type = type; > > + > > + dev = &cxlr->dev; > > + rc = dev_set_name(dev, "region%d", id); > > + if (rc) > > + goto err; > > + > > + rc = device_add(dev); > > + if (rc) > > + goto err; > > + > > + rc = devm_add_action_or_reset(port->uport, unregister_region, cxlr); > > + if (rc) > > + return ERR_PTR(rc); > > + > > + dev_dbg(port->uport, "%s: created %s\n", > > + dev_name(&cxlrd->cxlsd.cxld.dev), dev_name(dev)); > > + return cxlr; > > + > > +err: > > + put_device(dev); > > + return ERR_PTR(rc); > > +} > > + > > > +static ssize_t create_pmem_region_store(struct device *dev, > > + struct device_attribute *attr, > > + const char *buf, size_t len) > > +{ > > + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev); > > + struct cxl_region *cxlr; > > + unsigned int id, rc; > > + > > + rc = sscanf(buf, "region%u\n", &id); > > + if (rc != 1) > > + return -EINVAL; > > + > > + rc = memregion_alloc(GFP_KERNEL); > > + if (rc < 0) > > + return rc; > > + > > + if (atomic_cmpxchg(&cxlrd->region_id, id, rc) != id) { > > + memregion_free(rc); > > + return -EBUSY; > > + } > > + > > + cxlr = devm_cxl_add_region(cxlrd, id, CXL_DECODER_PMEM, > > + CXL_DECODER_EXPANDER); > > + if (IS_ERR(cxlr)) > > + return PTR_ERR(cxlr); > > + > > + return len; > > +} > > +DEVICE_ATTR_RW(create_pmem_region); > > + > > +static struct cxl_region *cxl_find_region_by_name(struct cxl_decoder *cxld, > > Perhaps rename cxld here to make it clear it's a root decoder only. Yes, in fact it should just be a 'struct cxl_root_decoder' type argument. > > > + const char *name) > > +{ > > + struct device *region_dev; > > + > > + region_dev = device_find_child_by_name(&cxld->dev, name); > > + if (!region_dev) > > + return ERR_PTR(-ENODEV); > > + > > + return to_cxl_region(region_dev); > > +} > > + > > +static ssize_t delete_region_store(struct device *dev, > > + struct device_attribute *attr, > > + const char *buf, size_t len) > > +{ > > + struct cxl_port *port = to_cxl_port(dev->parent); > > + struct cxl_decoder *cxld = to_cxl_decoder(dev); > As above, given it's the root decoder can we name it to make that > obvious? Right, this is now: struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);