Jonathan Cameron wrote: > On Thu, 23 Jun 2022 19:47:18 -0700 > Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > > > The region provisioning flow will roughly follow a sequence of: > > > > 1/ Allocate DPA to a set of decoders > > > > 2/ Allocate HPA to a region > > > > 3/ Associate decoders with a region and validate that the DPA allocations > > and topologies match the parameters of the region. > > > > For now, this change (step 1) arranges for DPA capacity to be allocated > > and deleted from non-committed decoders based on the decoder's mode / > > partition selection. Capacity is allocated from the lowest DPA in the > > partition and any 'pmem' allocation blocks out all remaining ram > > capacity in its 'skip' setting. DPA allocations are enforced in decoder > > instance order. I.e. decoder N + 1 always starts at a higher DPA than > > instance N, and deleting allocations must proceed from the > > highest-instance allocated decoder to the lowest. > > > > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> > > The error value setting in here might save a few lines, but to me it > is less readable than setting rc in each error path. > > > --- > > Documentation/ABI/testing/sysfs-bus-cxl | 37 +++++++ > > drivers/cxl/core/core.h | 7 + > > drivers/cxl/core/hdm.c | 160 +++++++++++++++++++++++++++++++ > > drivers/cxl/core/port.c | 73 ++++++++++++++ > > 4 files changed, 275 insertions(+), 2 deletions(-) > > > > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl > > index 091459216e11..85844f9bc00b 100644 > > --- a/Documentation/ABI/testing/sysfs-bus-cxl > > +++ b/Documentation/ABI/testing/sysfs-bus-cxl > > @@ -171,7 +171,7 @@ Date: May, 2022 > > KernelVersion: v5.20 > > Contact: linux-cxl@xxxxxxxxxxxxxxx > > Description: > > - (RO) When a CXL decoder is of devtype "cxl_decoder_endpoint" it > > + (RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it > > translates from a host physical address range, to a device local > > address range. Device-local address ranges are further split > > into a 'ram' (volatile memory) range and 'pmem' (persistent > > @@ -180,3 +180,38 @@ Description: > > when a decoder straddles the volatile/persistent partition > > boundary, and 'none' indicates the decoder is not actively > > decoding, or no DPA allocation policy has been set. > > + > > + 'mode' can be written, when the decoder is in the 'disabled' > > + state, with either 'ram' or 'pmem' to set the boundaries for the > > + next allocation. > > + > > As before, documentation above this in the file only uses single line break between > entries. Yeah, I'll go fix those up as a precursor patch. > > > + > > +What: /sys/bus/cxl/devices/decoderX.Y/dpa_resource > > +Date: May, 2022 > > +KernelVersion: v5.20 > > +Contact: linux-cxl@xxxxxxxxxxxxxxx > > +Description: > > + (RO) When a CXL decoder is of devtype "cxl_decoder_endpoint", > > + and its 'dpa_size' attribute is non-zero, this attribute > > + indicates the device physical address (DPA) base address of the > > + allocation. > > Why _resource rather than _base or _start? To mimic PCI and NVDIMM sysfs that calls it a 'resource' address. > > > + > > + > > +What: /sys/bus/cxl/devices/decoderX.Y/dpa_size > > +Date: May, 2022 > > +KernelVersion: v5.20 > > +Contact: linux-cxl@xxxxxxxxxxxxxxx > > +Description: > > + (RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it > > + translates from a host physical address range, to a device local > > + address range. The range, base address plus length in bytes, of > > + DPA allocated to this decoder is conveyed in these 2 attributes. > > + Allocations can be mutated as long as the decoder is in the > > + disabled state. A write to 'size' releases the previous DPA > > 'dpa_size' ? Yes. > > > + allocation and then attempts to allocate from the free capacity > > + in the device partition referred to by 'decoderX.Y/mode'. > > + Allocate and free requests can only be performed on the highest > > + instance number disabled decoder with non-zero size. I.e. > > + allocations are enforced to occur in increasing 'decoderX.Y/id' > > + order and frees are enforced to occur in decreasing > > + 'decoderX.Y/id' order. > > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h > > index 1a50c0fc399c..47cf0c286fc3 100644 > > --- a/drivers/cxl/core/core.h > > +++ b/drivers/cxl/core/core.h > > @@ -17,6 +17,13 @@ int cxl_send_cmd(struct cxl_memdev *cxlmd, struct cxl_send_command __user *s); > > void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr, > > resource_size_t length); > > > > +int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, > > + enum cxl_decoder_mode mode); > > +int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size); > > +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled); > > +resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled); > > +resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled); > > + > > int cxl_memdev_init(void); > > void cxl_memdev_exit(void); > > void cxl_mbox_init(void); > > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c > > index 8805afe63ebf..ceb4c28abc1b 100644 > > --- a/drivers/cxl/core/hdm.c > > +++ b/drivers/cxl/core/hdm.c > > @@ -248,6 +248,166 @@ static int cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, > > return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled); > > } > > > > +resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled) > > +{ > > + resource_size_t size = 0; > > + > > + down_read(&cxl_dpa_rwsem); > > + if (cxled->dpa_res) > > + size = resource_size(cxled->dpa_res); > > + up_read(&cxl_dpa_rwsem); > > + > > + return size; > > +} > > + > > +resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled) > > Instinct would be to expect this to return the resource, not the start. > Rename? I can add _start, but this is only servicing the dpa_resource_show() sysfs operation, and there 'resource' as a the base address has ABI history. > > > > +{ > > + resource_size_t base = -1; > > + > > + down_read(&cxl_dpa_rwsem); > > + if (cxled->dpa_res) > > + base = cxled->dpa_res->start; > > + up_read(&cxl_dpa_rwsem); > > + > > + return base; > > +} > > + > > +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled) > > +{ > > + int rc = -EBUSY; > > + struct device *dev = &cxled->cxld.dev; > > + struct cxl_port *port = to_cxl_port(dev->parent); > > + > > + down_write(&cxl_dpa_rwsem); > > + if (!cxled->dpa_res) { > > + rc = 0; > > + goto out; > > + } > > + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { > > + dev_dbg(dev, "decoder enabled\n"); > > I'd prefer explicit setting of rc = -EBUSY in the two > 'error' paths to make it really clear when looking at these > that they are treated as errors. Ok. > > > + goto out; > > + } > > + if (cxled->cxld.id != port->dpa_end) { > > + dev_dbg(dev, "expected decoder%d.%d\n", port->id, > > + port->dpa_end); > > + goto out; > > + } > > + __cxl_dpa_release(cxled, true); > > + rc = 0; > > +out: > > + up_write(&cxl_dpa_rwsem); > > + return rc; > > +} > > + > > +int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, > > + enum cxl_decoder_mode mode) > > +{ > > + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); > > + struct cxl_dev_state *cxlds = cxlmd->cxlds; > > + struct device *dev = &cxled->cxld.dev; > > + int rc = -EBUSY; > > As above, I'd prefer seeing error set in each error path rther > than it being set in a few locations and having to go look > for which value it currently has. To me having the > error code next to the condition is much easier to follow. Fair enough. > > > + > > + switch (mode) { > > + case CXL_DECODER_RAM: > > + case CXL_DECODER_PMEM: > > + break; > > + default: > > + dev_dbg(dev, "unsupported mode: %d\n", mode); > > + return -EINVAL; > > + } > > + > > + down_write(&cxl_dpa_rwsem); > > + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) > > + goto out; > > + /* > > + * Only allow modes that are supported by the current partition > > + * configuration > > + */ > > + rc = -ENXIO; > > + if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) { > > + dev_dbg(dev, "no available pmem capacity\n"); > > + goto out; > > + } > > + if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) { > > + dev_dbg(dev, "no available ram capacity\n"); > > + goto out; > > + } > > + > > + cxled->mode = mode; > > + rc = 0; > > +out: > > + up_write(&cxl_dpa_rwsem); > > + > > + return rc; > > +} > > + > > +int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) > > +{ > > + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); > > + resource_size_t free_ram_start, free_pmem_start; > > + struct cxl_port *port = cxled_to_port(cxled); > > + struct cxl_dev_state *cxlds = cxlmd->cxlds; > > + struct device *dev = &cxled->cxld.dev; > > + resource_size_t start, avail, skip; > > + struct resource *p, *last; > > + int rc = -EBUSY; > > + > > + down_write(&cxl_dpa_rwsem); > > + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { > > + dev_dbg(dev, "decoder enabled\n"); > > + goto out; > > > -EBUSY only used in this path, so clearer to me to push that setting down > to in this error path. Ok. Folded in these changes: diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index 85844f9bc00b..3fa6da73751e 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -207,7 +207,7 @@ Description: address range. The range, base address plus length in bytes, of DPA allocated to this decoder is conveyed in these 2 attributes. Allocations can be mutated as long as the decoder is in the - disabled state. A write to 'size' releases the previous DPA + disabled state. A write to 'dpa_size' releases the previous DPA allocation and then attempts to allocate from the free capacity in the device partition referred to by 'decoderX.Y/mode'. Allocate and free requests can only be performed on the highest diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 47cf0c286fc3..65bcaecec405 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -22,7 +22,7 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size); int cxl_dpa_free(struct cxl_endpoint_decoder *cxled); resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled); -resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled); +resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled); int cxl_memdev_init(void); void cxl_memdev_exit(void); diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 0c1ff3c0142f..e9281557781d 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -270,7 +270,7 @@ resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled) return size; } -resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled) +resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled) { resource_size_t base = -1; @@ -284,9 +284,9 @@ resource_size_t cxl_dpa_resource(struct cxl_endpoint_decoder *cxled) int cxl_dpa_free(struct cxl_endpoint_decoder *cxled) { - int rc = -EBUSY; + struct cxl_port *port = cxled_to_port(cxled); struct device *dev = &cxled->cxld.dev; - struct cxl_port *port = to_cxl_port(dev->parent); + int rc; down_write(&cxl_dpa_rwsem); if (!cxled->dpa_res) { @@ -295,11 +295,13 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled) } if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { dev_dbg(dev, "decoder enabled\n"); + rc = -EBUSY; goto out; } - if (cxled->cxld.id != port->dpa_end) { + if (cxled->cxld.id != port->hdm_end) { dev_dbg(dev, "expected decoder%d.%d\n", port->id, - port->dpa_end); + port->hdm_end); + rc = -EBUSY; goto out; } devm_cxl_dpa_release(cxled); @@ -315,7 +317,7 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); struct cxl_dev_state *cxlds = cxlmd->cxlds; struct device *dev = &cxled->cxld.dev; - int rc = -EBUSY; + int rc; switch (mode) { case CXL_DECODER_RAM: @@ -327,19 +329,23 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, } down_write(&cxl_dpa_rwsem); - if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { + rc = -EBUSY; goto out; + } + /* * Only allow modes that are supported by the current partition * configuration */ - rc = -ENXIO; if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) { dev_dbg(dev, "no available pmem capacity\n"); + rc = -ENXIO; goto out; } if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) { dev_dbg(dev, "no available ram capacity\n"); + rc = -ENXIO; goto out; } @@ -360,11 +366,12 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) struct device *dev = &cxled->cxld.dev; resource_size_t start, avail, skip; struct resource *p, *last; - int rc = -EBUSY; + int rc; down_write(&cxl_dpa_rwsem); if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) { dev_dbg(dev, "decoder enabled\n"); + rc = -EBUSY; goto out; } diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 130989846cce..feed86737202 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -215,7 +215,7 @@ static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *at char *buf) { struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); - u64 base = cxl_dpa_resource(cxled); + u64 base = cxl_dpa_resource_start(cxled); return sysfs_emit(buf, "%#llx\n", base); }