Jonathan Cameron wrote: > On Thu, 23 Jun 2022 21:19:44 -0700 > Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > > > CXL regions (interleave sets) are made up of a set of memory devices > > where each device maps a portion of the interleave with one of its > > decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure). > > As endpoint decoders are identified by a provisioning tool they can be > > added to a region provided the region interleave properties are set > > (way, granularity, HPA) and DPA has been assigned to the decoder. > > > > The attach event triggers several validation checks, for example: > > - is the DPA sized appropriately for the region > > - is the decoder reachable via the host-bridges identified by the > > region's root decoder > > - is the device already active in a different region position slot > > - are there already regions with a higher HPA active on a given port > > (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming) > > > > ...and the attach event affords an opportunity to collect data and > > resources relevant to later programming the target lists in switch > > decoders, for example: > > - allocate a decoder at each cxl_port in the decode chain > > - for a given switch port, how many the region's endpoints are hosted > > through the port > > - how many unique targets (next hops) does a port need to map to reach > > those endpoints > > > > The act of reconciling this information and deploying it to the decoder > > configuration is saved for a follow-on patch. > > > > Co-developed-by: Ben Widawsky <bwidawsk@xxxxxxxxxx> > > Signed-off-by: Ben Widawsky <bwidawsk@xxxxxxxxxx> > > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> > > --- > > drivers/cxl/core/core.h | 7 + > > drivers/cxl/core/port.c | 10 +- > > drivers/cxl/core/region.c | 338 +++++++++++++++++++++++++++++++++++++- > > drivers/cxl/cxl.h | 20 +++ > > drivers/cxl/cxlmem.h | 5 + > > 5 files changed, 372 insertions(+), 8 deletions(-) > > > > > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c > > index 4830365f3857..65bf84abad57 100644 > > --- a/drivers/cxl/core/region.c > > +++ b/drivers/cxl/core/region.c > > @@ -428,6 +428,254 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos) > > return rc; > > } > > > > > + > > +static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port, > > + struct cxl_region *cxlr) > > +{ > > + struct cxl_region_ref *cxl_rr; > > + > > + cxl_rr = kzalloc(sizeof(*cxl_rr), GFP_KERNEL); > > + if (!cxl_rr) > > + return NULL; > > + cxl_rr->port = port; > > + cxl_rr->region = cxlr; > > + xa_init(&cxl_rr->endpoints); > > + return cxl_rr; > > +} > > + > > +static void free_region_ref(struct cxl_region_ref *cxl_rr) > > +{ > > + struct cxl_port *port = cxl_rr->port; > > + struct cxl_region *cxlr = cxl_rr->region; > > + struct cxl_decoder *cxld = cxl_rr->decoder; > > + > > + dev_WARN_ONCE(&cxlr->dev, cxld->region != cxlr, "region mismatch\n"); > > + if (cxld->region == cxlr) { > > + cxld->region = NULL; > > + put_device(&cxlr->dev); > > + } > > + > > + xa_erase(&port->regions, (unsigned long)cxlr); > > Why do we have things in a free_ function that aren't simply removing things > created in the alloc()? I'd kind of expect this to be in a cxl_rr_del() or similar. Fixed it the other way by just open-coding cxl_rr_add() into alloc_region_ref(). There was no good reason to have them as separate steps. > > > + xa_destroy(&cxl_rr->endpoints); > > + kfree(cxl_rr); > > +} > > + > > +static int cxl_rr_add(struct cxl_region_ref *cxl_rr) > > +{ > > + struct cxl_port *port = cxl_rr->port; > > + struct cxl_region *cxlr = cxl_rr->region; > > + > > + return xa_insert(&port->regions, (unsigned long)cxlr, cxl_rr, > > + GFP_KERNEL); > > +} > > + > > +static int cxl_rr_ep_add(struct cxl_region_ref *cxl_rr, > > + struct cxl_endpoint_decoder *cxled) > > +{ > > + int rc; > > + struct cxl_port *port = cxl_rr->port; > > + struct cxl_region *cxlr = cxl_rr->region; > > + struct cxl_decoder *cxld = cxl_rr->decoder; > > + struct cxl_ep *ep = cxl_ep_load(port, cxled_to_memdev(cxled)); > > + > > + rc = xa_insert(&cxl_rr->endpoints, (unsigned long)cxled, ep, > > + GFP_KERNEL); > > + if (rc) > > + return rc; > > + cxl_rr->nr_eps++; > > + > > + if (!cxld->region) { > > + cxld->region = cxlr; > > + get_device(&cxlr->dev); > > + } > > + > > + return 0; > > +} > > + > > +static int cxl_port_attach_region(struct cxl_port *port, > > + struct cxl_region *cxlr, > > + struct cxl_endpoint_decoder *cxled, int pos) > > +{ > > + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); > > + struct cxl_ep *ep = cxl_ep_load(port, cxlmd); > > + struct cxl_region_ref *cxl_rr = NULL, *iter; > > + struct cxl_region_params *p = &cxlr->params; > > + struct cxl_decoder *cxld = NULL; > > + unsigned long index; > > + int rc = -EBUSY; > > + > > + lockdep_assert_held_write(&cxl_region_rwsem); > > This function is complex enough that maybe it would benefit from > some saying what each part is doing. ...added a kdoc block: /** * cxl_port_attach_region() - track a region's interest in a port by endpoint * @port: port to add a new region reference 'struct cxl_region_ref' * @cxlr: region to attach to @port * @cxled: endpoint decoder used to create or further pin a region reference * @pos: interleave position of @cxled in @cxlr * * The attach event is an opportunity to validate CXL decode setup * constraints and record metadata needed for programming HDM decoders, * in particular decoder target lists. * * The steps are: * - validate that there are no other regions with a higher HPA already * associated with @port * - establish a region reference if one is not already present * - additionally allocate a decoder instance that will host @cxlr on * @port * - pin the region reference by the endpoint * - account for how many entries in @port's target list are needed to * cover all of the added endpoints. */ > > > + > > + xa_for_each(&port->regions, index, iter) { > > + struct cxl_region_params *ip = &iter->region->params; > > + > > + if (iter->region == cxlr) > > + cxl_rr = iter; > > + if (ip->res->start > p->res->start) { > > + dev_dbg(&cxlr->dev, > > + "%s: HPA order violation %s:%pr vs %pr\n", > > + dev_name(&port->dev), > > + dev_name(&iter->region->dev), ip->res, p->res); > > + return -EBUSY; > > + } > > + } > > + > > + if (cxl_rr) { > > + struct cxl_ep *ep_iter; > > + int found = 0; > > + > > + cxld = cxl_rr->decoder; > > + xa_for_each(&cxl_rr->endpoints, index, ep_iter) { > > + if (ep_iter == ep) > > + continue; > > + if (ep_iter->next == ep->next) { > > + found++; > > + break; > > + } > > + } > > + > > + /* > > + * If this is a new target or if this port is direct connected > > + * to this endpoint then add to the target count. > > + */ > > + if (!found || !ep->next) > > + cxl_rr->nr_targets++; > > + } else { > > + cxl_rr = alloc_region_ref(port, cxlr); > > + if (!cxl_rr) { > > + dev_dbg(&cxlr->dev, > > + "%s: failed to allocate region reference\n", > > + dev_name(&port->dev)); > > + return -ENOMEM; > > + } > > + rc = cxl_rr_add(cxl_rr); > > + if (rc) { > > + dev_dbg(&cxlr->dev, > > + "%s: failed to track region reference\n", > > + dev_name(&port->dev)); > > + kfree(cxl_rr); > > + return rc; > > + } > > + } > > + > > + if (!cxld) { > > + if (port == cxled_to_port(cxled)) > > + cxld = &cxled->cxld; > > + else > > + cxld = cxl_region_find_decoder(port, cxlr); > > + if (!cxld) { > > + dev_dbg(&cxlr->dev, "%s: no decoder available\n", > > + dev_name(&port->dev)); > > + goto out_erase; > > + } > > + > > + if (cxld->region) { > > + dev_dbg(&cxlr->dev, "%s: %s already attached to %s\n", > > + dev_name(&port->dev), dev_name(&cxld->dev), > > + dev_name(&cxld->region->dev)); > > + rc = -EBUSY; > > + goto out_erase; > > + } > > + > > + cxl_rr->decoder = cxld; > > + } > > + > > + rc = cxl_rr_ep_add(cxl_rr, cxled); > > + if (rc) { > > + dev_dbg(&cxlr->dev, > > + "%s: failed to track endpoint %s:%s reference\n", > > + dev_name(&port->dev), dev_name(&cxlmd->dev), > > + dev_name(&cxld->dev)); > > + goto out_erase; > > + } > > + > > + return 0; > > +out_erase: > > + if (cxl_rr->nr_eps == 0) > > + free_region_ref(cxl_rr); > > + return rc; > > +} > > + > > > > > static void cxl_region_detach(struct cxl_endpoint_decoder *cxled) > > { > > + struct cxl_port *iter, *ep_port = cxled_to_port(cxled); > > struct cxl_region *cxlr = cxled->cxld.region; > > struct cxl_region_params *p; > > > > @@ -481,6 +811,10 @@ static void cxl_region_detach(struct cxl_endpoint_decoder *cxled) > > p = &cxlr->params; > > get_device(&cxlr->dev); > > > > + for (iter = ep_port; !is_cxl_root(iter); > > + iter = to_cxl_port(iter->dev.parent)) > > + cxl_port_detach_region(iter, cxlr, cxled); > > + > > if (cxled->pos < 0 || cxled->pos >= p->interleave_ways || > > p->targets[cxled->pos] != cxled) { > > struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); > > @@ -491,6 +825,8 @@ static void cxl_region_detach(struct cxl_endpoint_decoder *cxled) > > goto out; > > } > > > > + if (p->state == CXL_CONFIG_ACTIVE) > > I 'think' the state is either CXL_CONFIG_ACTIVE or CXL_CONFIG_INTERLEAVE_ACTIVE, > so you could set this unconditionally. A comment here on permissible > states would be useful for future reference. cxl_region_detach() should not care if the region state is idle. Not that it will happen in the current code, but the only expectation is that if the region is active and a endpoint departs it must be downgraded in config state. CXL_CONFIG_IDLE is permissible, although not expected. I do not think a comment is needed if the "if (p->state == CXL_CONFIG_ACTIVE)" check stays. > > > + p->state = CXL_CONFIG_INTERLEAVE_ACTIVE; > > p->targets[cxled->pos] = NULL; > > p->nr_targets--; > >