On Tue, Feb 1, 2022 at 6:59 AM Ben Widawsky <ben.widawsky@xxxxxxxxx> wrote: > > I will cut to the part that effects ABI so tool development can continue. I'll > get back to the other bits later. > > On 22-01-28 16:25:34, Dan Williams wrote: > > [snip] > > > > > > + > > > + return ret; > > > +} > > > + > > > +static size_t set_targetN(struct cxl_region *cxlr, const char *buf, int n, > > > + size_t len) > > > +{ > > > + struct device *memdev_dev; > > > + struct cxl_memdev *cxlmd; > > > + > > > + device_lock(&cxlr->dev); > > > + > > > + if (len == 1 || cxlr->config.targets[n]) > > > + remove_target(cxlr, n); > > > + > > > + /* Remove target special case */ > > > + if (len == 1) { > > > + device_unlock(&cxlr->dev); > > > + return len; > > > + } > > > + > > > + memdev_dev = bus_find_device_by_name(&cxl_bus_type, NULL, buf); > > > > I think this wants to be an endpoint decoder, not a memdev. Because > > it's the decoder that joins a memdev to a region, or at least a > > decoder should be picked when the memdev is assigned so that the DPA > > mapping can be registered. If all the decoders are allocated then fail > > here. > > > > You've put two points in here: > > 1. Handle decoder allocation at sysfs boundary. I'll respond to this when I come > back around to the rest of the review comments. > > 2. Take a decoder for target instead of a memdev. I don't agree with this > direction as it's asymmetric to how LSA processing works. The goal was to model > the LSA for configuration. The kernel will have to be in the business of > reserving and enumerating decoders out of memdevs for both LSA (where we have a > list of memdevs) and volatile (where we use the memdevs in the system to > enumerate populated decoders). I don't see much value in making userspace do the > same. > > I'd like to ask you reconsider if you still think it's preferable to use > decoders as part of the ABI and if you still feel that way I can go change it > since it has minimal impact overall. It's more than a preference. I think there are fundamental recovery scenarios where the kernel needs userspace help to resolve decoder / DPA assignment and conflicts. PMEM interleaves behave similarly to RAID where you have multiple devices in a set that can each fail independently, and because they are hotplug capable the chances of migrating devices from one system to another are higher than PMEM devices today where hotplug is mostly non-existent. If you lurk on linux-raid long enough you will inevitably encounter someone coming to the list saying, "help a drive in my RAID array was dying. I managed to save it off, help me reassemble my array". The story often gets worse when they say "I managed to corrupt my metadata block, so I don't know what order the drives are supposed to be in". There are several breadcrumbs and trial and error steps that one takes to try to get the data back online: https://raid.wiki.kernel.org/index.php/RAID_Recovery. Now imagine that scenario with CXL where there are additional complicating factors like label-storage-area can fail independently of the data area, there are region labels with HPA fields that mandate assembly at a given address, decoders must be programmed in increasing DPA order, volatile memory and locked/fixed decoders complicate decoder selection. Considering all the ways that CXL region assembly can fail it seems inevitable that someone will get into a situation where they need to pick the decoder and the DPA to map while also not clobbering the LSA. I.e. I see a "CXL Recovery" wiki in our future. The requirements that I think fall out from that are: 1/ Region assembly needs to be possible without updating labels. So, in contrast to the way that nvdimm does it, instead of updating the region label on every attribute write there would be a commit step that realizes the current region configuration in the labels, or is ommitted in recovery scenarios where you are not ready to clobber the labels. 2/ Userspace needs the flexibility to be able to select/override which DPA gets mapped in which decoder (kernel would handle DPA skip). All the ways I can think of to augment the ABI to allow for this style of recovery devolve to just assigning decoders to regions in the first instance.