Jonathan Cameron wrote:
> On Mon, 07 Oct 2024 18:16:27 -0500
> ira.weiny@xxxxxxxxx wrote:
>
> > From: Navneet Singh <navneet.singh@xxxxxxxxx>
> >
> > A dynamic capacity device (DCD) sends events to signal the host for
> > changes in the availability of Dynamic Capacity (DC) memory. These
> > events contain extents describing a DPA range and meta data for memory
> > to be added or removed. Events may be sent from the device at any time.
> >
> > Three types of events can be signaled, Add, Release, and Force Release.
> >
> > On add, the host may accept or reject the memory being offered. If no
> > region exists, or the extent is invalid, the extent should be rejected.
> > Add extent events may be grouped by a 'more' bit which indicates those
> > extents should be processed as a group.
> >
> > On remove, the host can delay the response until the host is safely not
> > using the memory. If no region exists the release can be sent
> > immediately. The host may also release extents (or partial extents) at
> > any time. Thus the 'more' bit grouping of release events is of less
> > value and can be ignored in favor of sending multiple release capacity
> > responses for groups of release events.
>
> True today - I think that would be an error for shared extents
> though as they need to be released in one go. We can deal with
> that when it matters.
>
>
> Mind you patch seems to try to handle more bit anyway, so maybe just
> remove that discussion from this description?

It only handles the more bit in responses to ADD, because on RELEASE the
count is always 1:

+	if (cxl_send_dc_response(mds, CXL_MBOX_OP_RELEASE_DC, &extent_list, 1))
+		dev_dbg(dev, "Failed to release [range 0x%016llx-0x%016llx]\n",
+			range->start, range->end);

For shared extents, a flag will need to be added to the extents, plus
additional logic to group them for checking use, etc.

I agree, we need to handle that later on and get this basic support in.
For now, I think my comments are correct WRT the sending of release
responses.

> >
> > Simplify extent tracking with the following restrictions.
> >
> >   1) Flag for removal any extent which overlaps a requested
> >      release range.
> >   2) Refuse the offer of extents which overlap already accepted
> >      memory ranges.
> >   3) Accept again a range which has already been accepted by the
> >      host. Eating duplicates serves three purposes. First, this
> >      simplifies the code if the device should get out of sync with
> >      the host.
>
> Maybe scream about this a little. AFAIK that happening is a device
> bug.

Agreed, but because of the second purpose this is difficult to scream
about: this situation can come up in normal operation. Here is the
scenario:

	1) Device has 2 DCD partitions active, A and B
	2) Host crashes
	3) Region X is created on A
	4) Region Y is created on B
	5) Region Y scans for extents
	6) Region X surfaces a new extent while Y is scanning
	7) Gen number changes due to new extent in X
	8) Region Y rescans for existing extents and sees duplicates

These duplicates need to be ignored without signaling an error.

>
> >      And it should be safe to acknowledge the extent
> >      again. Second, this simplifies the code to process existing
> >      extents if the extent list should change while the extent
> >      list is being read.

This is the 'normal' case.

> >      Third, duplicates for a given region
> >      which are seen during a race between the hardware surfacing
> >      an extent and the cxl dax driver scanning for existing
> >      extents will be ignored.
>
> This last one is a good justification.

I think the second justification is actually better than this one.
Regardless, this makes everything OK and should work.

> >
> > NOTE: Processing existing extents is done in a later patch.
> >
> > Management of the region extent devices must be synchronized with
> > potential uses of the memory within the DAX layer.
> > Create region extent devices as children of the cxl_dax_region
> > device such that the DAX region driver can co-drive them and
> > synchronize with the DAX layer. Synchronization and management is
> > handled in a subsequent patch.
> >
> > Tag support within the DAX layer is not yet supported. To maintain
> > compatibility legacy DAX/region processing only tags with a value of 0
> > are allowed. This defines existing DAX devices as having a 0 tag which
> > makes the most logical sense as a default.
> >
> > Process DCD events and create region devices.
> >
> > Signed-off-by: Navneet Singh <navneet.singh@xxxxxxxxx>
> > Co-developed-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> > Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> >
> A couple of minor comments from me.

I do appreciate the review.

[snip]

> >
> > +static int cxl_send_dc_response(struct cxl_memdev_state *mds, int opcode,
> > +				struct xarray *extent_array, int cnt)
> > +{
> > +	struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> > +	struct cxl_mbox_dc_response *p;
> > +	struct cxl_mbox_cmd mbox_cmd;
> > +	struct cxl_extent *extent;
> > +	unsigned long index;
> > +	u32 pl_index;
> > +	int rc;
> > +
> > +	size_t pl_size = struct_size(p, extent_list, cnt);
> > +	u32 max_extents = cnt;
> > +
> > +	/* May have to use more bit on response. */
>
> I thought you argued in the patch description that it didn't matter if you
> didn't set it?

Only on RELEASE responses. ADD responses might need it, depending on the
payload size and the number of extents being added. Sorry, that was not
clear.
>
> > +	if (pl_size > cxl_mbox->payload_size) {
> > +		max_extents = (cxl_mbox->payload_size - sizeof(*p)) /
> > +			      sizeof(struct updated_extent_list);
> > +		pl_size = struct_size(p, extent_list, max_extents);
> > +	}
> > +
> > +	struct cxl_mbox_dc_response *response __free(kfree) =
> > +		kzalloc(pl_size, GFP_KERNEL);
> > +	if (!response)
> > +		return -ENOMEM;
> > +
> > +	pl_index = 0;
> > +	xa_for_each(extent_array, index, extent) {
> > +
> > +		response->extent_list[pl_index].dpa_start = extent->start_dpa;
> > +		response->extent_list[pl_index].length = extent->length;
> > +		pl_index++;
> > +		response->extent_list_size = cpu_to_le32(pl_index);
> > +
> > +		if (pl_index == max_extents) {
> > +			mbox_cmd = (struct cxl_mbox_cmd) {
> > +				.opcode = opcode,
> > +				.size_in = struct_size(response, extent_list,
> > +						       pl_index),
> > +				.payload_in = response,
> > +			};
> > +
> > +			response->flags = 0;
> > +			if (pl_index < cnt)
> > +				response->flags &= CXL_DCD_EVENT_MORE;
> Covered in other branch of thread.

Yep.

[snip]

> >
> > +/* See CXL 3.0 8.2.9.2.1.5 */
>
> Maybe update to 3.1? Otherwise patch reviewer needs to open two
> spec versions! In 3.1 it is 8.2.9.2.1.6

Yep, missed this one.

Thanks,
Ira