On Mon, 07 Oct 2024 18:16:27 -0500 ira.weiny@xxxxxxxxx wrote:

> From: Navneet Singh <navneet.singh@xxxxxxxxx>
>
> A dynamic capacity device (DCD) sends events to signal the host for
> changes in the availability of Dynamic Capacity (DC) memory. These
> events contain extents describing a DPA range and metadata for memory
> to be added or removed. Events may be sent from the device at any time.
>
> Three types of events can be signaled: Add, Release, and Force Release.
>
> On add, the host may accept or reject the memory being offered. If no
> region exists, or the extent is invalid, the extent should be rejected.
> Add extent events may be grouped by a 'more' bit which indicates those
> extents should be processed as a group.
>
> On remove, the host can delay the response until the host is safely not
> using the memory. If no region exists the release can be sent
> immediately. The host may also release extents (or partial extents) at
> any time. Thus the 'more' bit grouping of release events is of less
> value and can be ignored in favor of sending multiple release capacity
> responses for groups of release events.

True today - I think that would be an error for shared extents though,
as they need to be released in one go. We can deal with that when it
matters. Mind you, the patch seems to try to handle the more bit anyway,
so maybe just remove that discussion from this description?

> Simplify extent tracking with the following restrictions.
>
>    1) Flag for removal any extent which overlaps a requested
>       release range.
>    2) Refuse the offer of extents which overlap already accepted
>       memory ranges.
>    3) Accept again a range which has already been accepted by the
>       host. Eating duplicates serves three purposes. First, this
>       simplifies the code if the device should get out of sync with
>       the host.

Maybe scream about this a little - AFAIK that happening is a device bug.

>       And it should be safe to acknowledge the extent
>       again.
>       Second, this simplifies the code to process existing
>       extents if the extent list should change while the extent
>       list is being read. Third, duplicates for a given region
>       which are seen during a race between the hardware surfacing
>       an extent and the cxl dax driver scanning for existing
>       extents will be ignored.

This last one is a good justification.

> NOTE: Processing existing extents is done in a later patch.
>
> Management of the region extent devices must be synchronized with
> potential uses of the memory within the DAX layer. Create region extent
> devices as children of the cxl_dax_region device such that the DAX
> region driver can co-drive them and synchronize with the DAX layer.
> Synchronization and management is handled in a subsequent patch.
>
> Tag support within the DAX layer is not yet supported. To maintain
> compatibility with legacy DAX/region processing, only tags with a value
> of 0 are allowed. This defines existing DAX devices as having a 0 tag,
> which makes the most logical sense as a default.
>
> Process DCD events and create region devices.
>
> Signed-off-by: Navneet Singh <navneet.singh@xxxxxxxxx>
> Co-developed-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>

A couple of minor comments from me.
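Just to check I've read the three restrictions correctly, they reduce to
something like the sketch below (standalone, with a hypothetical
check_offer() helper, simplified to plain DPA ranges and ignoring tags
and the region lookup):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct dc_extent {
        uint64_t start;
        uint64_t len;
};

/* Half-open interval overlap test on [start, start + len). */
static bool extents_overlap(const struct dc_extent *a,
                            const struct dc_extent *b)
{
        return a->start < b->start + b->len &&
               b->start < a->start + a->len;
}

enum offer_result { OFFER_ACCEPT, OFFER_ACCEPT_DUP, OFFER_REJECT };

/* Decide what to do with an offered extent against one accepted extent. */
static enum offer_result check_offer(const struct dc_extent *accepted,
                                     const struct dc_extent *offer)
{
        /* Restriction 3: an exact duplicate is acknowledged again. */
        if (accepted->start == offer->start && accepted->len == offer->len)
                return OFFER_ACCEPT_DUP;
        /* Restriction 2: any other overlap with accepted memory is refused. */
        if (extents_overlap(accepted, offer))
                return OFFER_REJECT;
        return OFFER_ACCEPT;
}
```

If that matches your intent, a partial overlap that isn't an exact
duplicate always gets rejected, which seems like the right conservative
choice.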
J

> diff --git a/drivers/cxl/core/extent.c b/drivers/cxl/core/extent.c
> new file mode 100644
> index 000000000000..69a7614ba6a9
> --- /dev/null
> +++ b/drivers/cxl/core/extent.c

> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 584d7d282a97..d66beec687a0 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -889,6 +889,58 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)

> @@ -1017,6 +1069,223 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
>  	return rc;
>  }
>
> +static int cxl_send_dc_response(struct cxl_memdev_state *mds, int opcode,
> +				struct xarray *extent_array, int cnt)
> +{
> +	struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> +	struct cxl_mbox_dc_response *p;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	struct cxl_extent *extent;
> +	unsigned long index;
> +	u32 pl_index;
> +	int rc;
> +
> +	size_t pl_size = struct_size(p, extent_list, cnt);
> +	u32 max_extents = cnt;
> +
> +	/* May have to use more bit on response. */

I thought you argued in the patch description that it didn't matter if
you didn't set it?
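Also, to check I follow the payload split logic below: with a fixed
payload size, the chunking works out roughly as in this standalone
sketch (made-up sizes; HDR and ENT stand in for sizeof(*p) and
sizeof(struct updated_extent_list)):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes standing in for the real struct sizes. */
#define HDR 8   /* sizeof(*p): response header */
#define ENT 16  /* sizeof(struct updated_extent_list): one extent entry */

/* Max extents that fit in one mailbox command given the payload cap. */
static size_t max_extents_per_cmd(size_t payload_size, size_t cnt)
{
        size_t pl_size = HDR + cnt * ENT;

        if (pl_size <= payload_size)
                return cnt;
        return (payload_size - HDR) / ENT;
}

/* Number of mailbox commands needed to respond for cnt extents. */
static size_t cmds_needed(size_t payload_size, size_t cnt)
{
        size_t per = max_extents_per_cmd(payload_size, cnt);

        if (cnt == 0)
                return 1;       /* zero-extent (reject-all) response */
        return (cnt + per - 1) / per;
}
```

i.e. every command except possibly the last carries max_extents entries,
and a reject is a single command with an empty extent list - which
matches the cnt == 0 || pl_index condition at the end of the function.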
> +	if (pl_size > cxl_mbox->payload_size) {
> +		max_extents = (cxl_mbox->payload_size - sizeof(*p)) /
> +			      sizeof(struct updated_extent_list);
> +		pl_size = struct_size(p, extent_list, max_extents);
> +	}
> +
> +	struct cxl_mbox_dc_response *response __free(kfree) =
> +		kzalloc(pl_size, GFP_KERNEL);
> +	if (!response)
> +		return -ENOMEM;
> +
> +	pl_index = 0;
> +	xa_for_each(extent_array, index, extent) {
> +
> +		response->extent_list[pl_index].dpa_start = extent->start_dpa;
> +		response->extent_list[pl_index].length = extent->length;
> +		pl_index++;
> +		response->extent_list_size = cpu_to_le32(pl_index);
> +
> +		if (pl_index == max_extents) {
> +			mbox_cmd = (struct cxl_mbox_cmd) {
> +				.opcode = opcode,
> +				.size_in = struct_size(response, extent_list,
> +						       pl_index),
> +				.payload_in = response,
> +			};
> +
> +			response->flags = 0;
> +			if (pl_index < cnt)
> +				response->flags &= CXL_DCD_EVENT_MORE;

Covered in the other branch of the thread (after flags = 0, the &= here
can never set the bit; presumably |= was intended).

> +
> +			rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
> +			if (rc)
> +				return rc;
> +			pl_index = 0;
> +		}
> +	}
> +
> +	if (cnt == 0 || pl_index) {
> +		mbox_cmd = (struct cxl_mbox_cmd) {
> +			.opcode = opcode,
> +			.size_in = struct_size(response, extent_list,
> +					       pl_index),
> +			.payload_in = response,
> +		};
> +
> +		response->flags = 0;
> +		rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	return 0;
> +}

> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index cbaacbe0f36d..b75653e9bc32 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h

> +/* See CXL 3.0 8.2.9.2.1.5 */

Maybe update to 3.1? Otherwise the patch reviewer needs to open two spec
versions! In 3.1 it is 8.2.9.2.1.6.

> +enum dc_event {
> +	DCD_ADD_CAPACITY,
> +	DCD_RELEASE_CAPACITY,
> +	DCD_FORCED_CAPACITY_RELEASE,
> +	DCD_REGION_CONFIGURATION_UPDATED,
> +};
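On the flags point, to spell it out: starting from flags = 0, an &= with
the MORE flag is a no-op, so the bit is never set on the wire; it needs
|=. Trivial standalone demonstration (the flag value here is assumed
purely for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define CXL_DCD_EVENT_MORE 0x01 /* assumed bit position, illustration only */

static uint8_t set_more_wrong(void)
{
        uint8_t flags = 0;

        flags &= CXL_DCD_EVENT_MORE;    /* 0 & 1 == 0: bit never set */
        return flags;
}

static uint8_t set_more_right(void)
{
        uint8_t flags = 0;

        flags |= CXL_DCD_EVENT_MORE;    /* bit actually set */
        return flags;
}
```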