On 10/8/2024 7:16 AM, ira.weiny@xxxxxxxxx wrote:
> From: Navneet Singh <navneet.singh@xxxxxxxxx>
>
> A dynamic capacity device (DCD) sends events to signal the host for
> changes in the availability of Dynamic Capacity (DC) memory. These
> events contain extents describing a DPA range and meta data for memory
> to be added or removed. Events may be sent from the device at any time.
>
> Three types of events can be signaled, Add, Release, and Force Release.
>
> On add, the host may accept or reject the memory being offered. If no
> region exists, or the extent is invalid, the extent should be rejected.
> Add extent events may be grouped by a 'more' bit which indicates those
> extents should be processed as a group.
>
> On remove, the host can delay the response until the host is safely not
> using the memory. If no region exists the release can be sent
> immediately. The host may also release extents (or partial extents) at
> any time. Thus the 'more' bit grouping of release events is of less
> value and can be ignored in favor of sending multiple release capacity
> responses for groups of release events.
>
> Force removal is intended as a mechanism between the FM and the device
> and intended only when the host is unresponsive, out of sync, or
> otherwise broken. Purposely ignore force removal events.
>
> Regions are made up of one or more devices which may be surfacing memory
> to the host. Once all devices in a region have surfaced an extent the
> region can expose a corresponding extent for the user to consume.
> Without interleaving a device extent forms a 1:1 relationship with the
> region extent. Immediately surface a region extent upon getting a
> device extent.
>
> Per the specification the device is allowed to offer or remove extents
> at any time. However, anticipated use cases can expect extents to be
> offered, accepted, and removed in well defined chunks.
>
> Simplify extent tracking with the following restrictions.
>
> 	1) Flag for removal any extent which overlaps a requested
> 	   release range.
> 	2) Refuse the offer of extents which overlap already accepted
> 	   memory ranges.
> 	3) Accept again a range which has already been accepted by the
> 	   host. Eating duplicates serves three purposes. First, this
> 	   simplifies the code if the device should get out of sync with
> 	   the host. And it should be safe to acknowledge the extent
> 	   again. Second, this simplifies the code to process existing
> 	   extents if the extent list should change while the extent
> 	   list is being read. Third, duplicates for a given region
> 	   which are seen during a race between the hardware surfacing
> 	   an extent and the cxl dax driver scanning for existing
> 	   extents will be ignored.
>
> 	NOTE: Processing existing extents is done in a later patch.
>
> Management of the region extent devices must be synchronized with
> potential uses of the memory within the DAX layer. Create region extent
> devices as children of the cxl_dax_region device such that the DAX
> region driver can co-drive them and synchronize with the DAX layer.
> Synchronization and management is handled in a subsequent patch.
>
> Tag support within the DAX layer is not yet supported. To maintain
> compatibility legacy DAX/region processing only tags with a value of 0
> are allowed. This defines existing DAX devices as having a 0 tag which
> makes the most logical sense as a default.
>
> Process DCD events and create region devices.
>
> Signed-off-by: Navneet Singh <navneet.singh@xxxxxxxxx>
> Co-developed-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
>

Hi Ira,

I guess you missed my comments on v3, so I am repeating them for this patch.

> +static bool extents_contain(struct cxl_dax_region *cxlr_dax,
> +			    struct cxl_endpoint_decoder *cxled,
> +			    struct range *new_range)
> +{
> +	struct device *extent_device;
> +	struct match_data md = {
> +		.cxled = cxled,
> +		.new_range = new_range,
> +	};
> +
> +	extent_device = device_find_child(&cxlr_dax->dev, &md, match_contains);
> +	if (!extent_device)
> +		return false;
> +
> +	put_device(extent_device);

You could declare 'extent_device' with __free(put_device) and drop this
explicit 'put_device(extent_device)'.

> +	return true;
> +}

[...]

> +static bool extents_overlap(struct cxl_dax_region *cxlr_dax,
> +			    struct cxl_endpoint_decoder *cxled,
> +			    struct range *new_range)
> +{
> +	struct device *extent_device;
> +	struct match_data md = {
> +		.cxled = cxled,
> +		.new_range = new_range,
> +	};
> +
> +	extent_device = device_find_child(&cxlr_dax->dev, &md, match_overlaps);
> +	if (!extent_device)
> +		return false;
> +
> +	put_device(extent_device);

Same as above.

> +	return true;
> +}
> +

[...]

> +static int cxl_send_dc_response(struct cxl_memdev_state *mds, int opcode,
> +				struct xarray *extent_array, int cnt)
> +{
> +	struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> +	struct cxl_mbox_dc_response *p;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	struct cxl_extent *extent;
> +	unsigned long index;
> +	u32 pl_index;
> +	int rc;
> +
> +	size_t pl_size = struct_size(p, extent_list, cnt);
> +	u32 max_extents = cnt;
> +
> +	/* May have to use more bit on response. */
> +	if (pl_size > cxl_mbox->payload_size) {
> +		max_extents = (cxl_mbox->payload_size - sizeof(*p)) /
> +			      sizeof(struct updated_extent_list);
> +		pl_size = struct_size(p, extent_list, max_extents);
> +	}
> +
> +	struct cxl_mbox_dc_response *response __free(kfree) =
> +						kzalloc(pl_size, GFP_KERNEL);
> +	if (!response)
> +		return -ENOMEM;
> +
> +	pl_index = 0;
> +	xa_for_each(extent_array, index, extent) {
> +
> +		response->extent_list[pl_index].dpa_start = extent->start_dpa;
> +		response->extent_list[pl_index].length = extent->length;
> +		pl_index++;
> +		response->extent_list_size = cpu_to_le32(pl_index);
> +
> +		if (pl_index == max_extents) {
> +			mbox_cmd = (struct cxl_mbox_cmd) {
> +				.opcode = opcode,
> +				.size_in = struct_size(response, extent_list,
> +						       pl_index),
> +				.payload_in = response,
> +			};
> +
> +			response->flags = 0;
> +			if (pl_index < cnt)
> +				response->flags &= CXL_DCD_EVENT_MORE;

This should be 'response->flags |= CXL_DCD_EVENT_MORE'. Since 'flags' was
just set to 0, the '&=' leaves it at 0 and the More flag is never set.

Another issue: 'pl_index' is reset to 0 after every send, so 'pl_index < cnt'
is true for every full response whenever 'cnt' is larger than 'max_extents'.
If 'cnt' is an exact multiple of 'max_extents' (e.g. cnt=20, max_extents=10),
all responses are sent inside this xa_for_each() loop and, once the '&=' is
fixed, CXL_DCD_EVENT_MORE will also be set in the last response, where it
should not be set.

> +
> +			rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
> +			if (rc)
> +				return rc;
> +			pl_index = 0;
> +		}
> +	}
> +
> +	if (cnt == 0 || pl_index) {
> +		mbox_cmd = (struct cxl_mbox_cmd) {
> +			.opcode = opcode,
> +			.size_in = struct_size(response, extent_list,
> +					       pl_index),
> +			.payload_in = response,
> +		};
> +
> +		response->flags = 0;
> +		rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	return 0;
> +}
> +
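
To illustrate the More flag handling I have in mind, here is a minimal
userspace sketch, not kernel code. It only tracks how many extents go into
each response and which flag each response carries, based on how many extents
remain after it. The names 'plan_dc_responses', 'struct sent_response' and
'DC_RESPONSE_MORE' (and its value) are hypothetical and exist only for this
example; the real code would walk the xarray and call cxl_internal_send_cmd().

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed flag value, for illustration only. */
#define DC_RESPONSE_MORE 0x01

/* Hypothetical record of one response that would be sent to the device. */
struct sent_response {
	uint32_t nr_extents;	/* extents carried by this response */
	uint8_t flags;		/* DC_RESPONSE_MORE or 0 */
};

/*
 * Split 'cnt' extents into responses of at most 'max_extents' each.
 * The More flag is derived from the number of extents still pending
 * after the current response, so the final response never carries it,
 * even when 'cnt' is an exact multiple of 'max_extents'.
 *
 * Returns the number of responses written to 'out' (at most 'out_len').
 */
static size_t plan_dc_responses(uint32_t cnt, uint32_t max_extents,
				struct sent_response *out, size_t out_len)
{
	uint32_t remaining = cnt;
	size_t n = 0;

	if (max_extents == 0 || out_len == 0)
		return 0;

	/* do/while so that cnt == 0 still yields one empty response. */
	do {
		uint32_t chunk = remaining < max_extents ? remaining
							  : max_extents;

		if (n == out_len)
			break;

		remaining -= chunk;
		out[n].nr_extents = chunk;
		out[n].flags = remaining ? DC_RESPONSE_MORE : 0;
		n++;
	} while (remaining);

	return n;
}
```

With cnt=20 and max_extents=10 this plans two responses of 10 extents each,
and only the first one has the More flag set. With cnt=0 it plans a single
empty response with no flags, which matches the 'cnt == 0' case in the patch.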