On Thu, 17 Oct 2024 16:39:57 -0500 Ira Weiny <ira.weiny@xxxxxxxxx> wrote: > Jonathan Cameron wrote: > > On Mon, 07 Oct 2024 18:16:27 -0500 > > ira.weiny@xxxxxxxxx wrote: > > > > > From: Navneet Singh <navneet.singh@xxxxxxxxx> > > > > > > A dynamic capacity device (DCD) sends events to signal the host for > > > changes in the availability of Dynamic Capacity (DC) memory. These > > > events contain extents describing a DPA range and meta data for memory > > > to be added or removed. Events may be sent from the device at any time. > > > > > > Three types of events can be signaled, Add, Release, and Force Release. > > > > > > On add, the host may accept or reject the memory being offered. If no > > > region exists, or the extent is invalid, the extent should be rejected. > > > Add extent events may be grouped by a 'more' bit which indicates those > > > extents should be processed as a group. > > > > > > On remove, the host can delay the response until the host is safely not > > > using the memory. If no region exists the release can be sent > > > immediately. The host may also release extents (or partial extents) at > > > any time. Thus the 'more' bit grouping of release events is of less > > > value and can be ignored in favor of sending multiple release capacity > > > responses for groups of release events. > > > > True today - I think that would be an error for shared extents > > though as they need to be released in one go. We can deal with > > that when it matters. > > > > > > Mind you patch seems to try to handle more bit anyway, so maybe just > > remove that discussion from this description? > > It only handles more bit response on ADD because on RELEASE the count is always > 1. > > > + if (cxl_send_dc_response(mds, CXL_MBOX_OP_RELEASE_DC, &extent_list, 1)) > + dev_dbg(dev, "Failed to release [range 0x%016llx-0x%016llx]\n", > + range->start, range->end); > > > For shared; a flag will need to be added to the extents and additional logic to > group these extents for checking use etc. > > I agree, we need to handle that later on and get this basic support in. For > now I think my comments are correct WRT the sending of release responses. > > > > > > > Simplify extent tracking with the following restrictions. > > > > > > 1) Flag for removal any extent which overlaps a requested > > > release range. > > > 2) Refuse the offer of extents which overlap already accepted > > > memory ranges. > > > 3) Accept again a range which has already been accepted by the > > > host. Eating duplicates serves three purposes. First, this > > > simplifies the code if the device should get out of sync with > > > the host. > > > > Maybe scream about this a little. AFAIK that happening is a device > > bug. > > Agreed but because of the 2nd purpose this is difficult to scream about because > this situation can come up in normal operation. Here is the scenario: > > 1) Device has 2 DCD partitions active, A and B > 2) Host crashes > 3) Region X is created on A > 4) Region Y is created on B > 5) Region Y scans for extents > 6) Region X surfaces a new extent while Y is scanning > 7) Gen number changes due to new extent in X > 8) Region Y rescans for existing extents and sees duplicates. > > These duplicates need to be ignored without signaling an error. Hmm. If we can know that path is the trigger (should be able to as it's a scan after a gen number change), can we just muffle the screams on that path? (Halloween is close, the analogies will get ever worse :) Jonathan