On Mon, 21 Oct 2024 13:45:57 -0500
Ira Weiny <ira.weiny@xxxxxxxxx> wrote:

> Jonathan Cameron wrote:
> > On Thu, 17 Oct 2024 16:39:57 -0500
> > Ira Weiny <ira.weiny@xxxxxxxxx> wrote:
> >
> > > Jonathan Cameron wrote:
> > > > On Mon, 07 Oct 2024 18:16:27 -0500
> > > > ira.weiny@xxxxxxxxx wrote:
> > > >
>
> [snip]
>
> > > > > Simplify extent tracking with the following restrictions.
> > > > >
> > > > >    1) Flag for removal any extent which overlaps a requested
> > > > >       release range.
> > > > >    2) Refuse the offer of extents which overlap already accepted
> > > > >       memory ranges.
> > > > >    3) Accept again a range which has already been accepted by the
> > > > >       host.  Eating duplicates serves three purposes.  First, this
> > > > >       simplifies the code if the device should get out of sync with
> > > > >       the host.
> > > >
> > > > Maybe scream about this a little.  AFAIK that happening is a device
> > > > bug.
> > >
> > > Agreed but because of the 2nd purpose this is difficult to scream about because
> > > this situation can come up in normal operation.  Here is the scenario:
> > >
> > > 1) Device has 2 DCD partitions active, A and B
> > > 2) Host crashes
> > > 3) Region X is created on A
> > > 4) Region Y is created on B
> > > 5) Region Y scans for extents
> > > 6) Region X surfaces a new extent while Y is scanning
> > > 7) Gen number changes due to new extent in X
> > > 8) Region Y rescans for existing extents and sees duplicates.
> > >
> > > These duplicates need to be ignored without signaling an error.
> > Hmm. If we can know that path is the trigger (should be able to
> > as it's a scan after a gen number change), can we just muffle the
> > screams on that path?  (Halloween is close, the analogies will get
> > ever worse :)
>
> Ok yea since this would be a device error we should do something here.  But the
> code is going to be somewhat convoluted to print an error whenever this
> happens.
>
> What if we make this a warning and change the rescan debug message to a warning
> as well?  This would allow enough bread crumbs to determine if a device is
> failing without a lot of extra code to alter print messages on the fly?

Sounds ok to me.

Jonathan

>
> Ira
>
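
For reference, the three restrictions quoted above could be sketched roughly as below. This is a userspace illustration only, not the driver code: struct extent, struct extent_tracker, offer_extent() and release_range() are hypothetical names, and the sketch assumes a "duplicate" means an exact start/length match with an already accepted extent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not the CXL driver's structures. */
struct extent {
	uint64_t start;  /* first byte of the range */
	uint64_t len;    /* length in bytes, assumed non-zero */
	bool remove;     /* set when a release request overlaps this extent */
};

#define MAX_EXTENTS 16

struct extent_tracker {
	struct extent ext[MAX_EXTENTS];
	size_t count;
};

enum offer_result {
	OFFER_ACCEPTED,   /* new range, overlaps nothing already accepted */
	OFFER_DUPLICATE,  /* restriction 3: same range accepted before */
	OFFER_REFUSED,    /* restriction 2: overlaps an accepted range */
};

/* Half-open ranges [s, s + l) overlap if each starts before the other ends. */
static bool ranges_overlap(uint64_t s1, uint64_t l1, uint64_t s2, uint64_t l2)
{
	return s1 < s2 + l2 && s2 < s1 + l1;
}

/* Restrictions 2 and 3: classify an offered extent against accepted ones. */
static enum offer_result offer_extent(struct extent_tracker *t,
				      uint64_t start, uint64_t len)
{
	assert(t->count <= MAX_EXTENTS);

	for (size_t i = 0; i < t->count; i++) {
		const struct extent *e = &t->ext[i];

		/* Assumption: duplicate means an exact match. A caller could
		 * log a warning here rather than signal an error. */
		if (e->start == start && e->len == len)
			return OFFER_DUPLICATE;
		if (ranges_overlap(e->start, e->len, start, len))
			return OFFER_REFUSED;
	}

	if (t->count == MAX_EXTENTS)
		return OFFER_REFUSED;  /* sketch only: no room left */

	t->ext[t->count++] = (struct extent){ .start = start, .len = len };
	return OFFER_ACCEPTED;
}

/* Restriction 1: flag every extent overlapping the release range. */
static size_t release_range(struct extent_tracker *t,
			    uint64_t start, uint64_t len)
{
	size_t flagged = 0;

	for (size_t i = 0; i < t->count; i++) {
		if (ranges_overlap(t->ext[i].start, t->ext[i].len, start, len)) {
			t->ext[i].remove = true;
			flagged++;
		}
	}
	return flagged;
}
```

Returning a distinct OFFER_DUPLICATE value, instead of folding it into success or failure, leaves it to the caller to decide how loudly to report it, which is the open question in this thread.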