On Tue, Dec 31, 2024 at 8:32 AM David Hildenbrand <david@xxxxxxxxxx> wrote: > > >> goto discard; > >> > > > > I agree that this is necessary, but I'm not sure it addresses my > > concerns. MADV_FREE'ed mTHPs are still being added to `deferred_split`, > > and this does not resolve the issue of them being partially unmapped > > though it is definitely better than the existing code, at least folios are > > not moved back to swap-backed. > > > On the other hand, users might rely on the `deferred_split` counter to > > assess how aggressively userspace is performing address/size unaligned > > operations > > like MADV_DONTNEED or unmapped behavior. However, our debugging shows > > that the majority of `deferred_split` counter increments result from > > aligned MADV_FREE operations. This diminishes the counter's usefulness > > in reflecting unaligned userspace behavior. > > Optimizing that is certainly something to look into, but the bigger > issue you describe arises from bad handling of speculative references. > > Just imagine you indeed have a partially-mapped anon folio and the > remaining pages are MADV_FREE'ed. The problem with the speculative > reference would still apply. > > > > > If possible, I am still looking for some approach to entirely avoid > > adding the folio to deferred_split and partially being unmapped. > > > > Could the concept be something like this? > > Very likely it's wrong, because you really have to assure that that > folio range is mapped here. > > Proper folio PTE batching should be applied here -- folio_pte_batch() etc. > I agree that using `folio_pte_batch()` to check if all PTEs are mapped and determining `any_dirty` for setting swap-backed is the right direction. I'm just curious if `(!list_empty(&folio->_deferred_list))` or `folio_test_partially_mapped(folio)` could replace it if we're aiming for a smaller change :-) > That can please the counters in many, but not all cases. Again, maybe > the deferred-split handling should be handled differently, and not > synchronously from rmap code. > > I see 3 different work items > > 1) Fix mis-handling of speculative references > Agreed, the patch you're sending is absolutely necessary. I'd prefer it lands sooner in some way. Would you like to post it? > 2) Perform proper PTE batching during unmap/migration. Will improve > performance in any case. Agreed. I remember discussing this with Ryan in an email thread about a year ago, even for normal (non-MADV_FREE'ed) folios, but it seems everyone has been busy with other priorities. This seems like a good time to start exploring the idea. We could begin with MADV_FREE'ed folios and later extend it to normal folios—for instance, by implementing batched setting of swap entries. > > 3) Try moving deferred-split handling out of rmap code into reclaim/ > access-bit handling. I'm not quite sure we still need this after having 1 and 2. With those, we've been able to operate on the mTHP as a whole. Do we still need to move deferred_split out of rmap? > > -- > Cheers, > > David / dhildenb > Thanks Barry