On Mon, Jan 29, 2024 at 11:07 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 29.01.24 04:25, Chris Li wrote:
> > Hi David and Barry,
> >
> > On Mon, Jan 22, 2024 at 10:49 PM Barry Song <21cnbao@xxxxxxxxx> wrote:
> >>
> >>>
> >>>
> >>> I have on my todo list to move all that !anon handling out of
> >>> folio_add_anon_rmap_ptes(), and instead make swapin code call
> >>> folio_add_new_anon_rmap(), where we'll have to pass an exclusive flag
> >>> then (-> whole new folio exclusive).
> >>>
> >>> That's the cleaner approach.
> >>>
> >>
> >> one tricky thing is that sometimes it is hard to know who is the first
> >> one to add rmap and thus should call folio_add_new_anon_rmap.
> >> especially when we want to support swapin_readahead(), the one who
> >> allocated the large folio might not be the one who firstly does rmap.
> >
> > I think Barry has a point. Two tasks might race to swap in the folio
> > then race to perform the rmap.
> > folio_add_new_anon_rmap() should only be called on a folio that is
> > absolutely "new", not shared. The sharing in swap cache disqualifies
> > that condition.
>
> We have to hold the folio lock. So only one task at a time might do the
> folio_add_anon_rmap_ptes() right now, and the
> folio_add_new_shared_anon_rmap() in the future [below].
>
> Also observe how folio_add_anon_rmap_ptes() states that one must hold
> the page lock, because otherwise this would all be completely racy.
>
> From the pte swp exclusive flags, we know for sure whether we are
> dealing with exclusive vs. shared. I think patch #6 does not properly
> check that all entries are actually the same in that regard (all
> exclusive vs all shared). That likely needs fixing.
>
> [I have converting per-page PageAnonExclusive flags to a single
> per-folio flag on my todo list. I suspect that we'll keep the
> per-swp-pte exclusive bits, but the question is rather what we can
> actually make work, because swap and migration just make it much more
> complicated.
> Anyhow, future work]
>
> >
> >> is it an acceptable way to do the below in do_swap_page?
> >> if (!folio_test_anon(folio))
> >>         folio_add_new_anon_rmap()
> >> else
> >>         folio_add_anon_rmap_ptes()
> >
> > I am curious to know the answer as well.
>
> Yes, the end code should likely be something like:
>
> /* ksm created a completely new copy */
> if (unlikely(folio != swapcache && swapcache)) {
>         folio_add_new_anon_rmap(folio, vma, vmf->address);
>         folio_add_lru_vma(folio, vma);
> } else if (folio_test_anon(folio)) {
>         folio_add_anon_rmap_ptes(rmap_flags)
> } else {
>         folio_add_new_anon_rmap(rmap_flags)
> }
>
> Maybe we want to avoid teaching all existing folio_add_new_anon_rmap()
> callers about a new flag, and just have a new
> folio_add_new_shared_anon_rmap() instead. TBD.

right. We need to clarify that the new anon folio might not necessarily
be exclusive. Unlike folio_add_new_anon_rmap(), which assumes the new
folio is exclusive, folio_add_anon_rmap_ptes() is capable of handling
both exclusive and non-exclusive new anon folios. The code would be like:

if (unlikely(folio != swapcache && swapcache)) {
        folio_add_new_anon_rmap(folio, vma, vmf->address);
        folio_add_lru_vma(folio, vma);
} else if (folio_test_anon(folio)) {
        folio_add_anon_rmap_ptes(rmap_flags);
} else {
        if (exclusive)
                folio_add_new_anon_rmap();
        else
                folio_add_new_shared_anon_rmap();
}

It appears a bit lengthy?

> >
> > BTW, that test might have a race as well. By the time the task got
> > !anon result, this result might get changed by another task. We need
> > to make sure in the caller context this race can't happen. Otherwise
> > we can't do the above safely.
>
> Again, folio lock. Observe the folio_lock_or_retry() call that covers
> our existing folio_add_new_anon_rmap/folio_add_anon_rmap_pte calls.
>
> --
> Cheers,
>
> David / dhildenb

Thanks
Barry