On Tue, Jan 30, 2024 at 5:32 AM Chris Li <chrisl@xxxxxxxxxx> wrote: > > On Mon, Jan 29, 2024 at 2:07 AM David Hildenbrand <david@xxxxxxxxxx> wrote: > > > > On 29.01.24 04:25, Chris Li wrote: > > > Hi David and Barry, > > > > > > On Mon, Jan 22, 2024 at 10:49 PM Barry Song <21cnbao@xxxxxxxxx> wrote: > > >> > > >>> > > >>> > > >>> I have on my todo list to move all that !anon handling out of > > >>> folio_add_anon_rmap_ptes(), and instead make swapin code call add > > >>> folio_add_new_anon_rmap(), where we'll have to pass an exclusive flag > > >>> then (-> whole new folio exclusive). > > >>> > > >>> That's the cleaner approach. > > >>> > > >> > > >> one tricky thing is that sometimes it is hard to know who is the first > > >> one to add rmap and thus should > > >> call folio_add_new_anon_rmap. > > >> especially when we want to support swapin_readahead(), the one who > > >> allocated large filio might not > > >> be that one who firstly does rmap. > > > > > > I think Barry has a point. Two tasks might race to swap in the folio > > > then race to perform the rmap. > > > folio_add_new_anon_rmap() should only call a folio that is absolutely > > > "new", not shared. The sharing in swap cache disqualifies that > > > condition. > > > > We have to hold the folio lock. So only one task at a time might do the > > folio_add_anon_rmap_ptes() right now, and the > > folio_add_new_shared_anon_rmap() in the future [below]. > > > > Ah, I see. The folio_lock() is the answer I am looking for. > > > Also observe how folio_add_anon_rmap_ptes() states that one must hold > > the page lock, because otherwise this would all be completely racy. > > > > From the pte swp exclusive flags, we know for sure whether we are > > dealing with exclusive vs. shared. I think patch #6 does not properly > > check that all entries are actually the same in that regard (all > > exclusive vs all shared). That likely needs fixing. > > > > [I have converting per-page PageAnonExclusive flags to a single > > per-folio flag on my todo list. I suspect that we'll keep the > > per-swp-pte exlusive bits, but the question is rather what we can > > actually make work, because swap and migration just make it much more > > complicated. Anyhow, future work] > > > > > > > >> is it an acceptable way to do the below in do_swap_page? > > >> if (!folio_test_anon(folio)) > > >> folio_add_new_anon_rmap() > > >> else > > >> folio_add_anon_rmap_ptes() > > > > > > I am curious to know the answer as well. > > > > > > Yes, the end code should likely be something like: > > > > /* ksm created a completely new copy */ > > if (unlikely(folio != swapcache && swapcache)) { > > folio_add_new_anon_rmap(folio, vma, vmf->address); > > folio_add_lru_vma(folio, vma); > > } else if (folio_test_anon(folio)) { > > folio_add_anon_rmap_ptes(rmap_flags) > > } else { > > folio_add_new_anon_rmap(rmap_flags) > > } > > > > Maybe we want to avoid teaching all existing folio_add_new_anon_rmap() > > callers about a new flag, and just have a new > > folio_add_new_shared_anon_rmap() instead. TBD. if we have to add a wrapper like folio_add_new_shared_anon_rmap() to avoid "if (folio_test_anon(folio))" and "else" everywhere, why not we just do it in folio_add_anon_rmap_ptes() ? folio_add_anon_rmap_ptes() { if (!folio_test_anon(folio)) return folio_add_new_anon_rmap(); } Anyway, I am going to change the patch 4/6 to if(folio_test_anon)/else first and drop this 5/6. we may figure out if we need a wrapper later. > > There is more than one caller needed to perform that dance around > folio_test_anon() then decide which function to call. It would be nice > to have a wrapper function folio_add_new_shared_anon_rmap() to > abstract this behavior. > > > > > > > > > > BTW, that test might have a race as well. By the time the task got > > > !anon result, this result might get changed by another task. We need > > > to make sure in the caller context this race can't happen. Otherwise > > > we can't do the above safely. > > Again, folio lock. Observe the folio_lock_or_retry() call that covers > > our existing folio_add_new_anon_rmap/folio_add_anon_rmap_pte calls. > > Ack. Thanks for the explanation. > > Chris Thanks Barry