On 13.01.22 18:14, Linus Torvalds wrote: > On Thu, Jan 13, 2022 at 8:48 AM David Hildenbrand <david@xxxxxxxxxx> wrote: >> >> I'm wondering if we can get rid of the mapcount checks in >> reuse_swap_page() and instead check for page_count() and swapcount only. > > Honestly, I think even checking page_count() is pointless. > > If the page has users, then that's fine. That's irrelevant for whether > it's a swap page or not, no? > > So if you want to remove it from the swap cache, all that matters is that > > (a) you have it locked so that there can be no new users trying to mix it up > > (b) there are no swapcount() users of this page (which don't have a > ref to the page itself, they only have a swap entry), so that you > can't have somebody trying to look it up (whether some racy > "concurrent" lookup _or_ any later one, since we're about to remove > the swap cache association). > > Why would "map_count()" matter - it's now many times the *page* is > mapped, it's irrelevant to swap cache status? And for the same reason, > what difference does "page_count()" have? > > One big reason I did the COW rewrite was literally that the rules were > pure voodoo and made no sense at all. There was no underlying logic, > it was just a random collection of random tests that didn't have any > logical architecture to them. > > Our VM is really really complicated already, so I really want our code > to make sense. > > So even if I'm entirely wrong in my swap/map-count arguments above, > I'd like whatever patches in this area to be really well commented and > have some fundamental rules and logic to them so that people can read > the code and go "Ok, makes sense". > > Please? I might be missing something, but it's not only about whether we can remove the page from the swap cache, it's about whether we can reuse the page exclusively in a process with write access, avoiding a COW. And for that we have to check if it's mapped somewhere else already (readable). Here is a sketch (buggy, untested, uncompiled) of how I think reuse_swap_page() could look like, removing any mapcount logic and incorporating the refount, leaving the page_trans_huge_swapcount() changes etc. out of the picture. Would that make any sense? bool reuse_swap_page(struct page *page, bool mapped) { int swapcount = 0, total_swapcount; VM_BUG_ON_PAGE(!PageLocked(page), page); if (unlikely(PageKsm(page))) return false; if (PageSwapCache(page)) { swapcount = page_trans_huge_swapcount(page, &total_swapcount); if (swapcount == 1 && !mapped && (likely(!PageTransCompound(page)) || /* The remaining swap count will be freed soon */ total_swapcount == page_swapcount(page))) { if (!PageWriteback(page)) { page = compound_head(page); delete_from_swap_cache(page); SetPageDirty(page); } else { swp_entry_t entry; struct swap_info_struct *p; entry.val = page_private(page); p = swap_info_get(entry); if (p->flags & SWP_STABLE_WRITES) { spin_unlock(&p->lock); return false; } spin_unlock(&p->lock); } } } /* * We expect exactly one ref from our mapped page (if already mapped) * and one ref from the swapcache (if in the swapcache). */ if (!mapped) return swapcount == 1 && page_count(page) == !!PageSwapCache(page) return swapcount == 0 && page_count(page) == 1 + !!PageSwapCache(page) } -- Thanks, David / dhildenb