On 2/16/22 1:48 AM, Peter Xu wrote: > The "details" pointer shouldn't be the token to decide whether we should skip > swap entries. For example, when the user specified details->zap_mapping==NULL, > it means the user wants to zap all the pages (including COWed pages), then we > need to look into swap entries because there can be private COWed pages that > was swapped out. Hi Peter, The changes look good, just some minor readability suggestions below: (btw, hch is going to ask you to reflow all of the commit descriptions to 72 cols, so you might as well do it in advance. :) > > Skipping some swap entries when details is non-NULL may lead to wrongly leaving > some of the swap entries while we should have zapped them. > > A reproducer of the problem: > > ===8<=== > #define _GNU_SOURCE /* See feature_test_macros(7) */ > #include <stdio.h> > #include <assert.h> > #include <unistd.h> > #include <sys/mman.h> > #include <sys/types.h> > > int page_size; > int shmem_fd; > char *buffer; > > void main(void) > { > int ret; > char val; > > page_size = getpagesize(); > shmem_fd = memfd_create("test", 0); > assert(shmem_fd >= 0); > > ret = ftruncate(shmem_fd, page_size * 2); > assert(ret == 0); > > buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE, > MAP_PRIVATE, shmem_fd, 0); > assert(buffer != MAP_FAILED); > > /* Write private page, swap it out */ > buffer[page_size] = 1; > madvise(buffer, page_size * 2, MADV_PAGEOUT); > > /* This should drop private buffer[page_size] already */ > ret = ftruncate(shmem_fd, page_size); > assert(ret == 0); > /* Recover the size */ > ret = ftruncate(shmem_fd, page_size * 2); > assert(ret == 0); > > /* Re-read the data, it should be all zero */ > val = buffer[page_size]; > if (val == 0) > printf("Good\n"); > else > printf("BUG\n"); > } > ===8<=== > > We don't need to touch up the pmd path, because pmd never had a issue with swap > entries. For example, shmem pmd migration will always be split into pte level, > and same to swapping on anonymous. > > Add another helper should_zap_cows() so that we can also check whether we > should zap private mappings when there's no page pointer specified. > > This patch drops that trick, so we handle swap ptes coherently. Meanwhile we > should do the same check upon migration entry, hwpoison entry and genuine swap > entries too. To be explicit, we should still remember to keep the private > entries if even_cows==false, and always zap them when even_cows==true. > > The issue seems to exist starting from the initial commit of git. > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > Signed-off-by: Peter Xu <peterx@xxxxxxxxxx> > --- > mm/memory.c | 45 ++++++++++++++++++++++++++++++++++++--------- > 1 file changed, 36 insertions(+), 9 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index c125c4969913..4bfeaca7cbc7 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -1313,6 +1313,17 @@ struct zap_details { > struct folio *single_folio; /* Locked folio to be unmapped */ > }; > > +/* Whether we should zap all COWed (private) pages too */ > +static inline bool should_zap_cows(struct zap_details *details) > +{ > + /* By default, zap all pages */ > + if (!details) > + return true; > + > + /* Or, we zap COWed pages only if the caller wants to */ > + return !details->zap_mapping; > +} > + > /* > * We set details->zap_mapping when we want to unmap shared but keep private > * pages. Return true if skip zapping this page, false otherwise. > @@ -1320,11 +1331,15 @@ struct zap_details { > static inline bool > zap_skip_check_mapping(struct zap_details *details, struct page *page) > { > - if (!details || !page) > + /* If we can make a decision without *page.. */ > + if (should_zap_cows(details)) > return false; > > - return details->zap_mapping && > - (details->zap_mapping != page_rmapping(page)); > + /* E.g. zero page */ It's a bit confusing to see a comment that "this could be the zero page", if the value is NULL. Maybe, "the caller passes NULL for the case of a zero page", or something along those lines? > + if (!page) > + return false; > + > + return details->zap_mapping != page_rmapping(page); > } > > static unsigned long zap_pte_range(struct mmu_gather *tlb, > @@ -1405,17 +1420,29 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, > continue; > } > > - /* If details->check_mapping, we leave swap entries. */ > - if (unlikely(details)) > - continue; > - > - if (!non_swap_entry(entry)) > + if (!non_swap_entry(entry)) { > + /* > + * If this is a genuine swap entry, then it must be an > + * private anon page. If the caller wants to skip > + * COWed pages, ignore it. > + */ How about this instead: /* Genuine swap entry, and therefore a private anon page. */ > + if (!should_zap_cows(details)) > + continue; > rss[MM_SWAPENTS]--; > - else if (is_migration_entry(entry)) { Can we put a newline here, and before each "else" block? Because now it is getting very dense, and the visual separation really helps. > + } else if (is_migration_entry(entry)) { > struct page *page; > > page = pfn_swap_entry_to_page(entry); > + if (zap_skip_check_mapping(details, page)) > + continue; > rss[mm_counter(page)]--; Newline here. > + } else if (is_hwpoison_entry(entry)) { > + /* If the caller wants to skip COWed pages, ignore it */ Likewise, I'd prefer we delete that comment, because it exactly matches what the following line of code says. > + if (!should_zap_cows(details)) > + continue; And newline here too. > + } else { > + /* We should have covered all the swap entry types */ > + WARN_ON_ONCE(1); > } > if (unlikely(!free_swap_and_cache(entry))) > print_bad_pte(vma, addr, ptent, NULL); Those are all just nits, and as I mentioned, the actual changes look good to me, so: Reviewed-by: John Hubbard <jhubbard@xxxxxxxxxx> thanks, -- John Hubbard NVIDIA