On Wed, Sep 20, 2017 at 03:37:33PM -0700, Shaohua Li wrote:
> On Wed, Sep 20, 2017 at 11:01:47AM +0200, Artem Savkov wrote:
> > Hi All,
> > 
> > We recently started noticing the madvise09[1] test from LTP failing in a
> > strange way. The test does the following: it maps 32 pages, sets MADV_FREE
> > for the range it got, dirties 2 of the pages, creates memory pressure, and
> > checks that the non-dirty pages were freed. The test hung while accessing
> > the last 4 pages (29-32) of the madvised range at line 121 [2]. Any other
> > process (gdb/cat) accessing those pages would also hang, as would an
> > attempt to reboot the machine. It doesn't trigger any debug warnings or
> > KASAN reports.
> > 
> > The issue bisected to 802a3a92ad7a ("mm: reclaim MADV_FREE pages"), so
> > 4.12 and up are affected.
> > 
> > I did some poking around and found out that the "bad" pages had the
> > SwapBacked flag set in shrink_page_list(), which confused it a lot. It
> > looks like mark_page_lazyfree() only calls lru_lazyfree_fn() when the
> > pagevec is full (that is, in batches of 14) and never drains the
> > remainder (the last four pages in the madvise09 case).
> > 
> > The patch below greatly reduces the failure rate, but doesn't fix the
> > problem completely: it still shows up with the same symptoms (hanging
> > while trying to access the last 4 pages) after a bunch of retries.
> > 
> > [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise09.c
> > [2] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise09.c#L121
> > 
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 21261ff0466f..a0b868e8b7d2 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -453,6 +453,7 @@ static void madvise_free_page_range(struct mmu_gather *tlb,
> > 
> >  	tlb_start_vma(tlb, vma);
> >  	walk_page_range(addr, end, &free_walk);
> > +	lru_add_drain();
> >  	tlb_end_vma(tlb, vma);
> >  }
> 
> It looks like there is a race between clearing the pte dirty bit and
> clearing the SwapBacked bit. Draining the pagevec helps a little, but is
> not sufficient. If SwapBacked is set we could add the page to the swap
> cache, but since the page isn't dirty we don't write it out, which could
> cause data corruption. There is also another place where we wrongly clear
> the SwapBacked bit. Below is a test patch which seems to fix the issue,
> please give it a try.

Ran it for quite some time and it does seem to fix the issue.

> diff --git a/mm/swap.c b/mm/swap.c
> index 62d96b8..5c58257 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -575,7 +575,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
>  			    void *arg)
>  {
>  	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
> -	    !PageUnevictable(page)) {
> +	    !PageSwapCache(page) && !PageUnevictable(page)) {
>  		bool active = PageActive(page);
> 
>  		del_page_from_lru_list(page, lruvec,

Shouldn't the same check be added to mark_page_lazyfree()?
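For illustration, something along these lines is what I mean (an untested
sketch, written against what mark_page_lazyfree() looks like in my
4.12-based mm/swap.c, so your tree may differ):

void mark_page_lazyfree(struct page *page)
{
	/*
	 * Mirror the lru_lazyfree_fn() hunk above: also skip pages
	 * that already sit in the swap cache.
	 */
	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
	    !PageSwapCache(page) && !PageUnevictable(page)) {
		struct pagevec *pvec = &get_cpu_var(lru_lazyfree_pvecs);

		get_page(page);
		if (!pagevec_add(pvec, page) || PageCompound(page))
			pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);
		put_cpu_var(lru_lazyfree_pvecs);
	}
}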
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 13d711d..be1c98e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -980,6 +980,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		int may_enter_fs;
>  		enum page_references references = PAGEREF_RECLAIM_CLEAN;
>  		bool dirty, writeback;
> +		bool new_added_swapcache = false;
> 
>  		cond_resched();
> 
> @@ -1165,6 +1166,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> 
>  				/* Adding to swap updated mapping */
>  				mapping = page_mapping(page);
> +				new_added_swapcache = true;
>  			}
>  		} else if (unlikely(PageTransHuge(page))) {
>  			/* Split file THP */
> @@ -1185,6 +1187,10 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  				nr_unmap_fail++;
>  				goto activate_locked;
>  			}
> +			/* race with MADV_FREE */
> +			if (PageAnon(page) && !PageDirty(page) &&
> +			    PageSwapBacked(page) && new_added_swapcache)
> +				set_page_dirty(page);
>  		}
> 
>  		if (PageDirty(page)) {

-- 
Regards,
  Artem