On Mon 25-06-18 23:31:06, John Hubbard wrote: > On 06/25/2018 08:21 AM, Jan Kara wrote: > > On Thu 21-06-18 18:30:36, Jan Kara wrote: > > So I think the Matthew's idea of removing pinned pages from LRU is > > definitely worth trying to see how complex that would end up being. Did you > > get to looking into it? If not, I can probably find some time to try that > > out. > > > > OK, so I looked into this some more. > > As you implied in an earlier response, removing a page from LRU is > probably the easy part. It's *keeping* it off the LRU that worries me. I > looked at SetPageLRU() uses, there were only 5 call sites, and of those, > I think only one might be difficult: > > __pagevec_lru_add() > > It seems like the way to avoid __pagevec_lru_add() calls on these pages > is to first call lru_add_drain_all, then remove the pages from LRU > (presumably via isolate_page_lru). I think that should do it. But I'm a > little concerned that maybe I'm overlooking something. > > Here are the 5 search hits and my analysis. This may have mistakes in it, > as I'm pretty new to this area, which is why I'm spelling it out: > > 1. mm/memcontrol.c:2082: SetPageLRU(page); > > This is in unlock_page_lru(). Caller: commit_charge(), and it's conditional on > lrucare, so we can just skip it if the new page flag is set. This is only used to move a page from one LRU list to another one - lock_page_lru() removes page from current LRU and unlock_page_lru() places it on the target one. And that all happens under lru_lock so we should not see our pinned pages in this code if we protect ourselves with the lru_lock as well. But probably worth adding VM_BUG_ON()... > 2. mm/swap.c:831: SetPageLRU(page_tail); > This is in lru_add_page_tail(), which is only called by __split_huge_page_tail, and > there, we can also just skip the call for these pages. This is of no concern as this gets called only when splitting huge page. Extra page reference obtained by GUP prevents huge page splitting so this code path cannot be executed for pinned pages. Maybe VM_BUG_ON() for checking page really is not pinned. > 3. mm/swap.c:866: SetPageLRU(page); > This is in __pagevec_lru_add_fn (sole caller: __pagevec_lru_add), and is > discussed above. Agreed that here we'll need to update __pagevec_lru_add_fn() to detect pinned pages and avoid putting them to LRU. > 4. mm/vmscan.c:1680: SetPageLRU(page); > This is in putback_inactive_pages(), which I think won't get called unless > the page is already on an LRU. > > 5. mm/vmscan.c:1873: SetPageLRU(page); // (N/A) > This is in move_active_pages_to_lru(), which I also think won't get called unless > the page is already on an LRU. These two are correct. We just have to be careful about the cases where page pinning races with reclaim handling these pages. But when we transition the page to the 'pinned' state under lru_lock and remove it from any list it is on, we should be safe against all the races with reclaim. We just have to be careful to get all the accounting right when moving page to the pinned state. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR