On 2020/4/21 at 6:11 AM, Johannes Weiner wrote:
> This patch series reworks memcg to charge swapin pages directly at
> swapin time, rather than at fault time, which may be much later, or
> not happen at all.
>
> The delayed charging scheme we have right now causes problems:
>
> - Alex's per-cgroup lru_lock patches rely on pages that have been
>   isolated from the LRU to have a stable page->mem_cgroup; otherwise
>   the lock may change underneath him. Swapcache pages are charged only
>   after they are added to the LRU, and charging doesn't follow the LRU
>   isolation protocol.

Hi Johannes,

Thanks a lot! It all looks fine to me. I will rebase the per-cgroup
lru_lock series on this.

Thanks!
Alex

>
> - Joonsoo's anon workingset patches need a suitable LRU at the time
>   the page enters the swap cache and displaces the non-resident
>   info. But the correct LRU is only available after charging.
>
> - It's a containment hole / DoS vector. Users can trigger arbitrarily
>   large swap readahead using MADV_WILLNEED. The memory is never
>   charged unless somebody actually touches it.
>
> - It complicates the page->mem_cgroup stabilization rules
>
> In order to charge pages directly at swapin time, the memcg code base
> needs to be prepared, and several overdue cleanups become a necessity:
>
> To charge pages at swapin time, we need to always have cgroup
> ownership tracking of swap records. We also cannot rely on
> page->mapping to tell apart page types at charge time, because that's
> only set up during a page fault.
>
> To eliminate the page->mapping dependency, memcg needs to ditch its
> private page type counters (MEMCG_CACHE, MEMCG_RSS, NR_SHMEM) in favor
> of the generic vmstat counters and accounting sites, such as
> NR_FILE_PAGES, NR_ANON_MAPPED etc.
>
> To switch to generic vmstat counters, the charge sequence must be
> adjusted such that page->mem_cgroup is set up by the time these
> counters are modified.
>
> The series is structured as follows:
>
> 1. Bug fixes
> 2. Decoupling charging from rmap
> 3. Swap controller integration into memcg
> 4. Direct swapin charging
>
> The patches survive a simple swapout->swapin test inside a virtual
> machine. Because this is blocking two major patch sets, I'm sending
> these out early and will continue testing in parallel to the review.
>
>  include/linux/memcontrol.h |  53 +----
>  include/linux/mm.h         |   4 +-
>  include/linux/swap.h       |   6 +-
>  init/Kconfig               |  17 +-
>  kernel/events/uprobes.c    |  10 +-
>  mm/filemap.c               |  43 ++---
>  mm/huge_memory.c           |  45 ++---
>  mm/khugepaged.c            |  25 +--
>  mm/memcontrol.c            | 448 ++++++++++++++-----------------------------
>  mm/memory.c                |  51 ++---
>  mm/migrate.c               |  20 +-
>  mm/rmap.c                  |  53 +++--
>  mm/shmem.c                 | 117 +++++------
>  mm/swap_cgroup.c           |   6 -
>  mm/swap_state.c            |  89 +++++----
>  mm/swapfile.c              |  25 +--
>  mm/userfaultfd.c           |   5 +-
>  17 files changed, 367 insertions(+), 650 deletions(-)
>