Hi, Shakeel,

On 09.01.2019 20:37, Shakeel Butt wrote:
> Hi Kirill,
>
> On Wed, Jan 9, 2019 at 4:20 AM Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> wrote:
>>
>> On nodes without memory overcommit, it's common a situation,
>> when memcg exceeds its limit and pages from pagecache are
>> shrinked on reclaim, while node has a lot of free memory.
>> Further access to the pages requires real device IO, while
>> IO causes time delays, worse powerusage, worse throughput
>> for other users of the device, etc.
>>
>> Cleancache is not a good solution for this problem, since
>> it implies copying of page on every cleancache_put_page()
>> and cleancache_get_page(). Also, it requires introduction
>> of internal per-cleancache_ops data structures to manage
>> cached pages and their inodes relationships, which again
>> introduces overhead.
>>
>> This patchset introduces another solution. It introduces
>> a new scheme for evicting memcg pages:
>>
>> 1)__remove_mapping() uncharges unmapped page memcg
>>   and leaves page in pagecache on memcg reclaim;
>>
>> 2)putback_lru_page() places page into root_mem_cgroup
>>   list, since its memcg is NULL. Page may be evicted
>>   on global reclaim (and this will be easily, as
>>   page is not mapped, so shrinker will shrink it
>>   with 100% probability of success);
>>
>> 3)pagecache_get_page() charges page into memcg of
>>   a task, which takes it first.
>>
>
> From what I understand from the proposal, on memcg reclaim, the file
> pages are uncharged but kept in the memory and if they are accessed
> again (either through mmap or syscall), they will be charged again but
> to the requesting memcg. Also it is assumed that the global reclaim of
> such uncharged file pages is very fast and deterministic. Is that
> right?

Yes, this was my assumption. But Michal, Josef and Johannes pointed out
that diving into reclaim is, in general, not fast. So maybe we need some
more creativity here to minimize the effect of this diving.

Thanks,
Kirill
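
For anyone following the thread, the three-step scheme quoted above can be
sketched as a toy userspace model. This is only an illustration under my own
assumptions, not code from the patchset: every name here (toy_memcg, toy_page,
toy_memcg_reclaim() and so on) is hypothetical, and the real LRU lists, locking
and reference counting are left out entirely.

```c
/* Toy userspace model of the proposed eviction scheme; NOT kernel code.
 * All types and functions below are hypothetical illustrations. */
#include <assert.h>
#include <stddef.h>

struct toy_memcg {
	const char *name;
	long charged;			/* pages currently charged to this group */
};

struct toy_page {
	struct toy_memcg *memcg;	/* NULL: uncharged, i.e. on the root list */
	int in_pagecache;		/* 1 while the data is still held in memory */
};

/* Steps 1 and 2: memcg reclaim uncharges the page but keeps it cached. */
static void toy_memcg_reclaim(struct toy_page *page)
{
	if (page->memcg) {
		page->memcg->charged--;
		page->memcg = NULL;	/* page now belongs to no memcg */
	}
}

/* Global reclaim: the page is really dropped from the pagecache. */
static void toy_global_reclaim(struct toy_page *page)
{
	toy_memcg_reclaim(page);	/* drop any remaining charge first */
	page->in_pagecache = 0;
}

/* Step 3: the first task to take the page gets charged for it.
 * Returns 1 if device IO was needed (cache miss), 0 on a cache hit. */
static int toy_pagecache_get(struct toy_page *page, struct toy_memcg *task_memcg)
{
	int needed_io;

	assert(page && task_memcg);
	needed_io = !page->in_pagecache;
	if (needed_io)
		page->in_pagecache = 1;	/* pretend the page was read from disk */
	if (!page->memcg) {
		page->memcg = task_memcg;
		task_memcg->charged++;
	}
	return needed_io;
}
```

In this model, a page dropped by toy_memcg_reclaim() can be picked up again
by a task in another memcg without any IO. IO is needed only after
toy_global_reclaim() has run. The model says nothing about how expensive
that global reclaim pass is, which is the open question above.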