On Fri, Mar 12, 2021 at 3:07 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Fri, Mar 12, 2021 at 02:42:45PM -0800, Shakeel Butt wrote:
> > Hi Johannes,
> >
> > On Fri, Mar 12, 2021 at 11:23 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > >
> > [...]
> > >
> > > Longer term we most likely need it there anyway. The issue you are
> > > describing in the cover letter - allocations pinning memcgs for a long
> > > time - it exists at a larger scale and is causing recurring problems
> > > in the real world: page cache doesn't get reclaimed for a long time,
> > > or is used by the second, third, fourth, ... instance of the same job
> > > that was restarted into a new cgroup every time. Unreclaimable dying
> > > cgroups pile up, waste memory, and make page reclaim very inefficient.
> > >
> >
> > For the scenario described above, do we really want to reparent the
> > page cache pages? Shouldn't we recharge the pages to the second,
> > third, fourth and so on, memcgs? My concern is that we will see a big
> > chunk of page cache pages charged to root and will only get reclaimed
> > on global pressure.
>
> Sorry, I'm proposing to reparent to the ancestor, not root. It's an
> optimization, not a change in user-visible behavior.
>
> As far as the user can tell, the pages already belong to the parent
> after deletion: they'll show up in the parent's stats, naturally, and
> they will get reclaimed as part of the parent being reclaimed.
>
> The dead cgroup doesn't even have its own limit anymore after
> .css_reset() has run. And we already physically reparent slab objects
> in memcg_reparent_objcgs() and memcg_drain_all_list_lrus().
>
> I'm just saying we should do the same thing for LRU pages.

I understand the proposal and I agree it makes total sense when a job
is recycling sub-job/sub-container.

I was talking about the (recycling of the) top level cgroups. Though
for that to be an issue, I suppose the file system has to be shared
between the jobs on the system.

I was wondering: if a page cache page reaches the root memcg after
multiple reparentings, should the next access cause that page to be
charged to the accessor?
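
To make the question concrete, here is a toy userspace model of the two
behaviors being discussed: reparenting a dying group's pages to its
parent, and recharging a page that has ended up in root to whoever
touches it next. This is only a sketch and not kernel code. Every name
in it (toy_memcg, toy_page, toy_reparent, toy_access, ...) is made up
for illustration, and it ignores hierarchical accounting, locking, and
LRU lists entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: all types and functions here are hypothetical. */
struct toy_memcg {
    const char *name;
    struct toy_memcg *parent;   /* NULL for the root group */
    long charged;               /* pages charged directly to this group */
};

struct toy_page {
    struct toy_memcg *owner;    /* group the page is currently charged to */
};

/* Initial charge of a page to a group. */
void toy_charge(struct toy_page *page, struct toy_memcg *cg)
{
    page->owner = cg;
    cg->charged++;
}

/* Move one page's charge from its current owner to `to`. */
void toy_move_charge(struct toy_page *page, struct toy_memcg *to)
{
    assert(page->owner != NULL && page->owner->charged > 0);
    page->owner->charged--;
    page->owner = to;
    to->charged++;
}

/* Reparenting on deletion: every page of the dying group moves to its parent. */
void toy_reparent(struct toy_memcg *dying, struct toy_page *pages, size_t n)
{
    assert(dying->parent != NULL);  /* the root group is never deleted */
    for (size_t i = 0; i < n; i++) {
        if (pages[i].owner == dying)
            toy_move_charge(&pages[i], dying->parent);
    }
}

/*
 * The policy asked about above: if the page is currently charged to
 * root, the next access recharges it to the accessing group.
 */
void toy_access(struct toy_page *page, struct toy_memcg *accessor)
{
    if (page->owner->parent == NULL && page->owner != accessor)
        toy_move_charge(page, accessor);
}
```

In this model, deleting a top level group (one whose parent is root)
leaves its pages charged to root after toy_reparent(). A later
toy_access() from a second top level group then pulls the touched page
into that group, while pages nobody touches stay charged to root.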