Johannes Weiner writes:
That all being said, the semantics of the new 'high' limit in cgroup2 have allowed us to move reclaim/limit enforcement out of the allocation context and into the userspace return path. See the call to mem_cgroup_handle_over_high() from tracehook_notify_resume(), and the comments in try_charge() around set_notify_resume(). This already solves the free->alloc ordering problem by allowing the allocation to exceed the limit temporarily, at least until all locks are dropped and we know we can sleep, before performing enforcement. That means we may not need the timed sleeps anymore for that purpose, and could bring back directed waits for freeing-events again. What do you think? Any hazards around indefinite sleeps in that resume path? It's called before __rseq_handle_notify_resume() and the arch-specific resume callback (which appears to be a no-op currently). Chris, Michal, what are your thoughts? It would certainly be simpler conceptually on the memcg side.
I'm not against that, although I personally don't feel very strongly about it either way, since the current behaviour clearly works in practice.
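For readers less familiar with the mechanism Johannes refers to, the deferred enforcement can be modelled roughly as below. This is only a userspace sketch of the idea, not the kernel code: every name prefixed with model_ is hypothetical, the reclaim step is a trivial stand-in, and the real logic lives in try_charge() and mem_cgroup_handle_over_high(). The point it illustrates is that the charge path only records the excess and flags the task, while the sleepable work happens later on the way back to userspace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these types exist in the kernel. */
struct model_cgroup {
    long usage;            /* pages currently charged */
    long high;             /* the 'high' limit, in pages */
};

struct model_task {
    struct model_cgroup *cg;
    bool notify_resume;    /* stands in for the state set by set_notify_resume() */
    long over_high_pages;  /* pages charged while the group was over 'high' */
};

/*
 * Allocation context: may hold locks, so it never reclaims or sleeps.
 * The charge is allowed to push usage past 'high'; the excess is only
 * recorded and the task is flagged for work on its way back to userspace.
 */
static void model_charge(struct model_task *t, long nr_pages)
{
    assert(nr_pages > 0);
    t->cg->usage += nr_pages;
    if (t->cg->usage > t->cg->high) {
        t->over_high_pages += nr_pages;
        t->notify_resume = true;
    }
}

/* Trivial stand-in for reclaim: frees at most nr_pages of the excess. */
static long model_reclaim(struct model_cgroup *cg, long nr_pages)
{
    long excess = cg->usage - cg->high;
    long freed;

    if (excess <= 0)
        return 0;
    freed = nr_pages < excess ? nr_pages : excess;
    cg->usage -= freed;
    return freed;
}

/*
 * Return-to-userspace path: no locks are held and sleeping is allowed,
 * so enforcement (reclaim, and in principle a wait) can happen here.
 * Returns the number of pages freed.
 */
static long model_handle_over_high(struct model_task *t)
{
    long freed;

    if (!t->notify_resume)
        return 0;
    freed = model_reclaim(t->cg, t->over_high_pages);
    t->over_high_pages = 0;
    t->notify_resume = false;
    return freed;
}
```

With a group at 90 of 100 pages, two 8-page charges leave usage at 106 with the flag set, and the later model_handle_over_high() call brings usage back to 100. Whether that last step should use a timed sleep or a directed wait for freeing events is exactly the open question in the quoted message; the sketch does not model either.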