Re: cgroup v1 and balance_dirty_pages

Johannes Weiner <hannes@xxxxxxxxxxx> · Thu, 17 Nov 2022 10:12:39 -0500

Hi Aneesh,

On Thu, Nov 17, 2022 at 12:24:13PM +0530, Aneesh Kumar K.V wrote:
> Currently, we don't pause in balance_dirty_pages() with cgroup v1 when a
> task dirties too many pages w.r.t. the memory limit in the memcg.
> This is because with cgroup v1 all the limits are checked against global
> available resources. So on a system with a large amount of memory, a
> cgroup with a smaller limit can easily hit OOM if a task within the
> cgroup continuously dirties pages.

Page reclaim has special writeback throttling for cgroup1, see the
folio_wait_writeback() in shrink_folio_list(). It's not as smooth as
proper dirty throttling, but it should prevent OOMs.

Is this not working anymore?

> Shouldn't we throttle the task based on the memcg limits in this case?
> commit 9badce000e2c ("cgroup, writeback: don't enable cgroup writeback
> on traditional hierarchies") indicates we run into issues with enabling
> cgroup writeback with v1. But we still can keep the global writeback
> domain, but check the throttling needs against memcg limits in
> balance_dirty_pages()?

Deciding when to throttle is only one side of the coin, though.

The other side is selective flushing in the IO context of whoever
generated the dirty data, and matching the rate of dirtying to the
rate of writeback. This isn't really possible in cgroup1, as the
domains for memory and IO control could be disjoint.

For example, if a fast-IO cgroup shares memory with a slow-IO cgroup,
what's the IO context for flushing the shared dirty data? What's the
throttling rate you apply to dirtiers?