Re: [PATCH] memcg: calling reclaim_high(GFP_KERNEL) in GFP_NOFS context deadlocks

Michal Koutný <mkoutny@xxxxxxxx> · Mon, 3 Oct 2022 17:08:36 +0200

On Sat, Oct 01, 2022 at 08:08:34AM +1000, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> You might be right in that c9afe31ec443 exposed the issue, but it's
> not the root cause. I think c9afe31ec443 is just a case of a
> new caller of mem_cgroup_handle_over_high() stepping on the landmine
> left by b3ff92916af3 adding an unconditional GFP_KERNEL direct
> reclaim deep in the guts of the memcg code.

It is specific to memory.high-induced reclaim that it happens outside of
sensitive paths (as was the case with exit to usermode or a workqueue),
so there would be no explicit flags to pass through, hence the
unconditional GFP_KERNEL.
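
For illustration only, here is a userspace toy model of why a hard-coded
GFP_KERNEL is harmless on its own but unsafe once a caller sits in a
NOFS context. All toy_* names and bit values are hypothetical and are not
the kernel's; the kernel's scoped API (memalloc_nofs_save() and
current_gfp_context()) works on a similar masking principle, but this
sketch is neither the memcg code nor the proposed fix.

```c
#include <stdbool.h>

/* Toy flag bits; the values are illustrative, not the kernel's. */
#define TOY_GFP_IO      0x1u
#define TOY_GFP_FS      0x2u
#define TOY_GFP_RECLAIM 0x4u
#define TOY_GFP_NOFS    (TOY_GFP_RECLAIM | TOY_GFP_IO)
#define TOY_GFP_KERNEL  (TOY_GFP_RECLAIM | TOY_GFP_IO | TOY_GFP_FS)

/* Hypothetical stand-in for a per-task "NOFS scope" flag. */
static bool toy_task_nofs;

/* Enter a NOFS scope; return the previous state so scopes can nest. */
static bool toy_nofs_save(void)
{
	bool old = toy_task_nofs;

	toy_task_nofs = true;
	return old;
}

/* Leave a NOFS scope by restoring the state saved on entry. */
static void toy_nofs_restore(bool old)
{
	toy_task_nofs = old;
}

/* Mask the requested flags by the task's current scope. */
static unsigned int toy_effective_gfp(unsigned int gfp)
{
	if (toy_task_nofs)
		gfp &= ~TOY_GFP_FS;
	return gfp;
}

/*
 * A reclaim entry point that hard-codes TOY_GFP_KERNEL, loosely modelled
 * on the memory.high path discussed above. Returns true if reclaim would
 * be allowed to recurse into filesystem code.
 */
static bool toy_reclaim_high_may_enter_fs(void)
{
	return (toy_effective_gfp(TOY_GFP_KERNEL) & TOY_GFP_FS) != 0;
}
```

Outside any scope, toy_reclaim_high_may_enter_fs() returns true; between
toy_nofs_save() and toy_nofs_restore() it returns false, because the
hard-coded flags are masked down to TOY_GFP_NOFS.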

> So what's the real root cause of the issue - the commit that stepped
> on the landmine, or the commit that placed the landmine?

My preference here is slightly for the newer commit, but feel free to
reference both.

> Either way, if anyone backports b3ff92916af3 or has a kernel with
> b3ff92916af3 and not c9afe31ec443, they still need to know
> about the landmine in b3ff92916af3....

To be on the same page -- having just b3ff92916af3 won't lead to the
described cycle when FS code reclaims without GFP_NOFS? (IOW, what would
the fix look like without c9afe31ec443?)

Thanks,
Michal