On 2022/07/26 17:14, Michal Hocko wrote:
> As we have concluded there are two issues possible here which would be
> great to have reflected in the changelog.
>
> On Mon 25-07-22 15:00:32, Andrew Morton wrote:
>> From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
>> Subject: mm: memcontrol: fix potential oom_lock recursion deadlock
>> Date: Fri, 22 Jul 2022 19:45:39 +0900
>>
>> syzbot is reporting GFP_KERNEL allocation with oom_lock held when
>> reporting memcg OOM [1]. Such allocation request might deadlock the
>> system, for __alloc_pages_may_oom() cannot invoke global OOM killer due to
>> oom_lock being already held by the caller.
>
> I would phrase it like this:

This report is difficult to explain correctly.

> syzbot is reporting GFP_KERNEL allocation with oom_lock held when
> reporting memcg OOM [1].

Correct. But

> This is problematic because this creates a
> dependency between GFP_NOFS and GFP_KERNEL over oom_lock which could
> dead lock the system.

oom_lock is irrelevant when trying GFP_KERNEL allocation from GFP_NOFS
context. Therefore, something like:

----------
syzbot is reporting GFP_KERNEL allocation with oom_lock held when
reporting memcg OOM [1]. If this allocation triggers a global OOM
situation, the system can livelock: the GFP_KERNEL allocation made with
oom_lock held cannot invoke the global OOM killer, because
__alloc_pages_may_oom() fails to take oom_lock.

Fix this problem by removing the allocation from memory_stat_format()
completely, and pass a static buffer when calling from the memcg OOM path.

Note that the caller holding a filesystem lock was the trigger for syzbot
to report this locking dependency. Doing a GFP_KERNEL allocation with a
filesystem lock held can deadlock the system even without involving an OOM
situation.
----------
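
To illustrate the shape of the fix proposed above, here is a minimal userspace sketch, not the actual kernel patch. All names (demo_stats, demo_stat_format, demo_oom_buf, demo_oom_report) are hypothetical. It only shows the idea: the formatter writes into a buffer supplied by the caller and never allocates, and the OOM report path hands it a static buffer. The sketch assumes the report path is serialized by a lock, as oom_lock does in the kernel.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the counters a memcg report would print. */
struct demo_stats {
	unsigned long anon;
	unsigned long file;
};

/*
 * Format into a caller-supplied buffer; this function never allocates.
 * Returns 0 on success, -1 if the output did not fit in the buffer.
 */
static int demo_stat_format(const struct demo_stats *s, char *buf, size_t size)
{
	int n = snprintf(buf, size, "anon %lu\nfile %lu\n", s->anon, s->file);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Buffer reserved for the OOM report path. Assumption: callers are
 * serialized by a lock, so one static buffer is enough and no
 * allocation is needed while that lock is held.
 */
static char demo_oom_buf[256];

static const char *demo_oom_report(const struct demo_stats *s)
{
	if (demo_stat_format(s, demo_oom_buf, sizeof(demo_oom_buf)) < 0)
		return "(truncated)";
	return demo_oom_buf;
}
```

A caller that is not on the OOM path can still pass its own buffer to demo_stat_format(), so only the path that runs with the lock held depends on the static buffer.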