On Tue 26-07-22 20:31:17, Tetsuo Handa wrote:
> On 2022/07/26 17:14, Michal Hocko wrote:
> > As we have concluded there are two issues possible here which would be
> > great to have reflected in the changelog.
> > 
> > On Mon 25-07-22 15:00:32, Andrew Morton wrote:
> >> From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> >> Subject: mm: memcontrol: fix potential oom_lock recursion deadlock
> >> Date: Fri, 22 Jul 2022 19:45:39 +0900
> >> 
> >> syzbot is reporting GFP_KERNEL allocation with oom_lock held when
> >> reporting memcg OOM [1]. Such allocation request might deadlock the
> >> system, for __alloc_pages_may_oom() cannot invoke global OOM killer
> >> due to oom_lock being already held by the caller.
> > 
> > I would phrase it like this:
> 
> This report is difficult to explain correctly.
> 
> > syzbot is reporting GFP_KERNEL allocation with oom_lock held when
> > reporting memcg OOM [1].
> 
> Correct. But
> 
> > This is problematic because this creates a
> > dependency between GFP_NOFS and GFP_KERNEL over oom_lock which could
> > dead lock the system.
> 
> oom_lock is irrelevant when trying GFP_KERNEL allocation from GFP_NOFS
> context. Therefore, something like:

I meant to say there is a dependency chain

	potential_fs_lock
	  GFP_NOFS
	    oom_lock
	      GFP_KERNEL
	        potential_lock
	          oom_lock

> ----------
> syzbot is reporting GFP_KERNEL allocation with oom_lock held when
> reporting memcg OOM [1]. If this allocation triggers the global OOM
> situation then the system can livelock because the GFP_KERNEL allocation
> with oom_lock held cannot trigger the global OOM killer because
> __alloc_pages_may_oom() fails to hold oom_lock.
> 
> Fix this problem by removing the allocation from memory_stat_format()
> completely, and pass a static buffer when calling from the memcg OOM
> path.
> 
> Note that the caller holding a filesystem lock was the trigger for
> syzbot to report this locking dependency. Doing GFP_KERNEL allocation
> with a filesystem lock held can deadlock the system even without
> involving the OOM situation.
> ----------

But this sounds good as well. Thanks!
-- 
Michal Hocko
SUSE Labs