On Sat 09-03-19 15:02:22, Tetsuo Handa wrote:
> On 2019/03/08 20:58, Michal Hocko wrote:
> > OK, that makes sense to me. I cannot judge the implementation because I
> > am not really familiar with lockdep machinery. Could you explain how it
> > doesn't trigger for all other allocations?
>
> This is the same as why fs_reclaim_acquire()/fs_reclaim_release() doesn't
> trigger for all other allocations. Any allocation request which might
> involve __GFP_FS reclaim passes "struct lockdep_map __fs_reclaim_map",
> and lockdep records it.

Yes, exactly. The NOFS handling is what made me go and ask. It uses some
nasty hacks on a dedicated lockdep_map and it was not really clear to me
whether reusing oom_lock's own map is always safe.

> > Also why it is not sufficient to add the lockdep annotation prior to the
> > trylock in __alloc_pages_may_oom?
>
> This is the same as why fs_reclaim_acquire()/fs_reclaim_release() is called
> from prepare_alloc_pages(). If an allocation request which might involve
> __GFP_FS reclaim via __perform_reclaim() succeeded before actually calling
> __perform_reclaim(), we would fail to pass "struct lockdep_map
> __fs_reclaim_map" (which makes it difficult to check whether there is a
> possibility of deadlock). Likewise, if an allocation request which might
> call __alloc_pages_may_oom() succeeded before actually calling
> __alloc_pages_may_oom(), we would fail to pass oom_lock.lockdep_map (which
> makes it difficult to check whether there is a possibility of deadlock).

OK, makes sense to me.

> Strictly speaking, there is the
>
> 	if (tsk_is_oom_victim(current) &&
> 	    (alloc_flags == ALLOC_OOM ||
> 	     (gfp_mask & __GFP_NOMEMALLOC)))
> 		goto nopage;
>
> case where failing to hold oom_lock at __alloc_pages_may_oom() does not
> cause a problem. But I think that we should not check tsk_is_oom_victim()
> at prepare_alloc_pages().

Yes, I would just prefer the check to be as simple as possible. There
shouldn't be any real reason to put all those conditions in, and it would
increase the maintenance burden because any time we reconsider those OOM
rules this check would have to be kept in sync. If the allocation is
sleepable then we should warn, because such an allocation is dubious at
best.

> > It would be also great to pull it out of the code flow and hide it
> > behind a helper static inline. Something like
> > lockdep_track_oom_alloc_reentrant or a like.
>
> OK. Here is v2 patch.
>
> From ec8d0accf15b4566c065ca8c63a4e1185f0a0c78 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Date: Sat, 9 Mar 2019 09:55:08 +0900
> Subject: [PATCH v2] mm,oom: Teach lockdep about oom_lock.
>
> Since a thread which succeeded to hold oom_lock must not involve blocking
> memory allocations, teach lockdep to consider that blocking memory
> allocations might wait for oom_lock at as early location as possible, and
> teach lockdep to consider that oom_lock is held by mutex_lock() than by
> mutex_trylock().

This is still really hard to understand, especially the last part of the
sentence. Lockdep will know that the lock is held even when it was taken
via trylock. I guess you meant to say that

	mutex_lock(oom_lock)
	  allocation
	    mutex_trylock(oom_lock)

is not caught by lockdep, right?
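
IOW something like the following made-up OOM notifier (the function name
and its registration are purely illustrative, not taken from any real
code) can retry its allocation forever today without lockdep saying a
word, because a failed trylock does not block and therefore records no
dependency; annotating blocking allocations with oom_lock.dep_map is what
would make this visible:

	#include <linux/notifier.h>
	#include <linux/oom.h>
	#include <linux/slab.h>

	/*
	 * out_of_memory() invokes the oom_notify_list with oom_lock already
	 * held, so if the allocation below has to enter
	 * __alloc_pages_may_oom() it can only mutex_trylock(&oom_lock),
	 * which keeps failing because we ourselves own the lock - the
	 * allocation retries indefinitely and the notifier never returns.
	 */
	static int buggy_oom_notify(struct notifier_block *nb,
				    unsigned long action, void *freed)
	{
		void *buf = kmalloc(PAGE_SIZE, GFP_KERNEL); /* blocking allocation */

		kfree(buf);
		return NOTIFY_OK;
	}

	static struct notifier_block buggy_oom_nb = {
		.notifier_call = buggy_oom_notify,
	};

	/* somewhere in init code: register_oom_notifier(&buggy_oom_nb); */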

> Also, since the OOM killer is disabled until the OOM reaper or exit_mmap()
> sets MMF_OOM_SKIP, teach lockdep to consider that oom_lock is held when
> __oom_reap_task_mm() is called.

It would be good to mention that the oom reaper acts as a guarantee of
forward progress and as such it cannot depend on any memory allocation,
and that is why this context is marked. This would be easier to
understand IMHO.

> This patch should not cause lockdep splats unless there is somebody doing
> dangerous things (e.g. from OOM notifiers, from the OOM reaper).
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> ---
>  include/linux/oom.h | 16 ++++++++++++++++
>  mm/oom_kill.c       |  9 ++++++++-
>  mm/page_alloc.c     |  5 +++++
>  3 files changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index d079920..8544c23 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -56,6 +56,22 @@ struct oom_control {
>  
>  extern struct mutex oom_lock;
>  
> +static inline void oom_reclaim_acquire(gfp_t gfp_mask, unsigned int order)
> +{
> +	if ((gfp_mask & __GFP_DIRECT_RECLAIM) && !(gfp_mask & __GFP_NORETRY) &&
> +	    (!(gfp_mask & __GFP_RETRY_MAYFAIL) ||
> +	     order <= PAGE_ALLOC_COSTLY_ORDER))
> +		mutex_acquire(&oom_lock.dep_map, 0, 0, _THIS_IP_);
> +}

That being said, why not simply make this

	if (gfp_mask & __GFP_DIRECT_RECLAIM)
		mutex_acquire(....)

-- 
Michal Hocko
SUSE Labs
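
To make that suggested simplification concrete, here is a minimal sketch
of what the pair of helpers could look like when keyed on
__GFP_DIRECT_RECLAIM alone, mirroring
fs_reclaim_acquire()/fs_reclaim_release(). The release helper, the dropped
order parameter and the exact call placement are assumptions for
illustration only, not part of the quoted patch, and the
mutex_acquire()/mutex_release() signatures are the ones used by the kernel
this patch is written against:

	#include <linux/gfp.h>
	#include <linux/kernel.h>
	#include <linux/mutex.h>
	#include <linux/oom.h>

	/*
	 * Tell lockdep that any blocking allocation may end up waiting for
	 * (or endlessly retrying against) oom_lock, the same way
	 * __fs_reclaim_map models __GFP_FS reclaim.
	 */
	static inline void oom_reclaim_acquire(gfp_t gfp_mask)
	{
		if (gfp_mask & __GFP_DIRECT_RECLAIM)
			mutex_acquire(&oom_lock.dep_map, 0, 0, _THIS_IP_);
	}

	static inline void oom_reclaim_release(gfp_t gfp_mask)
	{
		if (gfp_mask & __GFP_DIRECT_RECLAIM)
			mutex_release(&oom_lock.dep_map, 0, _THIS_IP_);
	}

The simpler test over-approximates a little (e.g. a __GFP_NORETRY
allocation can never reach __alloc_pages_may_oom()), but a sleepable
allocation under oom_lock is dubious regardless, so a splat in that case
is arguably useful rather than a false positive.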