On Tue 15-09-20 15:09:59, Mateusz Nosek wrote: > > > On 9/14/2020 4:22 PM, Michal Hocko wrote: > > On Mon 14-09-20 12:06:54, mateusznosek0@xxxxxxxxx wrote: > > > From: Mateusz Nosek <mateusznosek0@xxxxxxxxx> > > > > > > Most operations from '__alloc_pages_may_oom' do not require oom_mutex hold. > > > Exception is 'out_of_memory'. The patch refactors '__alloc_pages_may_oom' > > > to reduce critical section size and improve overall system performance. > > > > This is a real slow path. What is the point of optimizing it? Do you > > have any numbers? > > > > I agree that as this is the slow path, then the hard, complicated > optimizations are not recommended. In my humble opinion introduced patch is > not complex and does not decrease code readability or maintainability. In a > nutshell I see no drawbacks of applying it. This is clearly a matter of taste. I do not see a good reason to apply it TBH. It is a claimed optimization without any numbers to back that claim. It is also a tricky area so I am usually very careful to touch this code without a strong reason. Others might feel differently of course. [...] Anyway, I have only now looked closer at the patch... > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > index b9bd75cacf02..b07f950a5825 100644 > > > --- a/mm/page_alloc.c > > > +++ b/mm/page_alloc.c > > > @@ -3935,18 +3935,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > > > .order = order, > > > }; > > > struct page *page; > > > - > > > - *did_some_progress = 0; > > > - > > > - /* > > > - * Acquire the oom lock. If that fails, somebody else is > > > - * making progress for us. > > > - */ > > > - if (!mutex_trylock(&oom_lock)) { > > > - *did_some_progress = 1; > > > - schedule_timeout_uninterruptible(1); > > > - return NULL; > > > - } > > > + bool success; > > > /* > > > * Go through the zonelist yet one more time, keep very high watermark > > > @@ -3959,14 +3948,17 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > > > ~__GFP_DIRECT_RECLAIM, order, > > > ALLOC_WMARK_HIGH|ALLOC_CPUSET, ac); > > > if (page) > > > - goto out; > > > + return page; > > > + > > > + /* Check if somebody else is making progress for us. */ > > > + *did_some_progress = mutex_is_locked(&oom_lock); This is not only quite ugly but wrong as well. In general checking for a lock state is racy unless the lock is taken somewhere up the call chain. In this particular case it wouldn't be a big deal because an additional retry (did_some_progress = 1) is not really critical. It would likely be nicer to be deterministic here and not retry on all the early bailouts regardless of the lock state. > > > /* Coredumps can quickly deplete all memory reserves */ > > > if (current->flags & PF_DUMPCORE) > > > - goto out; > > > + return NULL; > > > /* The OOM killer will not help higher order allocs */ > > > if (order > PAGE_ALLOC_COSTLY_ORDER) > > > - goto out; > > > + return NULL; > > > /* > > > * We have already exhausted all our reclaim opportunities without any > > > * success so it is time to admit defeat. We will skip the OOM killer > > > @@ -3976,12 +3968,12 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > > > * The OOM killer may not free memory on a specific node. > > > */ > > > if (gfp_mask & (__GFP_RETRY_MAYFAIL | __GFP_THISNODE)) > > > - goto out; > > > + return NULL; > > > /* The OOM killer does not needlessly kill tasks for lowmem */ > > > if (ac->highest_zoneidx < ZONE_NORMAL) > > > - goto out; > > > + return NULL; > > > if (pm_suspended_storage()) > > > - goto out; > > > + return NULL; > > > /* > > > * XXX: GFP_NOFS allocations should rather fail than rely on > > > * other request to make a forward progress. > > > @@ -3992,8 +3984,20 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > > > * failures more gracefully we should just bail out here. > > > */ > > > + /* > > > + * Acquire the oom lock. If that fails, somebody else is > > > + * making progress for us. > > > + */ > > > + if (!mutex_trylock(&oom_lock)) { > > > + *did_some_progress = 1; > > > + schedule_timeout_uninterruptible(1); > > > + return NULL; > > > + } > > > + success = out_of_memory(&oc); > > > + mutex_unlock(&oom_lock); > > > + > > > /* Exhausted what can be done so it's blame time */ > > > - if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) { > > > + if (success || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) { > > > *did_some_progress = 1; > > > /* > > > @@ -4004,8 +4008,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > > > page = __alloc_pages_cpuset_fallback(gfp_mask, order, > > > ALLOC_NO_WATERMARKS, ac); > > > } > > > -out: > > > - mutex_unlock(&oom_lock); > > > + > > > return page; > > > } > > > -- > > > 2.20.1 > > > > > > Sincerely yours, > Mateusz Nosek -- Michal Hocko SUSE Labs