Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Fri, 24 Aug 2018 09:31:05 +0900

David Rientjes wrote:
> On Fri, 24 Aug 2018, Tetsuo Handa wrote:
> 
> > > For those of us who are tracking CVE-2016-10723 which has peristently been 
> > > labeled as "disputed" and with no clear indication of what patches address 
> > > it, I am assuming that commit 9bfe5ded054b ("mm, oom: remove sleep from 
> > > under oom_lock") and this patch are the intended mitigations?
> > > 
> > > A list of SHA1s for merged fixed and links to proposed patches to address 
> > > this issue would be appreciated.
> > > 
> > 
> > Commit 9bfe5ded054b ("mm, oom: remove sleep from under oom_lock") is a
> > mitigation for CVE-2016-10723.
> > 
> > "[PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at
> > should_reclaim_retry()." is independent from CVE-2016-10723.
> > 
> 
> Thanks, Tetsuo.  Should commit af5679fbc669 ("mm, oom: remove oom_lock 
> from oom_reaper") also be added to the list for CVE-2016-10723?
> 

Commit af5679fbc669f31f ("mm, oom: remove oom_lock from oom_reaper")
negated one of the two rationales for commit 9bfe5ded054b8e28 ("mm, oom:
remove sleep from under oom_lock"). If we didn't apply af5679fbc669f31f,
we could make sure that CPU resource is given to the owner of oom_lock
by replacing mutex_trylock() in __alloc_pages_may_oom() with mutex_lock().
But now that af5679fbc669f31f was already applied, we don't know how to
give CPU resource to the OOM reaper / exit_mmap(). We might arrive at
direct OOM reaping but we haven't reached there...

For now, I don't think we need to add af5679fbc669f31f to the list for
CVE-2016-10723, for af5679fbc669f31f might cause premature next OOM victim
selection (especially with CONFIG_PREEMPT=y kernels) due to

   __alloc_pages_may_oom():               oom_reap_task():

     mutex_trylock(&oom_lock) succeeds.
     get_page_from_freelist() fails.
     Preempted to other process.
                                            oom_reap_task_mm() succeeds.
                                            Sets MMF_OOM_SKIP.
     Returned from preemption.
     Finds that MMF_OOM_SKIP was already set.
     Selects next OOM victim and kills it.
     mutex_unlock(&oom_lock) is called.

race window like described as

    Tetsuo was arguing that at least MMF_OOM_SKIP should be set under the lock
    to prevent from races when the page allocator didn't manage to get the
    freed (reaped) memory in __alloc_pages_may_oom but it sees the flag later
    on and move on to another victim.  Although this is possible in principle
    let's wait for it to actually happen in real life before we make the
    locking more complex again.

in that commit.