Re: [PATCH] mm: fix a race scenario in folio_isolate_lru

On Sat, Mar 16, 2024 at 04:53:09PM +0800, Zhaoyang Huang wrote:
> On Fri, Mar 15, 2024 at 8:46 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Thu, Mar 14, 2024 at 04:39:21PM +0800, zhaoyang.huang wrote:
> > > From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> > >
> > > A panic[1] was reported, caused by a corrupted lruvec->list. Fix the race
> > > between folio_isolate_lru and release_pages.
> > >
> > > race condition:
> > > release_pages could meet a non-referenced folio which escaped being
> > > deleted from the LRU but was added to another list_head
> >
> > I don't think the bug is in folio_isolate_lru() but rather in its
> > caller.
> >
> >  * Context:
> >  *
> >  * (1) Must be called with an elevated refcount on the folio. This is a
> >  *     fundamental difference from isolate_lru_folios() (which is called
> >  *     without a stable reference).
> >
> > So when release_pages() runs, it must not see a refcount decremented to
> > zero, because the caller of folio_isolate_lru() is supposed to hold one.
> >
> > Your stack trace is for the thread which is calling release_pages(), not
> > the one calling folio_isolate_lru(), so I can't help you debug further.
> Thanks for the comments.  According to my understanding,
> folio_put_testzero does the decrement before the test, which makes it
> possible for release_pages to see a refcount of zero and proceed
> further (folio_get in folio_isolate_lru has not run yet).

No, that's not possible.

In the scenario below, at entry to folio_isolate_lru(), the folio has
refcount 2.  It has one refcount from thread 0 (because it must own one
before calling folio_isolate_lru()) and it has one refcount from thread 1
(because it's about to call release_pages()).  If release_pages() were
not running, the folio would have refcount 3 when folio_isolate_lru()
returned.

>    #0 folio_isolate_lru          #1 release_pages
> BUG_ON(!folio_refcnt)
>                                          if (folio_put_testzero())
>    folio_get(folio)
>    if (folio_test_clear_lru())
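
The refcount arithmetic for the scenario above can be checked with a small
userspace model.  This is only a sketch under stated assumptions: the model_*
names are made up for illustration, it uses C11 atomics rather than the
kernel's own folio reference helpers, and it replays the two steps
sequentially in each order instead of racing real threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for a folio's reference count (hypothetical type,
 * not the kernel's struct folio). */
struct model_folio {
	atomic_int refcount;
};

/* Take a reference.  The caller must already hold one, so the old
 * value can never be zero. */
static void model_folio_get(struct model_folio *f)
{
	int old = atomic_fetch_add(&f->refcount, 1);

	assert(old > 0);
}

/* Drop a reference.  Returns true only if this drop took the count to zero. */
static bool model_folio_put_testzero(struct model_folio *f)
{
	return atomic_fetch_sub(&f->refcount, 1) == 1;
}

/*
 * Start at refcount 2 (one reference held by thread 0, one by thread 1)
 * and run the get and the put in the requested order.
 * Returns true if the put observed zero; stores the final count.
 */
static bool model_race(bool put_first, int *final_refcount)
{
	struct model_folio f;
	bool saw_zero;

	atomic_init(&f.refcount, 2);
	if (put_first) {
		saw_zero = model_folio_put_testzero(&f);	/* 2 -> 1 */
		model_folio_get(&f);				/* 1 -> 2 */
	} else {
		model_folio_get(&f);				/* 2 -> 3 */
		saw_zero = model_folio_put_testzero(&f);	/* 3 -> 2 */
	}
	*final_refcount = atomic_load(&f.refcount);
	return saw_zero;
}
```

In both orders the put leaves the count above zero, so in this model the
release_pages() side never takes the "refcount hit zero" path as long as the
caller of folio_isolate_lru() really holds its own reference.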



