On Fri, Mar 15, 2024 at 8:46 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Thu, Mar 14, 2024 at 04:39:21PM +0800, zhaoyang.huang wrote:
> > From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> >
> > Panic[1] reported which is caused by lruvec->list break. Fix the race
> > between folio_isolate_lru and release_pages.
> >
> > race condition:
> > release_pages could meet a non-refered folio which escaped from being
> > deleted from LRU but add to another list_head
>
> I don't think the bug is in folio_isolate_lru() but rather in its
> caller.
>
>  * Context:
>  *
>  * (1) Must be called with an elevated refcount on the folio. This is a
>  *     fundamental difference from isolate_lru_folios() (which is called
>  *     without a stable reference).
>
> So when release_pages() runs, it must not see a refcount decremented to
> zero, because the caller of folio_isolate_lru() is supposed to hold one.
>
> Your stack trace is for the thread which is calling release_pages(), not
> the one calling folio_isolate_lru(), so I can't help you debug further.

Thanks for the comments. As I understand it, folio_put_testzero()
decrements the refcount before testing it, so release_pages() can see the
refcount reach zero and proceed while folio_get() in folio_isolate_lru()
has not run yet:

   #0 folio_isolate_lru              #1 release_pages
   BUG_ON(!folio_refcnt)
                                     if (folio_put_testzero())
   folio_get(folio)
   if (folio_test_clear_lru())
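
The interleaving above can be replayed deterministically in user space. The following is only a rough sketch, not kernel code: struct model_folio, the model_* helpers and replay_race() are hypothetical names made up for illustration, and C11 atomics stand in for the kernel's refcount and LRU flag primitives. It runs the four steps in one thread, in the order listed above, and reports whether both paths end up believing they own the folio. What release_pages() does after the refcount reaches zero is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a folio; not the kernel struct. */
struct model_folio {
    atomic_int refcount;
    atomic_bool on_lru;
};

/* Models folio_put_testzero(): decrement first, then test the result. */
static bool model_put_testzero(struct model_folio *f)
{
    /* atomic_fetch_sub returns the value held before the decrement */
    return atomic_fetch_sub(&f->refcount, 1) == 1;
}

/* Models folio_get(): unconditional increment. */
static void model_get(struct model_folio *f)
{
    atomic_fetch_add(&f->refcount, 1);
}

/* Models folio_test_clear_lru(): clear the flag, return its old value. */
static bool model_test_clear_lru(struct model_folio *f)
{
    return atomic_exchange(&f->on_lru, false);
}

/*
 * Replays the #0/#1 interleaving in a single thread.
 * Returns true if the refcount check passed, the releasing side saw the
 * count hit zero, and the isolating side still cleared the LRU flag.
 */
bool replay_race(int initial_refcount)
{
    struct model_folio f;

    assert(initial_refcount > 0);
    atomic_init(&f.refcount, initial_refcount);
    atomic_init(&f.on_lru, true);

    /* #0: BUG_ON(!folio_refcnt) equivalent, count is still nonzero here */
    bool check_passed = atomic_load(&f.refcount) != 0;

    /* #1: if (folio_put_testzero()) */
    bool releaser_saw_zero = model_put_testzero(&f);

    /* #0: folio_get(folio); if (folio_test_clear_lru()) */
    model_get(&f);
    bool isolator_cleared_lru = model_test_clear_lru(&f);

    return check_passed && releaser_saw_zero && isolator_cleared_lru;
}
```

Under these assumptions replay_race(1) returns true: the only reference is the one being dropped, so the check passes, the put sees zero, and the later get raises the count from zero. replay_race(2) returns false, which corresponds to the documented requirement that the caller of folio_isolate_lru() already holds its own reference.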