On Fri, Feb 03, 2023 at 07:17:14AM +0800, Huang, Ying wrote:
> "Huang, Ying" <ying.huang@xxxxxxxxx> writes:
>
> > Hi, Hyeonggon,
> >
> > Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> writes:
> >
> >> On Wed, Feb 01, 2023 at 01:09:10AM +0900, Hyeonggon Yoo wrote:
> >>> I've observed random list_del corruption on mm-unstable,
> >>> where HEAD is commit d69862e693c069f4
> >>> ("mm/migrate: convert putback_movable_pages() to use folios").
> >>>
> >>> The issue can be easily reproduced by stressing MM multiple times:
> >>> 	stress-ng --bigheap 0 --timeout 300
> >>>
> >>> The compiler is gcc 12.2.1, and the config and dmesg are included as attachments.
> >>> I will try to bisect but can't promise quick resolution :)
> >>
> >>
> >> The first bad commit appears to be:
> >> 	c203c6d5b3f0597 ("migrate_pages: batch _unmap and _move")
> >>
> >> The first bad commit _might_ be earlier, but this is quite
> >> easy to reproduce, so at this point I think the above is the real bad commit.
> >
> > Thank you very much for reporting the bug. I'm traveling now, but I will
> > try to find some time to reproduce and debug it.
>
> Still haven't reproduced the issue. But after reviewing the code, I
> found a bug in the code, which may cause list corruption. Can you try
> the debug patch below?

Unfortunately my home server seems to be broken again :(
That means I only have access to VMs and not a real machine now.
FYI, it did not reproduce on KVM, but it did reproduce on a real machine.

Could you try checking on your machine with the config I attached? [1]

Sorry to bother you while you're traveling!

[1] https://marc.info/?l=linux-mm&m=167518135116956

Thanks,
Hyeonggon

> Best Regards,
> Huang, Ying
>
> -------------------------------8<-------------------------------------
> From a4eef847fe4f6e50b6c3f69651c1dfdeb4b23bc4 Mon Sep 17 00:00:00 2001
> From: Huang Ying <ying.huang@xxxxxxxxx>
> Date: Fri, 3 Feb 2023 07:12:24 +0800
> Subject: [PATCH] dbg: fix list corruption for -EAGAIN
>
> ---
>  mm/migrate.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 143d96775b4d..4205a0297ef8 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1230,11 +1230,11 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
>  
>  	rc = move_to_new_folio(dst, src, mode);
>  
> -	if (rc != -EAGAIN)
> +	if (rc != -EAGAIN) {
>  		list_del(&dst->lru);
> -
> -	if (unlikely(!is_lru))
> -		goto out_unlock_both;
> +		if (unlikely(!is_lru))
> +			goto out_unlock_both;
> +	}
>  
>  	/*
>  	 * When successful, push dst to LRU immediately: so that if it
> -- 
> 2.35.1
>