On 2023/05/16 10:44, Huang, Ying wrote: > Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> writes: > >> On 2023/05/15 16:38, Mel Gorman wrote: >>> On Sun, May 14, 2023 at 09:28:56AM +0900, Tetsuo Handa wrote: >>>> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock >>>> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() >>>> using ZONE_BOOSTED_WATERMARK flag. >>>> >>>> Only allocation contexts that include ALLOC_KSWAPD (which corresponds to >>>> __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to >>>> remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a >>>> risk of deadlock. >>> >>> kswapd_wait is a waitqueue so what is being held? It's safe for kswapd >>> to try wake itself as the waitqueue will be active when wakeup_kswapd() >>> is called so no wakeup occurs. If there is a deadlock, it needs a better >>> explanation. I believe I already stated why this patch is fixing a bug >>> but it wasn't deadlock related. >>> >> >> I noticed this problem ( pgdat->kswapd_wait might be held without >> __GFP_KSWAPD_RECLAIM ) when analyzing a different problem ( debugobject code >> is holding pgdat->kswapd_wait due to __GFP_KSWAPD_RECLAIM ) reported at >> https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380 . >> >> I'm calling pgdat->kswapd_wait a lock, as lockdep uses it as a lock name. > > This has confused me much before. IIUC, the deadlock is unrelated with > kswapd wakeup itself. pgdat->kswapd_wait is the lock name and the lock > in fact is the spinlock: pgdat->kswapd_wait.lock. So the deadlock is, > > pgdat->kswapd_wait.lock holders take the pi lock > pi lock holders take the rq lock > rq lock holders take the timer base lock > timer base lock holders take the pgdat->kswapd_wait.lock (missing __GFP_KSWAPD_RECLAIM check) Yes, thank you for explanation. > > The above is based on analysis in > > https://lore.kernel.org/all/20190107204627.GA25526@xxxxxxxxxxx/ > https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@xxxxxxxxxxxxxxxxxxx/ > > Tetsuo's patch avoids to take pgdat->kswapd_wait.lock when timer base > lock is held via adding check for __GFP_KSWAPD_RECLAIM, so breaks the > circular dependency chain. Yes. Mel, any questions on this patch? Thomas Gleixner described this lock as kswapd_wait::lock at https://lkml.kernel.org/r/168476016890.404.6911447269153588182.tip-bot2@tip-bot2 . Should I resubmit this patch with s/pgdat->kswapd_wait/pgdat->kswapd_wait.lock/ or s/pgdat->kswapd_wait/kswapd_wait::lock/ ? > >> The latter was explained at >> https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@xxxxxxxxxxxxxxxxxxx/ >> and we agreed that debugobject code needs to drop __GFP_KSWAPD_RECLAIM at >> https://lore.kernel.org/linux-mm/87fs809fwk.ffs@tglx/ . >> >> This patch is for making sure that debugobject code will not try to hold >> pgdat->kswapd_wait after debugobject code dropped __GFP_KSWAPD_RECLAIM. >> >> Thus, the problem this patch will fix is a deadlock related, isn't it? > > Best Regards, > Huang, Ying