Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> writes: > On 2023/05/15 16:38, Mel Gorman wrote: >> On Sun, May 14, 2023 at 09:28:56AM +0900, Tetsuo Handa wrote: >>> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock >>> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() >>> using ZONE_BOOSTED_WATERMARK flag. >>> >>> Only allocation contexts that include ALLOC_KSWAPD (which corresponds to >>> __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to >>> remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a >>> risk of deadlock. >> >> kswapd_wait is a waitqueue so what is being held? It's safe for kswapd >> to try wake itself as the waitqueue will be active when wakeup_kswapd() >> is called so no wakeup occurs. If there is a deadlock, it needs a better >> explanation. I believe I already stated why this patch is fixing a bug >> but it wasn't deadlock related. >> > > I noticed this problem ( pgdat->kswapd_wait might be held without > __GFP_KSWAPD_RECLAIM ) when analyzing a different problem ( debugobject code > is holding pgdat->kswapd_wait due to __GFP_KSWAPD_RECLAIM ) reported at > https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380 . > > I'm calling pgdat->kswapd_wait a lock, as lockdep uses it as a lock name. This has confused me much before. IIUC, the deadlock is unrelated with kswapd wakeup itself. pgdat->kswapd_wait is the lock name and the lock in fact is the spinlock: pgdat->kswapd_wait.lock. So the deadlock is, pgdat->kswapd_wait.lock holders take the pi lock pi lock holders take the rq lock rq lock holders take the timer base lock timer base lock holders take the pgdat->kswapd_wait.lock (missing __GFP_KSWAPD_RECLAIM check) The above is based on analysis in https://lore.kernel.org/all/20190107204627.GA25526@xxxxxxxxxxx/ https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@xxxxxxxxxxxxxxxxxxx/ Tetsuo's patch avoids to take pgdat->kswapd_wait.lock when timer base lock is held via adding check for __GFP_KSWAPD_RECLAIM, so breaks the circular dependency chain. > The latter was explained at > https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@xxxxxxxxxxxxxxxxxxx/ > and we agreed that debugobject code needs to drop __GFP_KSWAPD_RECLAIM at > https://lore.kernel.org/linux-mm/87fs809fwk.ffs@tglx/ . > > This patch is for making sure that debugobject code will not try to hold > pgdat->kswapd_wait after debugobject code dropped __GFP_KSWAPD_RECLAIM. > > Thus, the problem this patch will fix is a deadlock related, isn't it? Best Regards, Huang, Ying