On 2023/05/12 12:44, Andrew Morton wrote: > On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote: > >> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is >> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd. >> Since fill_pool() might be called with arbitrary locks held, >> fill_pool() should not assume that holding pgdat->kswapd_wait is safe. > > hm. But many GFP_ATOMIC allocation attempts are made with locks held. > Why aren't all such callers buggy, by trying to wake kswapd with locks > held? What's special about this one? Because debugobject cannot know what locks are held when fill_pool() does GFP_ATOMIC allocation. syzbot is reporting base->lock => pgdat->kswapd_wait dependency add_timer() { __mod_timer() { base = lock_timer_base(timer, &flags); new_base = get_target_base(base, timer->flags); if (base != new_base) { raw_spin_unlock(&base->lock); base = new_base; raw_spin_lock(&base->lock); } debug_timer_activate(timer) { debug_object_activate(timer, &timer_debug_descr) { debug_objects_fill_pool() { fill_pool() { kmem_cache_zalloc(GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN) { // wakes kswapd } } } } } raw_spin_unlock_irqrestore(&base->lock, flags); } } when pgdat->kswapd_wait => p->pi_lock dependency __alloc_pages() { get_page_from_freelist() { rmqueue() { wakeup_kswapd() { wake_up_interruptible(&pgdat->kswapd_wait) { __wake_up_common_lock() { spin_lock_irqsave(&pgdat->kswapd_wait.lock, flags); __wake_up_common() { autoremove_wake_function() { try_to_wake_up() { raw_spin_lock_irqsave(&p->pi_lock, flags); raw_spin_unlock_irqrestore(&p->pi_lock, flags); } } } spin_unlock_irqrestore(&pgdat->kswapd_wait.lock, flags); } } } } } } and p->pi_lock => rq->__lock => base->lock dependency wake_up_new_task() { raw_spin_lock_irqsave(&p->pi_lock, rf.flags); rq = __task_rq_lock(p, &rf); // acquires rq->lock activate_task(rq, p, ENQUEUE_NOCLOCK) { enqueue_task() { psi_enqueue() { psi_task_change() { queue_delayed_work_on() { __queue_delayed_work() { add_timer() { __mod_timer() { base = lock_timer_base(timer, &flags); // acquires base->lock debug_timer_activate(timer); // possible base->lock => pgdat->kswapd_wait => p->pi_lock dependency raw_spin_unlock_irqrestore(&base->lock, flags); } } } } } } } } task_rq_unlock(rq, p, &rf); } exists. All GFP_ATOMIC allocation users are supposed to be aware of what locks are held, and are supposed to explicitly remove __GFP_KSWAPD_RECLAIM if waking up kswapd can cause deadlock. But reality is that we can't be careful enough to error-free. Who would imagine GFP_ATOMIC allocation while base->lock is held can form circular locking dependency? > >> Also, __GFP_NORETRY is pointless for !__GFP_DIRECT_RECLAIM allocation __GFP_NORETRY is not checked by !__GFP_DIRECT_RECLAIM allocation. GFP_ATOMIC - __GFP_KSWAPD_RECLAIM is __GFP_HIGH. >> >> @@ -126,7 +126,7 @@ static const char *obj_states[ODEBUG_STATE_MAX] = { >> >> static void fill_pool(void) >> { >> - gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN; >> + gfp_t gfp = __GFP_HIGH | __GFP_NOWARN; > > Does this weaken fill_pool()'s allocation attempt more than necessary? > We can still pass __GFP_HIGH? What do you mean? I think that killing base->lock => pgdat->kswapd_wait by removing __GFP_KSWAPD_RECLAIM is the right fix. This weakening is needed for avoiding base->lock => pgdat->kswapd_wait dependency from debugobject code. For locking dependency safety, I wish that GFP_ATOMIC / GFP_NOWAIT do not imply __GFP_KSWAPD_RECLAIM. Such allocations should not try to allocate as many pages as even __GFP_HIGH fails. And if such allocations try to allocate as many pages as even __GFP_HIGH fails, they likely already failed before background kswapd reclaim finds some reusable pages....