On 17/07/2019 09:20, Juri Lelli wrote:
> The following BUG has been reported while running ipsec tests.
>
> BUG: scheduling while atomic: irq/78-eno3-rx-/12023/0x00000002
> Modules linked in: ipcomp xfrm_ipcomp ...
> Preemption disabled at:
> [<ffffffffc0b29730>] ipcomp_input+0xd0/0x9a0 [xfrm_ipcomp]
> CPU: 1 PID: 12023 Comm: irq/78-eno3-rx- Kdump: loaded Not tainted [...] #1
> Hardware name: [...]
> Call Trace:
>  dump_stack+0x5c/0x80
>  ? ipcomp_input+0xd0/0x9a0 [xfrm_ipcomp]
>  __schedule_bug.cold.81+0x44/0x51
>  __schedule+0x5bf/0x6a0
>  schedule+0x39/0xd0
>  rt_spin_lock_slowlock_locked+0x10e/0x2b0
>  rt_spin_lock_slowlock+0x50/0x80
>  get_page_from_freelist+0x609/0x1560
>  ? zlib_updatewindow+0x5a/0xd0
>  __alloc_pages_nodemask+0xd9/0x280
>  ipcomp_input+0x299/0x9a0 [xfrm_ipcomp]
>  xfrm_input+0x5e3/0x960
>  xfrm4_ipcomp_rcv+0x34/0x50
>  ip_local_deliver_finish+0x22d/0x250
>  ip_local_deliver+0x6d/0x110
>  ? ip_rcv_finish+0xac/0x480
>  ip_rcv+0x28e/0x3f9
>  ? packet_rcv+0x43/0x4c0
>  __netif_receive_skb_core+0xb7c/0xd10
>  ? inet_gro_receive+0x8e/0x2f0
>  netif_receive_skb_internal+0x4a/0x160
>  napi_gro_receive+0xee/0x110
>  tg3_rx+0x2a8/0x810 [tg3]
>  tg3_poll_work+0x3b3/0x830 [tg3]
>  tg3_poll_msix+0x3b/0x170 [tg3]
>  net_rx_action+0x1ff/0x470
>  ? __switch_to_asm+0x41/0x70
>  do_current_softirqs+0x223/0x3e0
>  ? irq_thread_check_affinity+0x20/0x20
>  __local_bh_enable+0x51/0x60
>  irq_forced_thread_fn+0x5e/0x80
>  ? irq_finalize_oneshot.part.45+0xf0/0xf0
>  irq_thread+0x13d/0x1a0
>  ? wake_threads_waitq+0x30/0x30
>  kthread+0x112/0x130
>  ? kthread_create_worker_on_cpu+0x70/0x70
>  ret_from_fork+0x35/0x40
>
> The problem resides in the fact that get_cpu() called from ipcomp_input()
> disables preemption, and that triggers the scheduling while atomic BUG further
> down the callpath chain of get_page_from_freelist(), i.e.,
>
>  ipcomp_input
>  ipcomp_decompress
>  <-- get_cpu()
>  alloc_page(GFP_ATOMIC)
>  alloc_pages(GFP_ATOMIC, 0)
>  alloc_pages_current
>  __alloc_pages_nodemask
>  get_page_from_freelist
>  (try_this_zone:) rmqueue
>  rmqueue_pcplist
>  __rmqueue_pcplist
>  rmqueue_bulk
>  <-- spin_lock(&zone->lock) - BUG
>
> Fix this by using {get,put}_cpu_light() in ipcomp_decompress().
>
> Signed-off-by: Juri Lelli <juri.lelli@xxxxxxxxxx>

Reviewed-by: Daniel Bristot de Oliveira <bristot@xxxxxxxxxx>

Thanks!
-- Daniel