On 2024/7/24 16:15, kernel test robot wrote:
Hello,

kernel test robot noticed "BUG:workqueue_lockup-pool" on:

commit: e992c326a36a35afe13a4c16094e2a76a90ed5eb ("sbitmap: fix io hung due to race on sbitmap_word::cleared")
https://github.com/bvanassche/linux block-for-next
The patch in the above branch differs from the one in:
https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=for-next&id=72d04bdcf3f7d7e07d82f9757946f68802a7270a

	return (READ_ONCE(map->word) & word_mask) == word_mask;
should be
	return (READ_ONCE(map->word) & word_mask) != word_mask;

Thanks.
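To make the difference between the two return statements concrete, here is a minimal user-space sketch. It is not the kernel code: the helper names (low_bits_mask, masked_bits_all_set, masked_bit_is_clear) are hypothetical, READ_ONCE() is dropped, and the word is passed in by value. It assumes the usual sbitmap convention that a set bit marks an allocated slot, so "some masked bit is clear" means "a free slot exists".

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: mask covering the low `depth` bits of a word. */
static unsigned long low_bits_mask(unsigned int depth)
{
	assert(depth > 0 && depth <= BITS_PER_WORD);
	return ~0UL >> (BITS_PER_WORD - depth);
}

/* Form quoted for the tested branch: true only when every masked bit is set. */
static bool masked_bits_all_set(unsigned long word, unsigned long word_mask)
{
	return (word & word_mask) == word_mask;
}

/* Form quoted for the for-next commit: true when at least one masked bit is clear. */
static bool masked_bit_is_clear(unsigned long word, unsigned long word_mask)
{
	return (word & word_mask) != word_mask;
}
```

With a 4-bit mask, a full word (0xF) gives true for the "==" form and false for the "!=" form; with one bit clear (0x7) the results swap. So the "==" form reports true exactly when no free bit is left, which is the opposite of what the "!=" form reports.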
in testcase: boot

compiler: clang-18
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)

+---------------------------------------------+------------+------------+
|                                             | b0c61a9e6a | e992c326a3 |
+---------------------------------------------+------------+------------+
| BUG:workqueue_lockup-pool                   | 0          | 10         |
+---------------------------------------------+------------+------------+

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202407241556.b0171c94-lkp@xxxxxxxxx

[ 64.765231][ C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 43s!
[ 64.766333][ C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=-20 stuck for 43s!
[ 64.767306][ C0] Showing busy workqueues and worker pools:
[ 64.767861][ C0] workqueue events: flags=0x0
[ 64.768319][ C0] pwq 6: cpus=1 node=0 flags=0x0 nice=0 active=2 refcnt=3
[ 64.768335][ C0] pending: e1000_watchdog, kfree_rcu_monitor
[ 64.768392][ C0] workqueue events_power_efficient: flags=0x80
[ 64.770225][ C0] pwq 2: cpus=0 node=0 flags=0x0 nice=0 active=1 refcnt=2
[ 64.770228][ C0] pending: do_cache_clean
[ 64.770249][ C0] workqueue events_freezable_pwr_efficient: flags=0x84
[ 64.771967][ C0] pwq 6: cpus=1 node=0 flags=0x0 nice=0 active=1 refcnt=2
[ 64.771976][ C0] in-flight: 26:disk_events_workfn
[ 64.772005][ C0] workqueue mm_percpu_wq: flags=0x8
[ 64.773657][ C0] pwq 6: cpus=1 node=0 flags=0x0 nice=0 active=1 refcnt=2
[ 64.773660][ C0] pending: vmstat_update
[ 64.773697][ C0] workqueue kblockd: flags=0x18
[ 64.775275][ C0] pwq 7: cpus=1 node=0 flags=0x0 nice=-20 active=2 refcnt=3
[ 64.775278][ C0] in-flight: 27:blk_mq_timeout_work
[ 64.775293][ C0] pending: blk_mq_timeout_work
[ 64.775376][ C0] pool 6: cpus=1 node=0 flags=0x0 nice=0 hung=43s workers=3 idle: 40 1001
[ 64.775391][ C0] pool 7: cpus=1 node=0 flags=0x0 nice=-20 hung=43s workers=2 idle: 859
[ 64.775400][ C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[ 64.779459][ C0] pool 7:
[ 64.779465][ C0] task:kworker/1:0H state:R running task stack:0 pid:27 tgid:27 ppid:2 flags:0x00004000
[ 64.779480][ C0] Workqueue: kblockd blk_mq_timeout_work
[ 64.779493][ C0] Call Trace:
[ 64.779504][ C0] <TASK>
[ 64.779541][ C0] __schedule (kernel/sched/core.c:5411)
[ 64.779563][ C0] ? __pfx_schedule_timeout (kernel/time/timer.c:2543)
[ 64.779571][ C0] schedule (arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6823 kernel/sched/core.c:6837)
[ 64.779573][ C0] schedule_timeout (kernel/time/timer.c:?)
[ 64.779580][ C0] ? get_page_from_freelist (mm/page_alloc.c:3431)
[ 64.779588][ C0] __wait_for_common (kernel/sched/completion.c:95 kernel/sched/completion.c:116)
[ 64.779591][ C0] ? __pfx_schedule_timeout (kernel/time/timer.c:2543)
[ 64.779593][ C0] wait_for_completion_state (kernel/sched/completion.c:266)
[ 64.779595][ C0] __wait_rcu_gp (kernel/rcu/update.c:435)
[ 64.779607][ C0] synchronize_rcu_normal (kernel/rcu/tree.c:3935)
[ 64.779614][ C0] ? __pfx_call_rcu_hurry (include/linux/rcupdate.h:113)
[ 64.779617][ C0] ? rcu_blocking_is_gp (include/linux/kernel.h:? kernel/rcu/tree.c:3894)
[ 64.779618][ C0] ? synchronize_rcu (kernel/rcu/tree.c:3985)
[ 64.779620][ C0] blk_mq_timeout_work (block/blk-mq.c:?)
[ 64.779629][ C0] process_scheduled_works (kernel/workqueue.c:3253)
[ 64.779647][ C0] worker_thread (include/linux/list.h:373 kernel/workqueue.c:947 kernel/workqueue.c:3410)
[ 64.779652][ C0] ? __pfx_worker_thread (kernel/workqueue.c:3356)
[ 64.779655][ C0] kthread (kernel/kthread.c:391)
[ 64.779668][ C0] ? __pfx_kthread (kernel/kthread.c:342)
[ 64.779671][ C0] ret_from_fork (arch/x86/kernel/process.c:153)
[ 64.779688][ C0] ? __pfx_kthread (kernel/kthread.c:342)
[ 64.779691][ C0] ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
[ 64.779704][ C0] </TASK>
[ 95.485253][ C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 74s!
[ 95.486737][ C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=-20 stuck for 73s!
[ 95.487606][ C0] Showing busy workqueues and worker pools:
[ 95.488179][ C0] workqueue events: flags=0x0
[ 95.488650][ C0] pwq 6: cpus=1 node=0 flags=0x0 nice=0 active=2 refcnt=3
[ 95.488679][ C0] pending: e1000_watchdog, kfree_rcu_monitor
[ 95.488820][ C0] workqueue events_power_efficient: flags=0x80
[ 95.490632][ C0] pwq 2: cpus=0 node=0 flags=0x0 nice=0 active=1 refcnt=2
[ 95.490635][ C0] pending: do_cache_clean
[ 95.490669][ C0] workqueue events_freezable_pwr_efficient: flags=0x84
[ 95.492426][ C0] pwq 6: cpus=1 node=0 flags=0x0 nice=0 active=1 refcnt=2
[ 95.492429][ C0] in-flight: 26:disk_events_workfn
[ 95.492527][ C0] workqueue mm_percpu_wq: flags=0x8
[ 95.494193][ C0] pwq 6: cpus=1 node=0 flags=0x0 nice=0 active=1 refcnt=2
[ 95.494196][ C0] pending: vmstat_update
[ 95.494265][ C0] workqueue kblockd: flags=0x18
[ 95.495840][ C0] pwq 7: cpus=1 node=0 flags=0x0 nice=-20 active=2 refcnt=3
[ 95.495843][ C0] in-flight: 27:blk_mq_timeout_work
[ 95.495858][ C0] pending: blk_mq_timeout_work
[ 95.495950][ C0] pool 6: cpus=1 node=0 flags=0x0 nice=0 hung=74s workers=3 idle: 40 1001
[ 95.495977][ C0] pool 7: cpus=1 node=0 flags=0x0 nice=-20 hung=73s workers=2 idle: 859
[ 95.495983][ C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[ 95.500089][ C0] pool 7:
[ 95.500106][ C0] task:kworker/1:0H state:R running task stack:0 pid:27 tgid:27 ppid:2 flags:0x00004000
[ 95.500132][ C0] Workqueue: kblockd blk_mq_timeout_work
[ 95.500169][ C0] Call Trace:
[ 95.500195][ C0] <TASK>
[ 95.500259][ C0] __schedule (kernel/sched/core.c:5411)
[ 95.500304][ C0] ? __pfx_schedule_timeout (kernel/time/timer.c:2543)
[ 95.500320][ C0] schedule (arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6823 kernel/sched/core.c:6837)
[ 95.500322][ C0] schedule_timeout (kernel/time/timer.c:?)
[ 95.500341][ C0] ? get_page_from_freelist (mm/page_alloc.c:3431)
[ 95.500363][ C0] __wait_for_common (kernel/sched/completion.c:95 kernel/sched/completion.c:116)
[ 95.500365][ C0] ? __pfx_schedule_timeout (kernel/time/timer.c:2543)
[ 95.500367][ C0] wait_for_completion_state (kernel/sched/completion.c:266)
[ 95.500369][ C0] __wait_rcu_gp (kernel/rcu/update.c:435)
[ 95.500399][ C0] synchronize_rcu_normal (kernel/rcu/tree.c:3935)
[ 95.500420][ C0] ? __pfx_call_rcu_hurry (include/linux/rcupdate.h:113)
[ 95.500432][ C0] ? rcu_blocking_is_gp (include/linux/kernel.h:? kernel/rcu/tree.c:3894)
[ 95.500434][ C0] ? synchronize_rcu (kernel/rcu/tree.c:3985)
[ 95.500435][ C0] blk_mq_timeout_work (block/blk-mq.c:?)
[ 95.500464][ C0] process_scheduled_works (kernel/workqueue.c:3253)
[ 95.500516][ C0] worker_thread (include/linux/list.h:373 kernel/workqueue.c:947 kernel/workqueue.c:3410)
[ 95.500527][ C0] ? __pfx_worker_thread (kernel/workqueue.c:3356)
[ 95.500530][ C0] kthread (kernel/kthread.c:391)
[ 95.500585][ C0] ? __pfx_kthread (kernel/kthread.c:342)
[ 95.500589][ C0] ret_from_fork (arch/x86/kernel/process.c:153)
[ 95.500636][ C0] ? __pfx_kthread (kernel/kthread.c:342)
[ 95.500640][ C0] ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
[ 95.500679][ C0] </TASK>
[ 120.705227][ C1] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 120.706866][ C1] rcu: 1-....: (25000 ticks this GP) idle=71dc/1/0x4000000000000000 softirq=2935/2935 fqs=12477
[ 120.712272][ C1] rcu: (t=25002 jiffies g=2261 q=805 ncpus=2)
[ 120.713520][ C1] CPU: 1 PID: 1601 Comm: (udev-worker) Not tainted 6.10.0-rc6-00303-ge992c326a36a #1
[ 120.715344][ C1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 120.717321][ C1] RIP: 0010:_raw_spin_unlock_irqrestore (include/linux/spinlock_api_smp.h:152)
[ 120.718629][ C1] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 c6 07 00 0f ba e6 09 73 01 fb 65 ff 0d ce bc 10 7e <74> 06 c3 cc cc cc cc cc 0f 1f 44 00 00 c3 cc cc cc cc cc 0f 1f 00
All code
========
   0:   90                      nop
   1:   90                      nop
   2:   90                      nop
   3:   90                      nop
   4:   90                      nop
   5:   90                      nop
   6:   90                      nop
   7:   90                      nop
   8:   90                      nop
   9:   90                      nop
   a:   90                      nop
   b:   90                      nop
   c:   90                      nop
   d:   90                      nop
   e:   90                      nop
   f:   90                      nop
  10:   f3 0f 1e fa             endbr64
  14:   0f 1f 44 00 00          nopl 0x0(%rax,%rax,1)
  19:   c6 07 00                movb $0x0,(%rdi)
  1c:   0f ba e6 09             bt $0x9,%esi
  20:   73 01                   jae 0x23
  22:   fb                      sti
  23:   65 ff 0d ce bc 10 7e    decl %gs:0x7e10bcce(%rip) # 0x7e10bcf8
  2a:*  74 06                   je 0x32 <-- trapping instruction
  2c:   c3                      ret
  2d:   cc                      int3
  2e:   cc                      int3
  2f:   cc                      int3
  30:   cc                      int3
  31:   cc                      int3
  32:   0f 1f 44 00 00          nopl 0x0(%rax,%rax,1)
  37:   c3                      ret
  38:   cc                      int3
  39:   cc                      int3
  3a:   cc                      int3
  3b:   cc                      int3
  3c:   cc                      int3
  3d:   0f 1f 00                nopl (%rax)

Code starting with the faulting instruction
===========================================
   0:   74 06                   je 0x8
   2:   c3                      ret
   3:   cc                      int3
   4:   cc                      int3
   5:   cc                      int3
   6:   cc                      int3
   7:   cc                      int3
   8:   0f 1f 44 00 00          nopl 0x0(%rax,%rax,1)
   d:   c3                      ret
   e:   cc                      int3
   f:   cc                      int3
  10:   cc                      int3
  11:   cc                      int3
  12:   cc                      int3
  13:   0f 1f 00                nopl (%rax)
[ 120.722091][ C1] RSP: 0018:ffffc9000027fa60 EFLAGS: 00000247
[ 120.723180][ C1] RAX: 0000000000000286 RBX: ffff8881335dc4c8 RCX: 0000000000000000
[ 120.724696][ C1] RDX: 0000000000000000 RSI: 0000000000000286 RDI: ffff8881335dc4c8
[ 120.726162][ C1] RBP: ffff8881335dc480 R08: 0000000000000001 R09: ffffffffffffffff
[ 120.727782][ C1] R10: 0000000000000000 R11: ffffffff817cf120 R12: 0000000000000000
[ 120.729356][ C1] R13: 0000000000000001 R14: 0000000000000000 R15: fffffffffffffffe
[ 120.730936][ C1] FS: 00007f213215b8c0(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
[ 120.732680][ C1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 120.733960][ C1] CR2: 000055c76ff83708 CR3: 00000001482fe000 CR4: 00000000000406f0
[ 120.735559][ C1] Call Trace:
[ 120.736328][ C1] <IRQ>
[ 120.737025][ C1] ? rcu_dump_cpu_stacks (include/linux/cpumask.h:231 kernel/rcu/tree_stall.h:374)
[ 120.738036][ C1] ? print_cpu_stall (kernel/rcu/tree_stall.h:702)
[ 120.739012][ C1] ? rcu_sched_clock_irq (kernel/rcu/tree_stall.h:?)
[ 120.740040][ C1] ? update_process_times (arch/x86/include/asm/preempt.h:26 kernel/time/timer.c:2487)
[ 120.741048][ C1] ? tick_nohz_handler (kernel/time/tick-sched.c:187 kernel/time/tick-sched.c:306)
[ 120.742044][ C1] ? __pfx_tick_nohz_handler (kernel/time/tick-sched.c:285)
[ 120.743092][ C1] ? __hrtimer_run_queues (kernel/time/hrtimer.c:1689)
[ 120.744101][ C1] ? hrtimer_interrupt (kernel/time/hrtimer.c:1818)
[ 120.745084][ C1

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240724/202407241556.b0171c94-lkp@xxxxxxxxx