Hi,
在 2024/08/08 5:05, John Stoffel 写道:
"Christian" == Christian Theune <ct@xxxxxxxxxxxxxxx> writes:
i had some more time at hand and managent to compile 5.15.164. The
issue is the same. After around 1h30m of work it hangs. I’ll try to
reproduce this on a newer supported kernel if I can.
Supported by who? NixOS? Why don't you just install linux kernel
6.6.x and see of the problem is still there? 5.15.x is ancient and
un-supported upstream now.
I agree that test on 5.15.x is not necessary for now. Please test on
v6.10 directly and make sure this is not a known problem first. The
hang stack is not really helpful because lots of problems will look like
this. :(
Thanks,
Kuai
Kernel:
Linux version 5.15.164 (nixbld@localhost) (gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.40) #1-NixOS SMP Sat Jul 27 08:46:18 UTC 2024
The config is unchanged except from the deprecated NFSD_V2_ACL and NFSD_V3 options which I had to remove. NFS is not in use on this server, though.
Output:
[ 4549.838672] INFO: task kworker/u64:7:432 blocked for more than 122 seconds.
[ 4549.846507] Not tainted 5.15.164 #1-NixOS
[ 4549.851616] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4549.860421] task:kworker/u64:7 state:D stack: 0 pid: 432 ppid: 2 flags:0x00004000
[ 4549.860426] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt]
[ 4549.860435] Call Trace:
[ 4549.860437] <TASK>
[ 4549.860440] __schedule+0x373/0x1580
[ 4549.860446] ? sysvec_call_function_single+0xa/0x90
[ 4549.860449] ? asm_sysvec_call_function_single+0x16/0x20
[ 4549.860453] schedule+0x5b/0xe0
[ 4549.860455] md_bitmap_startwrite+0x177/0x1e0
[ 4549.860459] ? finish_wait+0x90/0x90
[ 4549.860465] add_stripe_bio+0x449/0x770 [raid456]
[ 4549.860472] raid5_make_request+0x1cf/0xbd0 [raid456]
[ 4549.860476] ? kmem_cache_alloc_node_trace+0x341/0x3e0
[ 4549.860480] ? srso_alias_return_thunk+0x5/0x7f
[ 4549.860484] ? linear_map+0x44/0x90 [dm_mod]
[ 4549.860490] ? finish_wait+0x90/0x90
[ 4549.860492] ? __blk_queue_split+0x516/0x580
[ 4549.860495] md_handle_request+0x11f/0x1b0
[ 4549.860500] md_submit_bio+0x6e/0xb0
[ 4549.860502] __submit_bio+0x18c/0x220
[ 4549.860505] ? srso_alias_return_thunk+0x5/0x7f
[ 4549.860507] ? crypt_page_alloc+0x46/0x60 [dm_crypt]
[ 4549.860510] submit_bio_noacct+0xbe/0x2d0
[ 4549.860512] kcryptd_crypt+0x3a8/0x5a0 [dm_crypt]
[ 4549.860517] process_one_work+0x1d3/0x360
[ 4549.860521] worker_thread+0x4d/0x3b0
[ 4549.860523] ? process_one_work+0x360/0x360
[ 4549.860525] kthread+0x115/0x140
[ 4549.860528] ? set_kthread_struct+0x50/0x50
[ 4549.860530] ret_from_fork+0x1f/0x30
[ 4549.860535] </TASK>
[ 4549.860536] INFO: task kworker/u64:23:448 blocked for more than 122 seconds.
[ 4549.868461] Not tainted 5.15.164 #1-NixOS
[ 4549.873555] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4549.882358] task:kworker/u64:23 state:D stack: 0 pid: 448 ppid: 2 flags:0x00004000
[ 4549.882364] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt]
[ 4549.882368] Call Trace:
[ 4549.882369] <TASK>
[ 4549.882370] __schedule+0x373/0x1580
[ 4549.882373] ? sysvec_apic_timer_interrupt+0xa/0x90
[ 4549.882375] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 4549.882379] schedule+0x5b/0xe0
[ 4549.882382] md_bitmap_startwrite+0x177/0x1e0
[ 4549.882384] ? finish_wait+0x90/0x90
[ 4549.882387] add_stripe_bio+0x449/0x770 [raid456]
[ 4549.882393] raid5_make_request+0x1cf/0xbd0 [raid456]
[ 4549.882397] ? __bio_clone_fast+0xa5/0xe0
[ 4549.882401] ? srso_alias_return_thunk+0x5/0x7f
[ 4549.882403] ? finish_wait+0x90/0x90
[ 4549.882406] md_handle_request+0x11f/0x1b0
[ 4549.882410] ? blk_throtl_charge_bio_split+0x23/0x60
[ 4549.882413] md_submit_bio+0x6e/0xb0
[ 4549.882415] __submit_bio+0x18c/0x220
[ 4549.882417] ? srso_alias_return_thunk+0x5/0x7f
[ 4549.882419] ? crypt_page_alloc+0x46/0x60 [dm_crypt]
[ 4549.882421] submit_bio_noacct+0xbe/0x2d0
[ 4549.882424] kcryptd_crypt+0x3a8/0x5a0 [dm_crypt]
[ 4549.882428] process_one_work+0x1d3/0x360
[ 4549.882431] worker_thread+0x4d/0x3b0
[ 4549.882433] ? process_one_work+0x360/0x360
[ 4549.882435] kthread+0x115/0x140
[ 4549.882436] ? set_kthread_struct+0x50/0x50
[ 4549.882438] ret_from_fork+0x1f/0x30
[ 4549.882442] </TASK>
[ 4549.882497] INFO: task .backy-wrapped:2578 blocked for more than 122 seconds.
[ 4549.890517] Not tainted 5.15.164 #1-NixOS
[ 4549.895611] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4549.904406] task:.backy-wrapped state:D stack: 0 pid: 2578 ppid: 1 flags:0x00000002
[ 4549.904411] Call Trace:
[ 4549.904412] <TASK>
[ 4549.904414] __schedule+0x373/0x1580
[ 4549.904419] ? xlog_cil_commit+0x556/0x880 [xfs]
[ 4549.904465] ? __xfs_trans_commit+0xac/0x2f0 [xfs]
[ 4549.904498] schedule+0x5b/0xe0
[ 4549.904500] io_schedule+0x42/0x70
[ 4549.904503] wait_on_page_bit_common+0x119/0x380
[ 4549.904507] ? __page_cache_alloc+0x80/0x80
[ 4549.904510] wait_on_page_writeback+0x22/0x70
[ 4549.904513] truncate_inode_pages_range+0x26f/0x6d0
[ 4549.904520] evict+0x15f/0x180
[ 4549.904524] __dentry_kill+0xde/0x170
[ 4549.904527] dput+0x139/0x320
[ 4549.904529] do_renameat2+0x375/0x5f0
[ 4549.904536] __x64_sys_rename+0x3f/0x50
[ 4549.904538] do_syscall_64+0x34/0x80
[ 4549.904541] entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[ 4549.904544] RIP: 0033:0x7fbf3e61a75b
[ 4549.904545] RSP: 002b:00007ffc61e25988 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
[ 4549.904548] RAX: ffffffffffffffda RBX: 00007ffc61e25a20 RCX: 00007fbf3e61a75b
[ 4549.904549] RDX: 0000000000000000 RSI: 00007fbf2f7ff150 RDI: 00007fbf2f7fc190
[ 4549.904550] RBP: 00007ffc61e259d0 R08: 00000000ffffffff R09: 0000000000000000
[ 4549.904551] R10: 00007ffc61e25c00 R11: 0000000000000246 R12: 00000000ffffff9c
[ 4549.904552] R13: 00000000ffffff9c R14: 00000000016afab0 R15: 00007fbf30ef0810
[ 4549.904555] </TASK>
[ 4549.904556] INFO: task kworker/u64:0:4372 blocked for more than 122 seconds.
[ 4549.912477] Not tainted 5.15.164 #1-NixOS
[ 4549.917573] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4549.926373] task:kworker/u64:0 state:D stack: 0 pid: 4372 ppid: 2 flags:0x00004000
[ 4549.926376] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt]
[ 4549.926380] Call Trace:
[ 4549.926381] <TASK>
[ 4549.926383] __schedule+0x373/0x1580
[ 4549.926386] ? sysvec_apic_timer_interrupt+0xa/0x90
[ 4549.926389] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 4549.926392] schedule+0x5b/0xe0
[ 4549.926394] md_bitmap_startwrite+0x177/0x1e0
[ 4549.926397] ? finish_wait+0x90/0x90
[ 4549.926401] add_stripe_bio+0x449/0x770 [raid456]
[ 4549.926406] raid5_make_request+0x1cf/0xbd0 [raid456]
[ 4549.926410] ? __bio_clone_fast+0xa5/0xe0
[ 4549.926413] ? finish_wait+0x90/0x90
[ 4549.926415] ? __blk_queue_split+0x2d0/0x580
[ 4549.926418] md_handle_request+0x11f/0x1b0
[ 4549.926422] md_submit_bio+0x6e/0xb0
[ 4549.926424] __submit_bio+0x18c/0x220
[ 4549.926426] ? srso_alias_return_thunk+0x5/0x7f
[ 4549.926428] ? crypt_page_alloc+0x46/0x60 [dm_crypt]
[ 4549.926431] submit_bio_noacct+0xbe/0x2d0
[ 4549.926434] kcryptd_crypt+0x3a8/0x5a0 [dm_crypt]
[ 4549.926437] process_one_work+0x1d3/0x360
[ 4549.926441] worker_thread+0x4d/0x3b0
[ 4549.926442] ? process_one_work+0x360/0x360
[ 4549.926444] kthread+0x115/0x140
[ 4549.926447] ? set_kthread_struct+0x50/0x50
[ 4549.926448] ret_from_fork+0x1f/0x30
[ 4549.926454] </TASK>
[ 4549.926459] INFO: task rsync:4929 blocked for more than 122 seconds.
[ 4549.933603] Not tainted 5.15.164 #1-NixOS
[ 4549.938702] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4549.947501] task:rsync state:D stack: 0 pid: 4929 ppid: 4925 flags:0x00000000
[ 4549.947503] Call Trace:
[ 4549.947505] <TASK>
[ 4549.947505] ? usleep_range_state+0x90/0x90
[ 4549.947510] __schedule+0x373/0x1580
[ 4549.947513] ? srso_alias_return_thunk+0x5/0x7f
[ 4549.947515] ? blk_mq_sched_insert_requests+0x97/0xe0
[ 4549.947519] ? usleep_range_state+0x90/0x90
[ 4549.947521] schedule+0x5b/0xe0
[ 4549.947523] schedule_timeout+0xff/0x130
[ 4549.947526] __wait_for_common+0xaf/0x160
[ 4549.947530] xfs_buf_iowait+0x1c/0xa0 [xfs]
[ 4549.947573] __xfs_buf_submit+0x109/0x1b0 [xfs]
[ 4549.947604] xfs_buf_read_map+0x120/0x280 [xfs]
[ 4549.947635] ? xfs_btree_read_buf_block.constprop.0+0xae/0xf0 [xfs]
[ 4549.947670] xfs_trans_read_buf_map+0x156/0x2c0 [xfs]
[ 4549.947705] ? xfs_btree_read_buf_block.constprop.0+0xae/0xf0 [xfs]
[ 4549.947735] xfs_btree_read_buf_block.constprop.0+0xae/0xf0 [xfs]
[ 4549.947764] ? srso_alias_return_thunk+0x5/0x7f
[ 4549.947766] xfs_btree_lookup_get_block+0xa2/0x180 [xfs]
[ 4549.947798] xfs_btree_lookup+0xe9/0x540 [xfs]
[ 4549.947830] xfs_alloc_lookup_eq+0x1d/0x30 [xfs]
[ 4549.947863] xfs_alloc_fixup_trees+0xe7/0x3b0 [xfs]
[ 4549.947893] xfs_alloc_cur_finish+0x2b/0xa0 [xfs]
[ 4549.947923] xfs_alloc_ag_vextent_near.constprop.0+0x3f2/0x4a0 [xfs]
[ 4549.947954] xfs_alloc_ag_vextent+0x13f/0x150 [xfs]
[ 4549.947983] xfs_alloc_vextent+0x327/0x450 [xfs]
[ 4549.948013] xfs_bmap_btalloc+0x44e/0x830 [xfs]
[ 4549.948047] xfs_bmapi_allocate+0xda/0x300 [xfs]
[ 4549.948076] xfs_bmapi_write+0x4ab/0x570 [xfs]
[ 4549.948109] xfs_da_grow_inode_int+0xd8/0x320 [xfs]
[ 4549.948141] ? srso_alias_return_thunk+0x5/0x7f
[ 4549.948142] ? xfs_da_read_buf+0xf7/0x150 [xfs]
[ 4549.948171] ? srso_alias_return_thunk+0x5/0x7f
[ 4549.948174] xfs_dir2_grow_inode+0x68/0x120 [xfs]
[ 4549.948204] ? srso_alias_return_thunk+0x5/0x7f
[ 4549.948206] xfs_dir2_node_addname+0x5ea/0x9e0 [xfs]
[ 4549.948241] xfs_dir_createname+0x1cf/0x1e0 [xfs]
[ 4549.948271] xfs_rename+0x87e/0xcd0 [xfs]
[ 4549.948308] xfs_vn_rename+0xfa/0x170 [xfs]
[ 4549.948340] vfs_rename+0x818/0x10d0
[ 4549.948345] ? lookup_dcache+0x17/0x60
[ 4549.948348] ? do_renameat2+0x57f/0x5f0
[ 4549.948350] do_renameat2+0x57f/0x5f0
[ 4549.948355] __x64_sys_rename+0x3f/0x50
[ 4549.948357] do_syscall_64+0x34/0x80
[ 4549.948360] entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[ 4549.948362] RIP: 0033:0x7fcc5520c1d7
[ 4549.948364] RSP: 002b:00007ffe3909c748 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
[ 4549.948366] RAX: ffffffffffffffda RBX: 00007ffe3909c8f0 RCX: 00007fcc5520c1d7
[ 4549.948367] RDX: 0000000000000000 RSI: 00007ffe3909c8f0 RDI: 00007ffe3909e8f0
[ 4549.948368] RBP: 00007ffe3909e8f0 R08: 0000000000000000 R09: 00007ffe3909c2f8
[ 4549.948369] R10: 00007ffe3909c2f7 R11: 0000000000000246 R12: 0000000000000000
[ 4549.948370] R13: 00000000023c9c30 R14: 00000000000081a4 R15: 0000000000000004
[ 4549.948373] </TASK>
[ 4549.948374] INFO: task kworker/u64:1:4930 blocked for more than 122 seconds.
[ 4549.956299] Not tainted 5.15.164 #1-NixOS
[ 4549.961396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4549.970198] task:kworker/u64:1 state:D stack: 0 pid: 4930 ppid: 2 flags:0x00004000
[ 4549.970202] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt]
[ 4549.970205] Call Trace:
[ 4549.970206] <TASK>
[ 4549.970209] __schedule+0x373/0x1580
[ 4549.970211] ? srso_alias_return_thunk+0x5/0x7f
[ 4549.970215] schedule+0x5b/0xe0
[ 4549.970217] md_bitmap_startwrite+0x177/0x1e0
[ 4549.970219] ? finish_wait+0x90/0x90
[ 4549.970223] add_stripe_bio+0x449/0x770 [raid456]
[ 4549.970229] raid5_make_request+0x1cf/0xbd0 [raid456]
[ 4549.970232] ? kmem_cache_alloc_node_trace+0x341/0x3e0
[ 4549.970236] ? srso_alias_return_thunk+0x5/0x7f
[ 4549.970238] ? linear_map+0x44/0x90 [dm_mod]
[ 4549.970244] ? finish_wait+0x90/0x90
[ 4549.970245] ? __blk_queue_split+0x516/0x580
[ 4549.970248] md_handle_request+0x11f/0x1b0
[ 4549.970251] md_submit_bio+0x6e/0xb0
[ 4549.970254] __submit_bio+0x18c/0x220
[ 4549.970256] ? srso_alias_return_thunk+0x5/0x7f
[ 4549.970258] ? crypt_page_alloc+0x46/0x60 [dm_crypt]
[ 4549.970260] submit_bio_noacct+0xbe/0x2d0
[ 4549.970263] kcryptd_crypt+0x3a8/0x5a0 [dm_crypt]
[ 4549.970267] process_one_work+0x1d3/0x360
[ 4549.970270] worker_thread+0x4d/0x3b0
[ 4549.970272] ? process_one_work+0x360/0x360
[ 4549.970274] kthread+0x115/0x140
[ 4549.970276] ? set_kthread_struct+0x50/0x50
[ 4549.970278] ret_from_fork+0x1f/0x30
[ 4549.970282] </TASK>
[ 4549.970284] INFO: task kworker/u64:2:4949 blocked for more than 123 seconds.
[ 4549.978205] Not tainted 5.15.164 #1-NixOS
[ 4549.983290] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4549.992088] task:kworker/u64:2 state:D stack: 0 pid: 4949 ppid: 2 flags:0x00004000
[ 4549.992093] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt]
[ 4549.992097] Call Trace:
[ 4549.992098] <TASK>
[ 4549.992100] __schedule+0x373/0x1580
[ 4549.992103] ? sysvec_apic_timer_interrupt+0xa/0x90
[ 4549.992106] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 4549.992109] schedule+0x5b/0xe0
[ 4549.992111] md_bitmap_startwrite+0x177/0x1e0
[ 4549.992114] ? finish_wait+0x90/0x90
[ 4549.992117] add_stripe_bio+0x449/0x770 [raid456]
[ 4549.992122] raid5_make_request+0x1cf/0xbd0 [raid456]
[ 4549.992125] ? kmem_cache_alloc+0x261/0x3b0
[ 4549.992129] ? srso_alias_return_thunk+0x5/0x7f
[ 4549.992131] ? linear_map+0x44/0x90 [dm_mod]
[ 4549.992135] ? finish_wait+0x90/0x90
[ 4549.992137] ? __blk_queue_split+0x516/0x580
[ 4549.992139] md_handle_request+0x11f/0x1b0
[ 4549.992142] md_submit_bio+0x6e/0xb0
[ 4549.992144] __submit_bio+0x18c/0x220
[ 4549.992146] ? srso_alias_return_thunk+0x5/0x7f
[ 4549.992148] ? crypt_page_alloc+0x46/0x60 [dm_crypt]
[ 4549.992150] submit_bio_noacct+0xbe/0x2d0
[ 4549.992153] kcryptd_crypt+0x3a8/0x5a0 [dm_crypt]
[ 4549.992157] process_one_work+0x1d3/0x360
[ 4549.992160] worker_thread+0x4d/0x3b0
[ 4549.992162] ? process_one_work+0x360/0x360
[ 4549.992163] kthread+0x115/0x140
[ 4549.992166] ? set_kthread_struct+0x50/0x50
[ 4549.992168] ret_from_fork+0x1f/0x30
[ 4549.992172] </TASK>
[ 4549.992174] INFO: task kworker/u64:5:4952 blocked for more than 123 seconds.
[ 4550.000095] Not tainted 5.15.164 #1-NixOS
[ 4550.005187] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4550.013985] task:kworker/u64:5 state:D stack: 0 pid: 4952 ppid: 2 flags:0x00004000
[ 4550.013988] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt]
[ 4550.013992] Call Trace:
[ 4550.013993] <TASK>
[ 4550.013995] __schedule+0x373/0x1580
[ 4550.013997] ? sysvec_apic_timer_interrupt+0xa/0x90
[ 4550.014000] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 4550.014003] schedule+0x5b/0xe0
[ 4550.014005] md_bitmap_startwrite+0x177/0x1e0
[ 4550.014008] ? finish_wait+0x90/0x90
[ 4550.014010] add_stripe_bio+0x449/0x770 [raid456]
[ 4550.014015] raid5_make_request+0x1cf/0xbd0 [raid456]
[ 4550.014018] ? __bio_clone_fast+0xa5/0xe0
[ 4550.014022] ? finish_wait+0x90/0x90
[ 4550.014024] ? __blk_queue_split+0x2d0/0x580
[ 4550.014027] md_handle_request+0x11f/0x1b0
[ 4550.014030] md_submit_bio+0x6e/0xb0
[ 4550.014032] __submit_bio+0x18c/0x220
[ 4550.014034] ? srso_alias_return_thunk+0x5/0x7f
[ 4550.014036] ? crypt_page_alloc+0x46/0x60 [dm_crypt]
[ 4550.014038] submit_bio_noacct+0xbe/0x2d0
[ 4550.014041] kcryptd_crypt+0x3a8/0x5a0 [dm_crypt]
[ 4550.014044] process_one_work+0x1d3/0x360
[ 4550.014047] worker_thread+0x4d/0x3b0
[ 4550.014049] ? process_one_work+0x360/0x360
[ 4550.014050] kthread+0x115/0x140
[ 4550.014052] ? set_kthread_struct+0x50/0x50
[ 4550.014054] ret_from_fork+0x1f/0x30
[ 4550.014058] </TASK>
[ 4550.014059] INFO: task kworker/u64:8:4954 blocked for more than 123 seconds.
[ 4550.021982] Not tainted 5.15.164 #1-NixOS
[ 4550.027078] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4550.035881] task:kworker/u64:8 state:D stack: 0 pid: 4954 ppid: 2 flags:0x00004000
[ 4550.035884] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt]
[ 4550.035887] Call Trace:
[ 4550.035888] <TASK>
[ 4550.035890] __schedule+0x373/0x1580
[ 4550.035893] ? sysvec_apic_timer_interrupt+0xa/0x90
[ 4550.035896] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 4550.035899] schedule+0x5b/0xe0
[ 4550.035901] md_bitmap_startwrite+0x177/0x1e0
[ 4550.035904] ? finish_wait+0x90/0x90
[ 4550.035907] add_stripe_bio+0x449/0x770 [raid456]
[ 4550.035912] raid5_make_request+0x1cf/0xbd0 [raid456]
[ 4550.035916] ? __bio_clone_fast+0xa5/0xe0
[ 4550.035919] ? finish_wait+0x90/0x90
[ 4550.035921] ? __blk_queue_split+0x2d0/0x580
[ 4550.035924] md_handle_request+0x11f/0x1b0
[ 4550.035927] md_submit_bio+0x6e/0xb0
[ 4550.035929] __submit_bio+0x18c/0x220
[ 4550.035931] ? srso_alias_return_thunk+0x5/0x7f
[ 4550.035933] ? crypt_page_alloc+0x46/0x60 [dm_crypt]
[ 4550.035936] submit_bio_noacct+0xbe/0x2d0
[ 4550.035939] kcryptd_crypt+0x3a8/0x5a0 [dm_crypt]
[ 4550.035942] process_one_work+0x1d3/0x360
[ 4550.035946] worker_thread+0x4d/0x3b0
[ 4550.035948] ? process_one_work+0x360/0x360
[ 4550.035949] kthread+0x115/0x140
[ 4550.035951] ? set_kthread_struct+0x50/0x50
[ 4550.035953] ret_from_fork+0x1f/0x30
[ 4550.035957] </TASK>
[ 4550.035958] INFO: task kworker/u64:9:4955 blocked for more than 123 seconds.
[ 4550.043881] Not tainted 5.15.164 #1-NixOS
[ 4550.048979] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4550.057786] task:kworker/u64:9 state:D stack: 0 pid: 4955 ppid: 2 flags:0x00004000
[ 4550.057790] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt]
[ 4550.057794] Call Trace:
[ 4550.057796] <TASK>
[ 4550.057798] __schedule+0x373/0x1580
[ 4550.057801] ? sysvec_apic_timer_interrupt+0xa/0x90
[ 4550.057803] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 4550.057806] schedule+0x5b/0xe0
[ 4550.057808] md_bitmap_startwrite+0x177/0x1e0
[ 4550.057810] ? finish_wait+0x90/0x90
[ 4550.057813] add_stripe_bio+0x449/0x770 [raid456]
[ 4550.057818] raid5_make_request+0x1cf/0xbd0 [raid456]
[ 4550.057821] ? __bio_clone_fast+0xa5/0xe0
[ 4550.057824] ? finish_wait+0x90/0x90
[ 4550.057826] ? __blk_queue_split+0x2d0/0x580
[ 4550.057828] md_handle_request+0x11f/0x1b0
[ 4550.057831] md_submit_bio+0x6e/0xb0
[ 4550.057834] __submit_bio+0x18c/0x220
[ 4550.057835] ? srso_alias_return_thunk+0x5/0x7f
[ 4550.057837] ? crypt_page_alloc+0x46/0x60 [dm_crypt]
[ 4550.057839] submit_bio_noacct+0xbe/0x2d0
[ 4550.057842] kcryptd_crypt+0x3a8/0x5a0 [dm_crypt]
[ 4550.057846] process_one_work+0x1d3/0x360
[ 4550.057848] worker_thread+0x4d/0x3b0
[ 4550.057850] ? process_one_work+0x360/0x360
[ 4550.057852] kthread+0x115/0x140
[ 4550.057854] ? set_kthread_struct+0x50/0x50
[ 4550.057856] ret_from_fork+0x1f/0x30
[ 4550.057860] </TASK>
On 7. Aug 2024, at 08:46, Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
I tried updating to 5.15.164, but have to struggle against our config management as some options have been shifted that I need to filter out: NFSD_V3 and NFSD2_ACL are now fixed and cause config errors if set - I guess that’s a valid thing to happen within an LTS release. I’ll try again on Friday
On 7. Aug 2024, at 07:31, Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
Sure,
would you prefer me testing on 5.15.x or something else?
On 7. Aug 2024, at 04:55, Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
Hi,
在 2024/08/06 22:10, Christian Theune 写道:
we are seeing an issue that can be triggered with relative ease on a server that has been working fine for a few weeks. The regular workload is a backup utility that copies off data from virtual disk images in 4MiB (compressed) chunks from Ceph onto a local NVME-based RAID-6 array that is encrypted using LUKS.
Today I started a larger rsync job from another server (that has a couple of million files with around 200-300 gib in total) to migrate data and we’ve seen the server suddenly lock up twice. Any IO that interacts with the mountpoint (/srv/backy) will hang indefinitely. A reset is required to get out of this as the machine will hang trying to unmount the affected filesystem. No other messages than the hung tasks are being presented - I have no indicator for hardware faults at the moment.
I’m messaging both dm-devel and linux-raid as I’m suspecting either one or both (or an interaction) might be the cause.
Kernel:
Linux version 5.15.138 (nixbld@localhost) (gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.40) #1-NixOS SMP Wed Nov 8 16:26:52 UTC 2023
Since you can trigger this easily, I'll suggest you to try the latest
kernel release first.
Thanks,
Kuai
See the kernel config attached.
Liebe Grüße,
Christian Theune
--
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
Liebe Grüße,
Christian Theune
--
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
Liebe Grüße,
Christian Theune
--
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
.