Not sure who to blame, but my server locked up when upgraded (accessing volume group information), and echoing in "w" into sysrq-trigger showed a bit of information. First, looking at my dmesg I see that my usb-storage is hung up, for whatever reason. Thus, this could be the source of all issues. [5434447.145737] INFO: task usb-storage:32246 blocked for more than 120 seconds. [5434447.145740] Not tainted 5.2.4-custom #4 (BTW, I was upgrading to my 5.2.17 kernel when this happened) [5434447.145741] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [5434447.145743] usb-storage D 0 32246 2 0x80004000 [5434447.145745] Call Trace: [5434447.145749] ? __schedule+0x1e8/0x600 [5434447.145752] schedule+0x29/0x90 [5434447.145755] schedule_timeout+0x208/0x300 [5434447.145765] ? usb_hcd_submit_urb+0xbe/0xb90 [usbcore] [5434447.145773] ? usb_alloc_urb+0x23/0x70 [usbcore] [5434447.145782] ? usb_sg_init+0x92/0x2b0 [usbcore] [5434447.145791] ? usb_hcd_submit_urb+0xbe/0xb90 [usbcore] [5434447.145795] ? __switch_to_asm+0x34/0x70 [5434447.145798] wait_for_completion+0x111/0x180 [5434447.145800] ? wake_up_q+0x70/0x70 [5434447.145809] usb_sg_wait+0xfa/0x150 [usbcore] [5434447.145814] usb_stor_bulk_transfer_sglist.part.4+0x64/0xb0 [usb_storage] [5434447.145818] usb_stor_bulk_srb+0x49/0x80 [usb_storage] [5434447.145821] usb_stor_Bulk_transport+0x167/0x3e0 [usb_storage] [5434447.145824] ? schedule+0x29/0x90 [5434447.145828] ? usb_stor_disconnect+0xb0/0xb0 [usb_storage] [5434447.145832] usb_stor_invoke_transport+0x3a/0x4e0 [usb_storage] [5434447.145835] ? wait_for_completion_interruptible+0x12d/0x1d0 [5434447.145837] ? wake_up_q+0x70/0x70 [5434447.145841] usb_stor_control_thread+0x1c5/0x270 [usb_storage] [5434447.145845] kthread+0x116/0x130 [5434447.145847] ? kthread_create_worker_on_cpu+0x70/0x70 [5434447.145851] ret_from_fork+0x35/0x40 But then this messed up with this I presume: [5434567.980526] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [5434567.980529] pool D 0 32364 1 0x00000000 [5434567.980532] Call Trace: [5434567.980543] ? __schedule+0x1e8/0x600 [5434567.980547] schedule+0x29/0x90 [5434567.980558] scsi_block_when_processing_errors+0xb7/0xc0 [scsi_mod] [5434567.980564] ? finish_wait+0x80/0x80 [5434567.980568] sd_open+0x49/0x170 [sd_mod] [5434567.980574] __blkdev_get+0x45a/0x510 [5434567.980577] ? bd_acquire+0xb0/0xb0 [5434567.980579] blkdev_get+0x108/0x300 [5434567.980582] ? bd_acquire+0xb0/0xb0 [5434567.980588] do_dentry_open+0x13a/0x370 [5434567.980593] path_openat+0x2c6/0x1450 [5434567.980598] ? part_stat_show+0x427/0x450 [5434567.980603] do_filp_open+0x93/0x100 [5434567.980608] ? __check_object_size+0x15d/0x189 [5434567.980612] do_sys_open+0x186/0x220 [5434567.980616] do_syscall_64+0x4e/0x100 [5434567.980621] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [5434567.980625] RIP: 0033:0x7f8b4b034d0e [5434567.980632] Code: Bad RIP value. [5434567.980633] RSP: 002b:00007f8b49a389d0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101 [5434567.980636] RAX: ffffffffffffffda RBX: 00007f8b3c035490 RCX: 00007f8b4b034d0e [5434567.980638] RDX: 0000000000000800 RSI: 000055d82f508d30 RDI: 00000000ffffff9c [5434567.980639] RBP: 0000000000000000 R08: 0000000000000000 R09: 000055d82f3eff80 [5434567.980641] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f8b49a38bd0 [5434567.980642] R13: 00007f8b49a38af8 R14: 0000000000000000 R15: 00007f8b3c035490 and all access to __blkdev_get() blocks on this volume too: [5437427.042327] vgs D 0 32353 1 0x80004004 [5437427.042334] Call Trace: [5437427.042341] ? __schedule+0x1e8/0x600 [5437427.042347] schedule+0x29/0x90 [5437427.042354] schedule_timeout+0x208/0x300 [5437427.042363] ? __kfifo_from_user_r+0xb0/0xb0 [5437427.042369] ? __percpu_ref_switch_mode+0xcd/0x170 [5437427.042375] ? __wake_up_common_lock+0x89/0xc0 [5437427.042381] wait_for_completion+0x111/0x180 [5437427.042387] ? wake_up_q+0x70/0x70 [5437427.042395] exit_aio+0xdc/0xe3 [5437427.042404] mmput+0x28/0x120 [5437427.042411] do_exit+0x28b/0xb90 [5437427.042420] ? __switch_to+0x3a6/0x460 [5437427.042425] ? _cond_resched+0x15/0x30 [5437427.042428] ? mutex_lock+0xe/0x30 [5437427.042430] ? aio_read_events+0x28a/0x320 [5437427.042434] do_group_exit+0x3a/0xa0 [5437427.042438] get_signal+0x14b/0x840 [5437427.042444] do_signal+0x36/0x660 [5437427.042449] ? __hrtimer_init+0xb0/0xb0 [5437427.042452] ? do_io_getevents+0x7c/0xc0 [5437427.042456] exit_to_usermode_loop+0x71/0xe0 [5437427.042459] do_syscall_64+0xf2/0x100 [5437427.042463] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [5437427.042465] RIP: 0033:0x7fc12f303f59 [5437427.042470] Code: Bad RIP value. [5437427.042472] RSP: 002b:00007fffffad42e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000d0 [5437427.042474] RAX: fffffffffffffffc RBX: 00007fc12efa2700 RCX: 00007fc12f303f59 [5437427.042475] RDX: 0000000000000040 RSI: 0000000000000001 RDI: 00007fc12f974000 [5437427.042477] RBP: 00007fc12f974000 R08: 0000000000000000 R09: 00000002fffffe30 [5437427.042478] R10: 00007fffffad4370 R11: 0000000000000246 R12: 0000000000000001 [5437427.042479] R13: 0000000000000002 R14: 0000000000000040 R15: 00007fffffad4370 Full dmesg is attached. -- Steve
Attachment:
dmesg.xz
Description: application/xz