Hi,

I have an issue with the 'bcache' Linux subsystem (block I/O cache). I hit a kernel panic when using it, and I've reported that upstream on the "linux-bcache" mailing list:

https://www.spinics.net/lists/linux-bcache/msg09069.html

I'd like to contribute and learn more about how to debug this myself. Here is the output from 'crash' on a dump file from this panic:

    SYSTEM MAP: /home/marc.smith/Downloads/System.map-esos.prod
  DEBUG KERNEL: /home/marc.smith/Downloads/vmlinux-esos.prod (5.4.69-esos.prod)
      DUMPFILE: /home/marc.smith/Downloads/dumpfile-1604062993
          CPUS: 8
          DATE: Fri Oct 30 09:02:56 2020
        UPTIME: 2 days, 12:38:15
  LOAD AVERAGE: 9.48, 8.89, 7.69
         TASKS: 980
      NODENAME: node-10cccd-2
       RELEASE: 5.4.69-esos.prod
       VERSION: #1 SMP Thu Oct 22 19:45:11 UTC 2020
       MACHINE: x86_64  (2799 Mhz)
        MEMORY: 24 GB
         PANIC: "Oops: 0002 [#1] SMP NOPTI" (check log for details)
           PID: 18272
       COMMAND: "kworker/2:13"
          TASK: ffff88841d9e8000  [THREAD_INFO: ffff88841d9e8000]
           CPU: 2
         STATE: TASK_UNINTERRUPTIBLE (PANIC)

crash> bt
PID: 18272  TASK: ffff88841d9e8000  CPU: 2  COMMAND: "kworker/2:13"
 #0 [ffffc90000100938] machine_kexec at ffffffff8103d6b5
 #1 [ffffc90000100980] __crash_kexec at ffffffff8110d37b
 #2 [ffffc90000100a48] crash_kexec at ffffffff8110e07d
 #3 [ffffc90000100a58] oops_end at ffffffff8101a9de
 #4 [ffffc90000100a78] no_context at ffffffff81045e99
 #5 [ffffc90000100ae0] async_page_fault at ffffffff81e010cf
    [exception RIP: atomic_try_cmpxchg+2]
    RIP: ffffffff810d3e3b  RSP: ffffc90000100b98  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: 0000000000000003  RCX: 0000000000080006
    RDX: 0000000000000001  RSI: ffffc90000100ba4  RDI: 0000000000000a6c
    RBP: 0000000000000010   R8: 0000000000000001   R9: ffffffffa0418d4e
    R10: ffff88841c8b3000  R11: ffff88841c8b3000  R12: 0000000000000046
    R13: 0000000000000000  R14: ffff8885a3a0a000  R15: 0000000000000a6c
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffffc90000100b98] _raw_spin_lock_irqsave at ffffffff81cf7d7d
 #7 [ffffc90000100bb8] try_to_wake_up at ffffffff810c1624
 #8 [ffffc90000100c08] closure_sync_fn at ffffffffa040fb07 [bcache]
 #9 [ffffc90000100c10] clone_endio at ffffffff81aac48c
#10 [ffffc90000100c40] call_bio_endio at ffffffff81a78e20
#11 [ffffc90000100c58] raid_end_bio_io at ffffffff81a78e69
#12 [ffffc90000100c88] raid1_end_write_request at ffffffff81a79ad9
#13 [ffffc90000100cf8] blk_update_request at ffffffff814c3ab1
#14 [ffffc90000100d38] blk_mq_end_request at ffffffff814caaf2
#15 [ffffc90000100d50] blk_mq_complete_request at ffffffff814c91c1
#16 [ffffc90000100d78] nvme_complete_cqes at ffffffffa002fb03 [nvme]
#17 [ffffc90000100db8] nvme_irq at ffffffffa002fb7f [nvme]
#18 [ffffc90000100de0] __handle_irq_event_percpu at ffffffff810e0d60
#19 [ffffc90000100e20] handle_irq_event_percpu at ffffffff810e0e65
#20 [ffffc90000100e48] handle_irq_event at ffffffff810e0ecb
#21 [ffffc90000100e60] handle_edge_irq at ffffffff810e494d
#22 [ffffc90000100e78] do_IRQ at ffffffff81e01900
#23 [ffffc90000100eb0] common_interrupt at ffffffff81e00a0a
#24 [ffffc90000100f38] __softirqentry_text_start at ffffffff8200006a
#25 [ffffc90000100fc8] irq_exit at ffffffff810a3f6a
#26 [ffffc90000100fd0] smp_apic_timer_interrupt at ffffffff81e020b2
bt: invalid kernel virtual address: ffffc90000101000  type: "pt_regs"
crash>

Looking at the call trace, closure_sync_fn() is the last 'bcache' function in the trace (linux-5.4.69/drivers/md/bcache/closure.c):

static void closure_sync_fn(struct closure *cl)
{
        struct closure_syncer *s = cl->s;
        struct task_struct *p;

        rcu_read_lock();
        p = READ_ONCE(s->task);
        s->done = 1;
        wake_up_process(p);
        rcu_read_unlock();
}
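For reference, here is the waiter side that pairs with it, __closure_sync(), quoted from what I believe is the same file in 5.4.69 (apologies if I've mis-copied anything):

void __closure_sync(struct closure *cl)
{
        struct closure_syncer s = { .task = current };

        cl->s = &s;
        continue_at(cl, closure_sync_fn, NULL);

        while (1) {
                set_current_state(TASK_UNINTERRUPTIBLE);
                if (s.done)
                        break;
                schedule();
        }

        __set_current_state(TASK_RUNNING);
}

What strikes me is that the struct closure_syncer lives on the waiter's stack: once closure_sync_fn() stores s->done = 1, the waiter can observe it, return, and reuse that stack memory, while the RCU read lock only protects the task_struct that was already loaded into 'p'.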
And I believe the frames above closure_sync_fn() in my backtrace come from its call to wake_up_process(). Is the panic perhaps because the task/process being woken is already gone/finished? I'm not sure where to start looking next. Any help would be greatly appreciated.

--Marc
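P.S. One register value that might support this theory: in the exception frame, RDI (the first argument to _raw_spin_lock_irqsave on x86_64) is 0000000000000a6c, which looks like a small structure offset rather than a kernel virtual address. Since try_to_wake_up() starts by taking p->pi_lock, my guess is that it was handed something like &p->pi_lock with 'p' being NULL (or near-NULL garbage), i.e. READ_ONCE(s->task) no longer returned a valid task pointer. I plan to check the pi_lock offset with 'struct task_struct -o' in crash to confirm that reading.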