Hi everyone! Recently I got next trace on one of machines (backtrace can be not in correct order since it captured via netconsole): [7932653.070316] ------------[ cut here ]------------ [7932653.070359] kernel BUG at fs/jbd2/journal.c:759! [7932653.070392] invalid opcode: 0000 [#1] SMP [7932653.070424] Modules linked in: ... [7932653.071108] CPU: 0 PID: 356 Comm: jbd2/md0-8 Not tainted 4.8.1 #1 [7932653.071164] Hardware name: Supermicro Super Server/X10DRW-i, BIOS 1.1 08/13/2015 [7932653.071222] task: ffff881ff1babfc0 task.stack: ffff881fef890000 [<ffffffffa0454728>] jbd2_journal_next_log_block+0x68/0x70 [jbd2] [7932653.071258] RIP: 0010:[<ffffffffa0454728>] [7932653.071338] RSP: 0018:ffff881fef893cc8 EFLAGS: 00010246 [7932653.071374] RAX: 0000000000000001 RBX: ffff883fed5a8800 RCX: 00000000000007c4 [7932653.071432] RDX: 00000000000000ff RSI: ffff881fef893d68 RDI: ffff883fed5a8824 [7932653.071489] RBP: ffff881fef893d68 R08: 0000000000000000 R09: 0000000000000000 [7932653.071546] R10: 0000000000000001 R11: 0000000002408840 R12: ffff8822a92aad00 [7932653.071603] R13: ffff882dfc7b083c R14: ffff883fed5a8800 R15: ffff883fecb5ad00 [7932653.071659] FS: 0000000000000000(0000) GS:ffff881fffc00000(0000) knlGS:0000000000000000 [7932653.071718] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [7932653.071753] CR2: 00007f2d363fc730 CR3: 0000000001806000 CR4: 00000000001406f0 [7932653.071809] Stack: 0000000800000018 ffffffffa044d7e5 ffff881fef893e60 [7932653.071843] ffff881274fca7f8 ffff883fecb5ad9c ffff881fe4e26f00 001c2ecad4ef6dba [7932653.071913] ffff881ff1babfc0 ffff882dfc7b083c ffff883ff10c3000 000007c400000000 [7932653.071982] ffff881ff1babfc0 [7932653.072051] Call Trace: [7932653.072082] [<ffffffffa044d7e5>] ? jbd2_journal_commit_transaction+0xa65/0x18a0 [jbd2] [7932653.072143] [<ffffffffa04530f2>] ? kjournald2+0xb2/0x240 [jbd2] [7932653.072182] [<ffffffff81088e90>] ? wake_atomic_t_function+0x50/0x50 [7932653.072220] [<ffffffffa0453040>] ? commit_timeout+0x10/0x10 [jbd2] [7932653.072258] [<ffffffff8106e17d>] ? kthread+0xbd/0xe0 [7932653.072294] [<ffffffff814638cf>] ? ret_from_fork+0x1f/0x40 [7932653.072330] [<ffffffff8106e0c0>] ? kthread_worker_fn+0x160/0x160 ... [<ffffffffa0454728>] jbd2_journal_next_log_block+0x68/0x70 [jbd2] [7932653.072732] RIP [7932653.072793] RSP <ffff881fef893cc8> [7932653.073307] ---[ end trace 3f9eca026a57b5c2 ]--- And a lot of WARN_ON(blk_needs_flush_plug(tsk)) after: [7932653.115857] WARNING: CPU: 0 PID: 356 at kernel/exit.c:737 do_exit+0x50/0xad0 ... [7932653.123101] Call Trace: [7932653.123188] [<ffffffff8122fd58>] ? dump_stack+0x46/0x5e [7932653.123280] [<ffffffff8105312e>] ? __warn+0xbe/0xe0 [7932653.123371] [<ffffffff810557b0>] ? do_exit+0x50/0xad0 [7932653.123462] [<ffffffff81464a37>] ? rewind_stack_do_exit+0x17/0x20 [7932653.123556] [<ffffffff8106e0c0>] ? kthread_worker_fn+0x160/0x160 [7932653.123648] ---[ end trace 3f9eca026a57b5c4 ]--- [7932653.123737] Fixing recursive fault but reboot is needed! [7932653.123830] BUG: unable to handle kernel I looked through the code on and on, mailing-lists ([1] and [2]), fixed bugs (seems that [3] cannot make this), but couldn't find anything that can be relevant. And there is a comment in jbd2_journal_commit_transaction(): /* * start_this_handle() uses t_outstanding_credits to determine * the free space in the log, but this counter is changed * by jbd2_journal_next_log_block() also. */ atomic_dec(&commit_transaction->t_outstanding_credits); And it is there over decades (since "fs/jbd"), but nothing in jbd2_journal_next_log_block() touches commit_transaction, maybe it is about "j_free" (since jbd2_log_space_left() uses t_outstanding_credits and j_free). [1]: https://lkml.org/lkml/2005/1/19/144 [2]: https://lkml.org/lkml/2008/1/11/235 [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=559cce698eaf4ccecb2213b2519ea3a0413e5155 Right now I don't have any reasons to think that this is the hardware problem, but it is SAMSUNG MZ7KM960HAHP-0Z005 (SSD). Also this is not reproducible, I got it once (a few days ago, and it works well before for a year or so), and since than I tried to reproduce it without any luck (with kvm and without). It will be great if you can give me some hints in debugging this. Thanks, Azat.