Re: Crash in jbd2_chksum due to null journal->j_chksum_driver

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Wed, 30 Sep 2015 10:12:39 -0700

On Wed, Sep 30, 2015 at 04:35:49PM +0300, Nikolay Borisov wrote:
> Hello, 
> 
> Today a colleague was testing something and while doing so he observed 
> the following crash: 
> 
> jbd2_journal_bmap: journal block not found at offset 67 on dm-26-8
> Aborting journal on device dm-26-8.
> BUG: unable to handle kernel NULL pointer dereference at           (null)
> IP: [<ffffffff812b12eb>] jbd2_superblock_csum+0x2b/0x80
> PGD 3fcef54067 PUD 3fce84e067 PMD 0 
> Oops: 0000 [#1] SMP 
> Modules linked in: act_police cls_basic sch_ingress veth dm_snapshot openvswitch gre vxlan ip_tunnel xt_owner xt_conntrack iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio dm_mirror dm_region_hash dm_log ses enclosure igb i2c_algo_bit x86_pkg_temp_thermal crc32_pclmul i2c_i801 lpc_ich mfd_core ioapic ioatdma dca shpchp ipmi_devintf ipmi_si ipmi_msghandler
> CPU: 0 PID: 12059 Comm: jbd2/dm-26-8 Not tainted 3.12.47-clouder1 #1
> Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
> task: ffff883f904958b0 ti: ffff883fce4d8000 task.ti: ffff883fce4d8000
> RIP: 0010:[<ffffffff812b12eb>]  [<ffffffff812b12eb>] jbd2_superblock_csum+0x2b/0x80
> RSP: 0018:ffff883fce4d9a58  EFLAGS: 00010282
> RAX: 0000000000000000 RBX: ffff883f8dd77000 RCX: 0000000000000006
> RDX: 0000000000000000 RSI: ffff883f8dd77000 RDI: ffff883fa0fc6800
> RBP: ffff883fce4d9a88 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000000 R12: 00000000f0459c0b
> R13: 0000000000000411 R14: ffff883f8dd77000 R15: 00000000560bb55d
> FS:  0000000000000000(0000) GS:ffff881fffa00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 0000003fd145d000 CR4: 00000000001407f0
> Stack:
>  ffffffff81e07402 ffff883fa0fc6800 00000000fffffffb ffff883fce4d9b90
>  ffff883f8dd77000 ffff883fa0fc6800 ffff883fce4d9aa8 ffffffff812b1369
>  0000000000000010 ffff883f90c772d8 ffff883fce4d9ae8 ffffffff812b1455
> Call Trace:
>  [<ffffffff812b1369>] jbd2_superblock_csum_set+0x29/0x40
>  [<ffffffff812b1455>] jbd2_write_superblock+0x85/0x1b0
>  [<ffffffff812b1b70>] jbd2_journal_update_sb_errno+0x50/0x60
>  [<ffffffff812b1bd0>] __journal_abort_soft+0x50/0x60
>  [<ffffffff812b1c80>] jbd2_journal_bmap+0x90/0xa0
>  [<ffffffff812b1ec7>] jbd2_journal_next_log_block+0x77/0x80
>  [<ffffffff812b1ef3>] jbd2_journal_get_descriptor_buffer+0x23/0xb0
>  [<ffffffff812aa02c>] journal_submit_commit_record+0x7c/0x1e0
>  [<ffffffff812abade>] jbd2_journal_commit_transaction+0x194e/0x1d20
>  [<ffffffff812b062f>] kjournald2+0xef/0x2b0
>  [<ffffffff810aef00>] ? wake_up_bit+0x40/0x40
>  [<ffffffff812b0540>] ? commit_timeout+0x10/0x10
>  [<ffffffff810ae48e>] kthread+0xce/0xe0
>  [<ffffffff810ae3c0>] ? kthread_freezable_should_stop+0x80/0x80
>  [<ffffffff816571c8>] ret_from_fork+0x58/0x90
>  [<ffffffff810ae3c0>] ? kthread_freezable_should_stop+0x80/0x80
> Code: 55 48 89 e5 41 54 53 48 83 ec 20 0f 1f 44 00 00 44 8b a6 fc 00 00 00 48 89 f3 c7 86 fc 00 00 00 00 00 00 00 48 8b 87 d0 04 00 00 <83> 38 04 77 39 48 89 45 d0 c7 45 d8 00 00 00 00 48 8d 7d d0 c7 
> RIP  [<ffffffff812b12eb>] jbd2_superblock_csum+0x2b/0x80
>  RSP <ffff883fce4d9a58>
> CR2: 0000000000000000
> ---[ end trace e1bd94031f410b71 ]---
> 
> The address ffffffff812b12eb is actually in jbd2_chksum, and the
> instruction where the dereference happens is in
> crypto_shash_descsize(); essentially, journal->j_chksum_driver is NULL.
> 
> Now, how we got ourselves into this situation: we have an LVM thin
> volume with an ext4 fs and a container started from it. Then,
> while the container is running, we invoke the following
> command to scrub its contents:
> 
> openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt </dev/zero | dd bs=64K of=/dev/volumegroupname/volumename
> 
> 
> Then, when we try to umount the volume, we get the aforementioned
> crash. Because we overwrite the on-disk contents, jbd2_journal_bmap
> fails, which triggers the journal abort, which wants to update the on-disk
> errno. That in turn triggers a superblock checksum regeneration,
> and this goes BOOM.
> 
> I looked around the code but couldn't figure out a code path
> which allows the checksum driver to become null at runtime.

Most likely the journal wasn't started with the checksum driver
turned on, and then your randomizing of the journal sb *while it was running*
flipped the feature bit on, causing jbd2 to think checksumming was turned on.

I guess the "proper" fix is to set j_chksum_driver at journal load time if
the superblock flags are set properly and then gate all other accesses on
the status of j_chksum_driver just in case someone obliterates the journal sb.
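
To make that concrete, here is a rough userspace model of the idea, not
actual jbd2 code and not a patch. Every model_* name, the feature-bit value,
and the toy checksum are made up for illustration; only the shape matters:
pick the driver once at load time from the on-disk flags, then gate on the
in-memory pointer so a later scribble over the sb can't send us through a
NULL driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the on-disk "checksums enabled" feature bit. */
#define MODEL_FEATURE_CSUM 0x1u

struct model_csum_driver {
	uint32_t (*update)(uint32_t seed, const void *buf, size_t len);
};

/* Models the on-disk superblock: it can be overwritten behind our back. */
struct model_sb {
	uint32_t features;
	uint32_t checksum;
};

/* Models the in-memory journal; chksum_driver is NULL when checksums are off. */
struct model_journal {
	struct model_sb *sb;
	const struct model_csum_driver *chksum_driver;
};

/* Toy checksum, only here so the model has something to call. */
static uint32_t model_sum(uint32_t seed, const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t acc = seed;

	for (size_t i = 0; i < len; i++)
		acc = acc * 31u + p[i];
	return acc;
}

static const struct model_csum_driver model_driver = { .update = model_sum };

/* Load time: consult the on-disk flags exactly once. */
static void model_journal_load(struct model_journal *j, struct model_sb *sb)
{
	j->sb = sb;
	j->chksum_driver =
		(sb->features & MODEL_FEATURE_CSUM) ? &model_driver : NULL;
}

/* The gate: trust the in-memory driver pointer, not the on-disk bit. */
static int model_journal_has_csum(const struct model_journal *j)
{
	return j->chksum_driver != NULL;
}

/* Recompute the sb checksum only if a driver was set up at load time. */
static void model_superblock_csum_set(struct model_journal *j)
{
	if (!model_journal_has_csum(j))
		return;
	j->sb->checksum = 0;
	j->sb->checksum = j->chksum_driver->update(~0u, j->sb, sizeof(*j->sb));
}

/* Replays the scenario from this thread against the model. */
static void model_demo(void)
{
	struct model_sb sb = { .features = 0, .checksum = 0 };
	struct model_journal j;

	model_journal_load(&j, &sb);      /* mounted without checksums */
	sb.features = 0xffffffffu;        /* sb randomized while running */
	sb.checksum = 0x1234u;
	model_superblock_csum_set(&j);    /* abort path: no NULL dereference */
	assert(sb.checksum == 0x1234u);   /* checksum left untouched */
}
```

If the gate instead re-read sb->features on every call, model_demo() would
walk straight into the NULL driver, which is the failure mode I suspect you
hit.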

OTOH, why can't you unmount the FS and /then/ randomize the disk?

--D

> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html