Re: Crash in jbd2_chksum due to null journal->j_chksum_driver

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Wed, 30 Sep 2015 11:43:18 -0700

On Wed, Sep 30, 2015 at 09:13:38PM +0300, Nikolay Borisov wrote:
> Hello,
> 
> 
> Well, I guess I can do that; the thing is, the current scenario was like
> that. Anyway, I thought something like what you describe could be
> happening. I saw your patch and I'm going to test it tomorrow. But I think
> the patch needs to be tagged for stable, since there is going to be an
> effort to make filesystems mountable in non-init user namespaces, and an
> arbitrary user could potentially cause instability on the system?

<shrug> If non-root users can write arbitrarily to block devices, I'm
sure a /lot/ more bad things can happen.  But you're right, we could
at least avoid crashing.
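
To make the "gate on j_chksum_driver" idea from my earlier mail (quoted
below) a bit more concrete, here is a rough userspace sketch. It is not the
actual patch and does not use the real jbd2 structures; every model_* name
and the toy XOR "checksum" are made up for illustration. The point is only
that the on-disk feature flag is consulted once at load time, and every
later decision looks at the in-memory driver pointer instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real jbd2 definitions. */
#define MODEL_FEATURE_CSUM 0x1u

struct model_sb {
    uint32_t feature_flags;   /* lives "on disk"; may be scribbled on */
    uint32_t checksum;
};

struct model_driver {
    uint32_t seed;
};

struct model_journal {
    struct model_sb *sb;              /* on-disk superblock buffer */
    struct model_driver *csum_driver; /* in-memory, decided once at load */
};

/* Load time: trust the on-disk feature flag exactly once. */
static void model_journal_load(struct model_journal *j, struct model_sb *sb,
                               struct model_driver *drv)
{
    assert(j != NULL && sb != NULL);
    j->sb = sb;
    j->csum_driver = (sb->feature_flags & MODEL_FEATURE_CSUM) ? drv : NULL;
}

/* Runtime: gate on the in-memory driver, never on the on-disk flag. */
static bool model_has_csum(const struct model_journal *j)
{
    return j->csum_driver != NULL;
}

/* Toy checksum; only called after model_has_csum() returned true. */
static uint32_t model_chksum(const struct model_journal *j, uint32_t data)
{
    return j->csum_driver->seed ^ data;
}

/* Returns true if a checksum was written, false if it was skipped. */
static bool model_sb_csum_set(struct model_journal *j)
{
    if (!model_has_csum(j))
        return false;
    j->sb->checksum = model_chksum(j, j->sb->feature_flags);
    return true;
}
```

With this shape, flipping the flag in the superblock buffer after load
changes nothing: model_sb_csum_set() still sees a NULL driver and skips the
checksum instead of dereferencing it.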

--D

> 
> Regards,
> Nikolay
> 
> On Wed, Sep 30, 2015 at 8:12 PM, Darrick J. Wong
> <darrick.wong@xxxxxxxxxx> wrote:
> > On Wed, Sep 30, 2015 at 04:35:49PM +0300, Nikolay Borisov wrote:
> >> Hello,
> >>
> >> Today a colleague was testing something and while doing so he observed
> >> the following crash:
> >>
> >> jbd2_journal_bmap: journal block not found at offset 67 on dm-26-8
> >> Aborting journal on device dm-26-8.
> >> BUG: unable to handle kernel NULL pointer dereference at           (null)
> >> IP: [<ffffffff812b12eb>] jbd2_superblock_csum+0x2b/0x80
> >> PGD 3fcef54067 PUD 3fce84e067 PMD 0
> >> Oops: 0000 [#1] SMP
> >> Modules linked in: act_police cls_basic sch_ingress veth dm_snapshot openvswitch gre vxlan ip_tunnel xt_owner xt_conntrack iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio dm_mirror dm_region_hash dm_log ses enclosure igb i2c_algo_bit x86_pkg_temp_thermal crc32_pclmul i2c_i801 lpc_ich mfd_core ioapic ioatdma dca shpchp ipmi_devintf ipmi_si ipmi_msghandler
> >> CPU: 0 PID: 12059 Comm: jbd2/dm-26-8 Not tainted 3.12.47-clouder1 #1
> >> Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
> >> task: ffff883f904958b0 ti: ffff883fce4d8000 task.ti: ffff883fce4d8000
> >> RIP: 0010:[<ffffffff812b12eb>]  [<ffffffff812b12eb>] jbd2_superblock_csum+0x2b/0x80
> >> RSP: 0018:ffff883fce4d9a58  EFLAGS: 00010282
> >> RAX: 0000000000000000 RBX: ffff883f8dd77000 RCX: 0000000000000006
> >> RDX: 0000000000000000 RSI: ffff883f8dd77000 RDI: ffff883fa0fc6800
> >> RBP: ffff883fce4d9a88 R08: 0000000000000000 R09: 0000000000000000
> >> R10: 0000000000000001 R11: 0000000000000000 R12: 00000000f0459c0b
> >> R13: 0000000000000411 R14: ffff883f8dd77000 R15: 00000000560bb55d
> >> FS:  0000000000000000(0000) GS:ffff881fffa00000(0000) knlGS:0000000000000000
> >> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: 0000000000000000 CR3: 0000003fd145d000 CR4: 00000000001407f0
> >> Stack:
> >>  ffffffff81e07402 ffff883fa0fc6800 00000000fffffffb ffff883fce4d9b90
> >>  ffff883f8dd77000 ffff883fa0fc6800 ffff883fce4d9aa8 ffffffff812b1369
> >>  0000000000000010 ffff883f90c772d8 ffff883fce4d9ae8 ffffffff812b1455
> >> Call Trace:
> >>  [<ffffffff812b1369>] jbd2_superblock_csum_set+0x29/0x40
> >>  [<ffffffff812b1455>] jbd2_write_superblock+0x85/0x1b0
> >>  [<ffffffff812b1b70>] jbd2_journal_update_sb_errno+0x50/0x60
> >>  [<ffffffff812b1bd0>] __journal_abort_soft+0x50/0x60
> >>  [<ffffffff812b1c80>] jbd2_journal_bmap+0x90/0xa0
> >>  [<ffffffff812b1ec7>] jbd2_journal_next_log_block+0x77/0x80
> >>  [<ffffffff812b1ef3>] jbd2_journal_get_descriptor_buffer+0x23/0xb0
> >>  [<ffffffff812aa02c>] journal_submit_commit_record+0x7c/0x1e0
> >>  [<ffffffff812abade>] jbd2_journal_commit_transaction+0x194e/0x1d20
> >>  [<ffffffff812b062f>] kjournald2+0xef/0x2b0
> >>  [<ffffffff810aef00>] ? wake_up_bit+0x40/0x40
> >>  [<ffffffff812b0540>] ? commit_timeout+0x10/0x10
> >>  [<ffffffff810ae48e>] kthread+0xce/0xe0
> >>  [<ffffffff810ae3c0>] ? kthread_freezable_should_stop+0x80/0x80
> >>  [<ffffffff816571c8>] ret_from_fork+0x58/0x90
> >>  [<ffffffff810ae3c0>] ? kthread_freezable_should_stop+0x80/0x80
> >> Code: 55 48 89 e5 41 54 53 48 83 ec 20 0f 1f 44 00 00 44 8b a6 fc 00 00 00 48 89 f3 c7 86 fc 00 00 00 00 00 00 00 48 8b 87 d0 04 00 00 <83> 38 04 77 39 48 89 45 d0 c7 45 d8 00 00 00 00 48 8d 7d d0 c7
> >> RIP  [<ffffffff812b12eb>] jbd2_superblock_csum+0x2b/0x80
> >>  RSP <ffff883fce4d9a58>
> >> CR2: 0000000000000000
> >> ---[ end trace e1bd94031f410b71 ]---
> >>
> >> The ffffffff812b12eb address is actually in jbd2_chksum, and the
> >> instruction where the dereference happens is in
> >> crypto_shash_descsize(); essentially, journal->j_chksum_driver is NULL.
> >>
> >> Now, how we got ourselves in this situation - we have an lvm thin
> >> volume with ext4 fs and a container started from it,
> >> then, while the container is running we invoke the following
> >> command to scrub its contents:
> >>
> >> openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt </dev/zero | dd bs=64K of=/dev/volumegroupname/volumename
> >>
> >>
> >> And then when we try to umount the volume we get the aforementioned
> >> crash. Naturally, because we overwrote the on-disk contents,
> >> jbd2_journal_bmap fails, which triggers the journal abort, which wants
> >> to update the on-disk errno, which in turn triggers a superblock
> >> checksum regeneration, and this goes BOOM.
> >>
> >> I looked around the code but couldn't figure out a code path
> >> which allows the checksum driver to become null at runtime.
> >
> > Most likely the journal wasn't started with the checksum driver
> > turned on, and then your randomizing of the journal sb *while it was running*
> > flipped the feature bit on, causing jbd2 to think checksumming was turned on.
> >
> > I guess the "proper" fix is to set j_chksum_driver at journal load time if
> > the superblock flags are set properly and then gate all other accesses on
> > the status of j_chksum_driver just in case someone obliterates the journal sb.
> >
> > OTOH, why can't you unmount the FS and /then/ randomize the disk?
> >
> > --D
> >
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html