On Mon, Aug 04, 2014 at 02:00:05PM +0300, Alex Lyakas wrote:
> Greetings,
>
> we had a log replay failure due to some errors that the underlying block
> device returned:
> [49133.801406] XFS (dm-95): metadata I/O error: block 0x270e8c180
> ("xlog_recover_iodone") error 28 numblks 16
> [49133.802495] XFS (dm-95): log mount/recovery failed: error 28
> [49133.802644] XFS (dm-95): log mount failed
>
> and then kernel panicked [1].
>
> Looking at the code, when xfs_mountfs() fails, xfs_fs_fill_super() goes and
> cleans up and eventually frees the "xfs_mount" structure. But then
> xfs_buf_iodone_work() can still be delivered through "xfslogd_workqueue",
> which is static and not per-XFS. But this callback has a pointer to
> "xfs_mount", and may try to access it as in [1]. Does this analysis sound
> correct? Kernel is 3.8.13, but looking at the XFS master branch, it might
> have the same issue.
>

Seems possible... log recovery calls xfs_buf_delwri_submit(), which does an
xfs_buf_iowait() on each buffer for synchronous I/O. xfs_buf_iowait()
doesn't wait on b_iowait if b_error is set, however. On the callback side,
xfs_buf_ioerror() is called before the _xfs_buf_ioend() sequence that calls
into the workqueue. Perhaps the error can be detected by the iowait before
the xfslogd_workqueue job gets a chance to run..?

> Should we flush this static workqueue before unmounting?
>

It's not totally clear that fixes the problem, since we set b_error on the
buffer before we even schedule the work on the workqueue. Perhaps it
depends on the nature of the race. Another option could be to not use the
wq on error, but that may or may not be ideal for other contexts besides
log recovery.

It looks like the bio code always returns the error via the callback, and
the xlog_recover_iodone() handler does call back into ioend, presumably to
do the I/O completion. It might not be appropriate to remove the b_error
bypass in xfs_buf_iowait(), as we have xfs_buf_ioerror() callers on the
buffer that can occur before I/O submission, but I wonder if it could be
made conditional for this particular case. That would be an interesting
experiment, at least to see if it fixes this problem.
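
For reference, the bypass in question is roughly the following in 3.8-era
fs/xfs/xfs_buf.c (paraphrased from memory, so treat it as a sketch and
double check against your tree):

	int
	xfs_buf_iowait(
		xfs_buf_t		*bp)
	{
		trace_xfs_buf_iowait(bp, _RET_IP_);

		/*
		 * The bypass: if b_error is already set we return without
		 * waiting on b_iowait, so the xfslogd_workqueue completion
		 * can still be pending when the caller moves on.
		 */
		if (!bp->b_error)
			wait_for_completion(&bp->b_iowait);

		trace_xfs_buf_iowait_done(bp, _RET_IP_);
		return bp->b_error;
	}

The experiment would be to skip that bypass (or make it conditional) for
buffers that have actually been submitted, e.g. the log recovery delwri
case, so the wait doesn't return until the ioend work has run. The flush
idea would amount to something like a flush_workqueue(xfslogd_workqueue)
in the mount failure path before the xfs_mount goes away, but as above I'm
not sure that alone closes the race.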
Brian

> Thanks,
> Alex.
>
>
> [1]
> [49133.804546] general protection fault: 0000 [#1] SMP
> [49133.808033] xcbc rmd160 crypto_null af_key xfrm_algo scsi_dh cirrus
> psmouse ttm drm_kms_helper serio_raw drm i2c_piix4 sysimgblt virtio_balloon
> sysfillrect syscopyarea nfsd(OF) kvm nfs_acl auth_rpcgss nfs fscache
> microcode lockd mac_hid sunrpc lp parport floppy ixgbevf(OF)
> [49133.808033] CPU 2
> [49133.808033] Pid: 2907, comm: kworker/2:1H Tainted: GF W O
> 3.8.13-030813-generic #201305111843 Bochs Bochs
> [49133.808033] RIP: 0010:[<ffffffff813582fb>]  [<ffffffff813582fb>] strnlen+0xb/0x30
> [49133.808033] RSP: 0018:ffff8801e31c5b08  EFLAGS: 00010086
> [49133.808033] RAX: 0000000000000000 RBX: ffffffff81e4e527 RCX: 0000000000000000
> [49133.808033] RDX: 640000450008cf9d RSI: ffffffffffffffff RDI: 640000450008cf9d
> [49133.808033] RBP: ffff8801e31c5b08 R08: 000000000000ffff R09: 000000000000ffff
> [49133.808033] R10: 0000000000000000 R11: 0000000000000ffe R12: 640000450008cf9d
> [49133.808033] R13: ffffffff81e4e900 R14: 0000000000000000 R15: 000000000000ffff
> [49133.808033] FS:  0000000000000000(0000) GS:ffff88021fd00000(0000) knlGS:0000000000000000
> [49133.808033] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [49133.808033] CR2: 00007fa4a91abd80 CR3: 000000020e783000 CR4: 00000000000006e0
> [49133.808033] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [49133.808033] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [49133.808033] Process kworker/2:1H (pid: 2907, threadinfo ffff8801e31c4000, task ffff8802124f45c0)
> [49133.808033] Stack:
> [49133.808033]  ffff8801e31c5b48 ffffffff81359d8e ffff8801e31c5b28 ffffffff81e4e527
> [49133.808033]  ffffffffa0636a1e ffff8801e31c5c80 ffffffffa0636a1e ffffffff81e4e900
> [49133.808033]  ffff8801e31c5bc8 ffffffff8135af89 ffff8801e31c5bc8 ffffffff8105a4e7
> [49133.808033] Call Trace:
> [49133.808033]  [<ffffffff81359d8e>] string.isra.4+0x3e/0xd0
> [49133.808033]  [<ffffffff8135af89>] vsnprintf+0x219/0x640
> [49133.808033]  [<ffffffff8105a4e7>] ? msg_print_text+0xb7/0x1b0
> [49133.808033]  [<ffffffff8135b471>] vscnprintf+0x11/0x30
> [49133.808033]  [<ffffffff8105b3b1>] vprintk_emit+0xc1/0x490
> [49133.808033]  [<ffffffff8105b460>] ? vprintk_emit+0x170/0x490
> [49133.808033]  [<ffffffff816d5848>] printk+0x61/0x63
> [49133.808033]  [<ffffffffa05ba261>] __xfs_printk+0x31/0x50 [xfs]
> [49133.808033]  [<ffffffffa05ba4b3>] xfs_notice+0x53/0x60 [xfs]
> [49133.808033]  [<ffffffffa05b14a5>] xfs_do_force_shutdown+0xf5/0x180 [xfs]
> [49133.808033]  [<ffffffffa05f6c38>] ? xlog_recover_iodone+0x48/0x70 [xfs]
> [49133.808033]  [<ffffffffa05f6c38>] xlog_recover_iodone+0x48/0x70 [xfs]
> [49133.808033]  [<ffffffffa05a749d>] xfs_buf_iodone_work+0x4d/0xa0 [xfs]
> [49133.808033]  [<ffffffff81078b81>] process_one_work+0x141/0x490
> [49133.808033]  [<ffffffff81079b48>] worker_thread+0x168/0x400
> [49133.808033]  [<ffffffff810799e0>] ? manage_workers+0x120/0x120
> [49133.808033]  [<ffffffff8107f050>] kthread+0xc0/0xd0
> [49133.808033]  [<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0
> [49133.808033]  [<ffffffff816f61ec>] ret_from_fork+0x7c/0xb0
> [49133.808033]  [<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0