Re: XFS umount with IO errors seems to lead to memory corruption

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 11 Dec 2013 11:40:39 +1100

[ Sorry, Alex, I missed your last email. Thanks for pinging me to
remind me to look at it. ]

On Tue, Dec 10, 2013 at 09:36:11AM +0200, Alex Lyakas wrote:
> Hi Dave,
> any insight on this issue? At least on the simpler reproduction with
> "error" DeviceMapper?

Yes, it does point to the problem.

> -----Original Message----- From: Alex Lyakas
> Sent: 24 November, 2013 12:27 PM
> To: Dave Chinner ; xfs@xxxxxxxxxxx
> Cc: linux-xfs@xxxxxxxxxxxxxxx
> Subject: Re: XFS umount with IO errors seems to lead to memory corruption
> 
> Hi Dave,
> thank you for your comments.
> 
> The test that I am doing is unmounting the XFS, while its underlying
> block device returns intermittent IO errors. The block device in
> question is a custom DeviceMapper target. It returns -ECANCELED in
> this case. Should I return some other errno instead?
> The same exact test works alright with ext4. Its unmount finishes,
> system seems to continue functioning normally and kmemleak is also
> happy.
> 
> When doing a simpler reproduction with "error" Device-Mapper, umount
> gets stuck and never returns, while kernel keeps printing:
> XFS (dm-0): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks") error 5 numblks 1

It's trying to write the superblock - it's an async, background
metadata write, and it's failing.

        /*
         * If the write was asynchronous then no one will be looking for the
         * error.  Clear the error state and write the buffer out again.
         *
         * XXX: This helps against transient write errors, but we need to find
         * a way to shut the filesystem down if the writes keep failing.
         *
         * In practice we'll shut the filesystem down soon as non-transient
         * errors tend to affect the whole device and a failing log write
         * will make us give up.  But we really ought to do better here.
         */
        if (XFS_BUF_ISASYNC(bp)) {
                ASSERT(bp->b_iodone != NULL);

                trace_xfs_buf_item_iodone_async(bp, _RET_IP_);

                xfs_buf_ioerror(bp, 0); /* errno of 0 unsets the flag */

                if (!XFS_BUF_ISSTALE(bp)) {
                        bp->b_flags |= XBF_WRITE | XBF_ASYNC | XBF_DONE;
                        xfs_buf_iorequest(bp);
                } else {
                        xfs_buf_relse(bp);
                }

                return;
        }

There's the problem code - it just keeps resubmitting the failed IO,
so the buffer is never unlocked and the write never completes.
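
To make the failure mode concrete, here is a minimal user-space sketch of
the alternative the XXX comment asks for: resubmit a failed async write a
bounded number of times, then give up and release the buffer. This is only
a model under stated assumptions, not the XFS fix. Every name in it
(model_buf, model_dev, model_submit, model_write_bounded, max_retries) is
hypothetical and does not exist in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are XFS APIs. */
struct model_buf {
        int  error;      /* last I/O error, 0 if none */
        int  retries;    /* resubmissions performed so far */
        bool locked;     /* buffer lock held while I/O is in flight */
        bool shut_down;  /* set when we stop retrying and give up */
};

/* Pretend device: fails the next fail_count writes with -EIO. */
struct model_dev {
        int fail_count;
};

/* Submit one write to the pretend device. */
int model_submit(struct model_dev *dev)
{
        if (dev->fail_count > 0) {
                dev->fail_count--;
                return -EIO;
        }
        return 0;
}

/*
 * Write the buffer, resubmitting on error at most max_retries times.
 * Returns 0 on success or the last error after giving up. Unlike the
 * unbounded loop, the buffer is always unlocked on return.
 */
int model_write_bounded(struct model_buf *bp, struct model_dev *dev,
                        int max_retries)
{
        assert(max_retries >= 0);

        bp->locked = true;
        bp->retries = 0;
        for (;;) {
                bp->error = model_submit(dev);
                if (bp->error == 0)
                        break;                  /* write succeeded */
                if (bp->retries >= max_retries) {
                        bp->shut_down = true;   /* persistent failure */
                        break;
                }
                bp->retries++;                  /* transient? try again */
        }
        bp->locked = false;
        return bp->error;
}
```

With a device that fails twice and a limit of three retries, the write
succeeds after two resubmissions. With a device that always fails, the
function returns -EIO after three resubmissions with shut_down set, and
the buffer ends up unlocked in both cases.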

> this never returns and /proc shows:
> root@vc-00-00-1075-dev:~# cat /proc/2684/stack
> [<ffffffffa033ac6a>] xfs_ail_push_all_sync+0x9a/0xd0 [xfs]
> [<ffffffffa0330123>] xfs_unmountfs+0x63/0x160 [xfs]
> [<ffffffffa02ee265>] xfs_fs_put_super+0x25/0x60 [xfs]
> [<ffffffff8118fd12>] generic_shutdown_super+0x62/0xf0
> [<ffffffff8118fdd0>] kill_block_super+0x30/0x80
> [<ffffffff811903dc>] deactivate_locked_super+0x3c/0x90
> [<ffffffff81190d7e>] deactivate_super+0x4e/0x70
> [<ffffffff811ad086>] mntput_no_expire+0x106/0x160
> [<ffffffff811ae760>] sys_umount+0xa0/0xe0
> [<ffffffff816ab919>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff

That's waiting for the superblock to be marked clean.

> And after some time, hung task warning shows:
> INFO: task kworker/2:1:39 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/2:1     D ffffffff8180cf00     0    39      2 0x00000000
> ffff88007c54db38 0000000000000046 000000027d003700 ffff88007fd03fc0
> ffff88007c54dfd8 ffff88007c54dfd8 ffff88007c54dfd8 0000000000013e40
> ffff88007c9e9710 ffff88007c4bdc40 00000000000000b8 7fffffffffffffff
> Call Trace:
> [<ffffffff816a1b99>] schedule+0x29/0x70
> [<ffffffff816a02d5>] schedule_timeout+0x1e5/0x250
> [<ffffffffa02f3987>] ? kmem_zone_alloc+0x67/0xe0 [xfs]
> [<ffffffff816798e6>] ? kmemleak_alloc+0x26/0x50
> [<ffffffff816a0f1b>] __down_common+0xa0/0xf0
> [<ffffffffa032f37c>] ? xfs_getsb+0x3c/0x70 [xfs]
> [<ffffffff816a0fde>] __down+0x1d/0x1f
> [<ffffffff81084591>] down+0x41/0x50
> [<ffffffffa02dcd44>] xfs_buf_lock+0x44/0x110 [xfs]
> [<ffffffffa032f37c>] xfs_getsb+0x3c/0x70 [xfs]
> [<ffffffffa033b4bc>] xfs_trans_getsb+0x4c/0x140 [xfs]
> [<ffffffffa032f06e>] xfs_mod_sb+0x4e/0xc0 [xfs]
> [<ffffffffa02e3b24>] xfs_fs_log_dummy+0x54/0x90 [xfs]
> [<ffffffffa0335bf8>] xfs_log_worker+0x48/0x50 [xfs]
> [<ffffffff81077a11>] process_one_work+0x141/0x4a0
> [<ffffffff810789e8>] worker_thread+0x168/0x410
> [<ffffffff81078880>] ? manage_workers+0x120/0x120
> [<ffffffff8107df10>] kthread+0xc0/0xd0
> [<ffffffff813a3ea4>] ? acpi_get_child+0x47/0x4d
> [<ffffffff813a3fb7>] ? acpi_platform_notify.part.0+0xbb/0xda
> [<ffffffff8107de50>] ? flush_kthread_worker+0xb0/0xb0
> [<ffffffff816ab86c>] ret_from_fork+0x7c/0xb0
> [<ffffffff8107de50>] ? flush_kthread_worker+0xb0/0xb0

And that's blocked on the superblock buffer because it hasn't been
unlocked due to the failing write not completing.
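
The dependency between the two stacks can be modelled in a few lines. The
sketch below treats the buffer lock as a binary semaphore (the trace shows
xfs_buf_lock going through down()) and assumes the lock is taken when the
write is submitted and released only on completion. It is a simplified
user-space model, and all of the model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a binary semaphore used as a buffer lock. */
struct model_sema {
        int count;      /* 1 = free, 0 = held */
};

/* Non-blocking acquire: returns true if the lock was taken. */
bool model_down_trylock(struct model_sema *s)
{
        if (s->count > 0) {
                s->count--;
                return true;
        }
        return false;
}

/* Release the lock. */
void model_up(struct model_sema *s)
{
        assert(s->count == 0);  /* only a held lock may be released */
        s->count++;
}

/*
 * The async write path takes the lock at submission and releases it
 * only when the write completes. If every attempt fails and is simply
 * resubmitted, the release never happens. Returns true if a second
 * locker (the log worker in the trace above) could then take the lock.
 */
bool model_second_locker_succeeds(struct model_sema *sb_lock,
                                  bool write_ever_completes)
{
        bool held = model_down_trylock(sb_lock); /* submission locks */

        assert(held);
        if (write_ever_completes)
                model_up(sb_lock);               /* completion unlocks */
        return model_down_trylock(sb_lock);      /* second locker tries */
}
```

In the real kernel the second locker blocks in down() rather than failing
a trylock, which is what produces the hung task warning.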

I'll have a think about how to fix it.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
