Re: XFS umount with IO errors seems to lead to memory corruption

"Alex Lyakas" <alex@xxxxxxxxxxxxxxxxx> · Tue, 10 Dec 2013 09:36:11 +0200

Hi Dave,
any insight on this issue? At least on the simpler reproduction with "error" 
DeviceMapper?

Thanks,
Alex.

-----Original Message----- 
From: Alex Lyakas
Sent: 24 November, 2013 12:27 PM
To: Dave Chinner ; xfs@xxxxxxxxxxx
Cc: linux-xfs@xxxxxxxxxxxxxxx
Subject: Re: XFS umount with IO errors seems to lead to memory corruption

Hi Dave,
thank you for your comments.

The test that I am doing is unmounting the XFS, while its underlying
block device returns intermittent IO errors. The block device in
question is a custom DeviceMapper target. It returns -ECANCELED in
this case. Should I return some other errno instead?
The same exact test works alright with ext4. It's unmount finishes,
system seems to continue functioning normally and kmemleak is also
happy.

When doing a simpler reproductoin with "error" Device-Mapper, umount
gets stuck and never returns, while kernel keeps printing:
XFS (dm-0): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks")
error 5 numblks 1
XFS (dm-0): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks")
error 5 numblks 1
XFS (dm-0): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks")
error 5 numblks 1
XFS (dm-0): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks")
error 5 numblks 1
XFS (dm-0): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks")
error 5 numblks 1
XFS (dm-0): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks")
error 5 numblks 1
XFS (dm-0): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks")
error 5 numblks 1
...

What is the expected behavior of umount in this case? (when there are
IO errors during it).

This reproduction that I do is:

root@vc-00-00-1075-dev:~# cat /etc/xfs.protofile
dummy                   : bootfilename, not used, backward compatibility
0 0                             : numbers of blocks and inodes, not
used, backward compatibility
d--777 0 0              : set 777 perms for the root dir
$
$

root@vc-00-00-1075-dev:~# mkfs.xfs -f /dev/vde -p /etc/xfs.protofile
meta-data=/dev/vde               isize=256    agcount=4, agsize=1310720 blks
        =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=5242880, imaxpct=25
        =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=2560, version=2
        =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

root@vc-00-00-1075-dev:~# blockdev --getsz /dev/vde
41943040

root@vc-00-00-1075-dev:~# dmsetup create VDE --table "0 41943040
linear /dev/vde 0"

root@vc-00-00-1075-dev:~# mount -o noatime,sync /dev/mapper/VDE
/export/v122_nas/

copy some files to the mountpoint, then:

dmsetup reload VDE --table "0 41943040 error"
dmsetup resume VDE

umount /export/v122_nas

this never returns and /proc shows:
root@vc-00-00-1075-dev:~# cat /proc/2684/stack
[<ffffffffa033ac6a>] xfs_ail_push_all_sync+0x9a/0xd0 [xfs]
[<ffffffffa0330123>] xfs_unmountfs+0x63/0x160 [xfs]
[<ffffffffa02ee265>] xfs_fs_put_super+0x25/0x60 [xfs]
[<ffffffff8118fd12>] generic_shutdown_super+0x62/0xf0
[<ffffffff8118fdd0>] kill_block_super+0x30/0x80
[<ffffffff811903dc>] deactivate_locked_super+0x3c/0x90
[<ffffffff81190d7e>] deactivate_super+0x4e/0x70
[<ffffffff811ad086>] mntput_no_expire+0x106/0x160
[<ffffffff811ae760>] sys_umount+0xa0/0xe0
[<ffffffff816ab919>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

And after some time, hung task warning shows:
INFO: task kworker/2:1:39 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/2:1     D ffffffff8180cf00     0    39      2 0x00000000
ffff88007c54db38 0000000000000046 000000027d003700 ffff88007fd03fc0
ffff88007c54dfd8 ffff88007c54dfd8 ffff88007c54dfd8 0000000000013e40
ffff88007c9e9710 ffff88007c4bdc40 00000000000000b8 7fffffffffffffff
Call Trace:
[<ffffffff816a1b99>] schedule+0x29/0x70
[<ffffffff816a02d5>] schedule_timeout+0x1e5/0x250
[<ffffffffa02f3987>] ? kmem_zone_alloc+0x67/0xe0 [xfs]
[<ffffffff816798e6>] ? kmemleak_alloc+0x26/0x50
[<ffffffff816a0f1b>] __down_common+0xa0/0xf0
[<ffffffffa032f37c>] ? xfs_getsb+0x3c/0x70 [xfs]
[<ffffffff816a0fde>] __down+0x1d/0x1f
[<ffffffff81084591>] down+0x41/0x50
[<ffffffffa02dcd44>] xfs_buf_lock+0x44/0x110 [xfs]
[<ffffffffa032f37c>] xfs_getsb+0x3c/0x70 [xfs]
[<ffffffffa033b4bc>] xfs_trans_getsb+0x4c/0x140 [xfs]
[<ffffffffa032f06e>] xfs_mod_sb+0x4e/0xc0 [xfs]
[<ffffffffa02e3b24>] xfs_fs_log_dummy+0x54/0x90 [xfs]
[<ffffffffa0335bf8>] xfs_log_worker+0x48/0x50 [xfs]
[<ffffffff81077a11>] process_one_work+0x141/0x4a0
[<ffffffff810789e8>] worker_thread+0x168/0x410
[<ffffffff81078880>] ? manage_workers+0x120/0x120
[<ffffffff8107df10>] kthread+0xc0/0xd0
[<ffffffff813a3ea4>] ? acpi_get_child+0x47/0x4d
[<ffffffff813a3fb7>] ? acpi_platform_notify.part.0+0xbb/0xda
[<ffffffff8107de50>] ? flush_kthread_worker+0xb0/0xb0
[<ffffffff816ab86c>] ret_from_fork+0x7c/0xb0
[<ffffffff8107de50>] ? flush_kthread_worker+0xb0/0xb0

[3] contains some of information that wiki asks for, right before
doing the unmount in the original test. I have also re-ran the
original test with slub_debug=FZ, but it did not show up any warnings
(perhaps I am missing something on how to enable the red-zone
debugging).

Thanks,
Alex.

[3]
root@vc-00-00-1075-dev:/mnt/work/alex# uname -a
Linux vc-00-00-1075-dev 3.8.13-557-generic #1382000791 SMP Thu Oct 17
11:22:20 IST 2013 x86_64 x86_64 x86_64 GNU/Linux

this is a manually-compiled 3.8.13 kernel, in which:
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
but no code changes.

root@vc-00-00-1075-dev:/mnt/work/alex# xfs_repair -V
xfs_repair version 3.1.7

number of CPUs: 4

root@vc-00-00-1075-dev:/mnt/work/alex# cat /proc/meminfo
MemTotal:        2050940 kB
MemFree:          996376 kB
Buffers:           32836 kB
Cached:           389208 kB
SwapCached:            0 kB
Active:           490232 kB
Inactive:         239988 kB
Active(anon):     311336 kB
Inactive(anon):     1336 kB
Active(file):     178896 kB
Inactive(file):   238652 kB
Unevictable:        4632 kB
Mlocked:            4632 kB
SwapTotal:        522236 kB
SwapFree:         522236 kB
Dirty:                96 kB
Writeback:             0 kB
AnonPages:        312920 kB
Mapped:            46656 kB
Shmem:              1456 kB
Slab:             224640 kB
SReclaimable:      30040 kB
SUnreclaim:       194600 kB
KernelStack:        3632 kB
PageTables:        14584 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1547704 kB
Committed_AS:    1667900 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       30860 kB
VmallocChunk:   34359648832 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      153588 kB
DirectMap2M:     1943552 kB

root@vc-00-00-1075-dev:/mnt/work/alex# cat /proc/mounts
rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=1016840k,nr_inodes=254210,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 
0 0
tmpfs /run tmpfs rw,nosuid,relatime,size=410188k,mode=755 0 0
/dev/disk/by-uuid/c2a6ff28-cf03-4785-9ac8-60dd4ed7d3af / ext4
rw,relatime,errors=remount-ro,data=ordered 0 0
none /sys/fs/fuse/connections fusectl rw,relatime 0 0
none /sys/kernel/debug debugfs rw,relatime 0 0
none /sys/kernel/security securityfs rw,relatime 0 0
none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=10256k 0 0
none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
rpc_pipefs /run/rpc_pipefs rpc_pipefs rw,relatime 0 0
172.16.0.12:/export2/work /mnt/work nfs4
rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.16.0.35,local_lock=none,addr=172.16.0.12
0 0
172.16.0.12:/export2/share /mnt/share nfs4
rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.16.0.35,local_lock=none,addr=172.16.0.12
0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
cgroup_root /sys/fs/cgroup tmpfs rw,relatime 0 0
blkio /sys/fs/cgroup/blkio cgroup
rw,relatime,perf_event,blkio,freezer,devices,cpuacct,cpu,cpuset 0 0
/dev/mapper/vpart-1 /mnt/bootdir ext4 rw,relatime,(null),debug,data=ordered 
0 0
/dev/dm-20 /btrfs/pool-00000007 btrfs
rw,noatime,nodatasum,nodatacow,notreelog,nospace_cache,clear_cache 0 0
/dev/dm-22 /export/v122_nas xfs rw,sync,noatime,wsync,attr2,inode64,noquota 
0 0

root@vc-00-00-1075-dev:/mnt/work/alex# cat /proc/partitions
major minor  #blocks  name

254        0   52428800 vda
254       16   52428800 vdb
254       32   20971520 vdc
254       48   20971520 vdd
254       64   20971520 vde
254       80     512000 vdf
254       96     512000 vdg
254      112     512000 vdh
254      128     512000 vdi
254      144     512000 vdj
254      160    1048576 vdk
254      176    1048576 vdl
254      192     312320 vdm
254      208     312320 vdn
  8        0    5242880 sda
  8        1    4717568 sda1
  8        2          1 sda2
  8        5     522240 sda5
 11        0    1048575 sr0
253        0 140737488355327 dm-0
253        1     512000 dm-1
253        2     512000 dm-2
253        3     512000 dm-3
  9        1     511988 md1
253        4     511988 dm-4
253        5    1048576 dm-5
253        7     312320 dm-7
253        6   20971520 dm-6
253        8   52428800 dm-8
253        9   20971520 dm-9
253       10    1048576 dm-10
253       11     312320 dm-11
253       12   52428800 dm-12
  9        5   18872320 md5
  9        2    1048564 md2
  9        3     312308 md3
  9        4   50329600 md4
253       13   18872320 dm-13
253       14     312308 dm-14
253       15    1048564 dm-15
253       16   50329600 dm-16
253       17   49283072 dm-17
253       18   49283072 dm-18
253       19   49283072 dm-19
253       20   49283072 dm-20
253       21  127926272 dm-21
253       22  127926272 dm-22

root@vc-00-00-1075-dev:/mnt/work/alex# xfs_info /export/v122_nas/
meta-data=/dev/dm-22             isize=256    agcount=4, agsize=7995392 blks
        =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=31981568, imaxpct=25
        =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=15616, version=2
        =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

On Fri, Nov 22, 2013 at 12:07 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
[cc'd the correct xfs list. PLease use xfs@xxxxxxxxxxx in future.]

On Thu, Nov 21, 2013 at 08:04:36PM +0200, Alex Lyakas wrote:
Greetings,
I am using stock XFS from kernel 3.8.13, compiled with kmemleak
enabled. I am testing a particular scenario, in which the underlying
block device returns IO errors during XFS umount. Almost in all cases
this results in kernel crashes in various places, and sometimes
kmemleak complains, and sometimes CPU soft lockup happens. It always
happens after XFS messages like:

What testing are you doing?

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

 kernel: [  600.190509] XFS (dm-22): metadata I/O error: block
0x7600030 ("xlog_iodone") error 125 numblks 64
 kernel: [  600.192267] XFS (dm-22): xfs_do_force_shutdown(0x2) called
from line 1115 of file
/mnt/compile/linux-stable/source/fs/xfs/xfs_log.c.  Return address =
0xffffffffa05cffa1
 kernel: [  600.192319] XFS (dm-22): Log I/O Error Detected.  Shutting
down filesystem
 kernel: [  600.192392] XFS (dm-22): Unable to update superblock
counters. Freespace may not be correct on next mount.
 kernel: [  600.192398] XFS (dm-22): xfs_log_force: error 5 returned.
 kernel: [  600.193687] XFS (声.Z): Please umount the filesystem and
rectify the problem(s)

you can see here the garbage that XFS prints instead of the block device 
name.
In [1] and [2] I am attaching more kernel log from two such crashes.

So, something is corrupting memory and stamping all over the XFS
structures.

What's error 125?

#define ECANCELED       125     /* Operation Canceled */

I can't find an driver that actually returns that error to
filesystems, which....

 kernel: [  600.227881] Modules linked in: xfs raid1 xfrm_user
xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 scst_vdisk(O)
iscsi_scst(O) scst(O) dm_zcache(O) dm_btrfs(O) btrfs(O) libcrc32c
dm_iostat(O)

.... given you have a bunch of out of tree modules loaded (and some which
are experiemental) suggests that you have a problem with your
storage...

So, something is corrupting memory across a large number of
subsystems, and the trigger is some custom code to run error
injection. Can you reproduce the problem with something like
dm-faulty?

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx 

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs