Hi Dave,
any insight on this issue? At least on the simpler reproduction with "error"
DeviceMapper?
Thanks,
Alex.
-----Original Message-----
From: Alex Lyakas
Sent: 24 November, 2013 12:27 PM
To: Dave Chinner ; xfs@xxxxxxxxxxx
Cc: linux-xfs@xxxxxxxxxxxxxxx
Subject: Re: XFS umount with IO errors seems to lead to memory corruption
Hi Dave,
thank you for your comments.
The test that I am doing is unmounting the XFS, while its underlying
block device returns intermittent IO errors. The block device in
question is a custom DeviceMapper target. It returns -ECANCELED in
this case. Should I return some other errno instead?
The same exact test works alright with ext4. It's unmount finishes,
system seems to continue functioning normally and kmemleak is also
happy.
When doing a simpler reproductoin with "error" Device-Mapper, umount
gets stuck and never returns, while kernel keeps printing:
XFS (dm-0): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks")
error 5 numblks 1
XFS (dm-0): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks")
error 5 numblks 1
XFS (dm-0): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks")
error 5 numblks 1
XFS (dm-0): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks")
error 5 numblks 1
XFS (dm-0): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks")
error 5 numblks 1
XFS (dm-0): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks")
error 5 numblks 1
XFS (dm-0): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks")
error 5 numblks 1
...
What is the expected behavior of umount in this case? (when there are
IO errors during it).
This reproduction that I do is:
root@vc-00-00-1075-dev:~# cat /etc/xfs.protofile
dummy : bootfilename, not used, backward compatibility
0 0 : numbers of blocks and inodes, not
used, backward compatibility
d--777 0 0 : set 777 perms for the root dir
$
$
root@vc-00-00-1075-dev:~# mkfs.xfs -f /dev/vde -p /etc/xfs.protofile
meta-data=/dev/vde isize=256 agcount=4, agsize=1310720 blks
= sectsz=512 attr=2, projid32bit=0
data = bsize=4096 blocks=5242880, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
root@vc-00-00-1075-dev:~# blockdev --getsz /dev/vde
41943040
root@vc-00-00-1075-dev:~# dmsetup create VDE --table "0 41943040
linear /dev/vde 0"
root@vc-00-00-1075-dev:~# mount -o noatime,sync /dev/mapper/VDE
/export/v122_nas/
copy some files to the mountpoint, then:
dmsetup reload VDE --table "0 41943040 error"
dmsetup resume VDE
umount /export/v122_nas
this never returns and /proc shows:
root@vc-00-00-1075-dev:~# cat /proc/2684/stack
[<ffffffffa033ac6a>] xfs_ail_push_all_sync+0x9a/0xd0 [xfs]
[<ffffffffa0330123>] xfs_unmountfs+0x63/0x160 [xfs]
[<ffffffffa02ee265>] xfs_fs_put_super+0x25/0x60 [xfs]
[<ffffffff8118fd12>] generic_shutdown_super+0x62/0xf0
[<ffffffff8118fdd0>] kill_block_super+0x30/0x80
[<ffffffff811903dc>] deactivate_locked_super+0x3c/0x90
[<ffffffff81190d7e>] deactivate_super+0x4e/0x70
[<ffffffff811ad086>] mntput_no_expire+0x106/0x160
[<ffffffff811ae760>] sys_umount+0xa0/0xe0
[<ffffffff816ab919>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
And after some time, hung task warning shows:
INFO: task kworker/2:1:39 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/2:1 D ffffffff8180cf00 0 39 2 0x00000000
ffff88007c54db38 0000000000000046 000000027d003700 ffff88007fd03fc0
ffff88007c54dfd8 ffff88007c54dfd8 ffff88007c54dfd8 0000000000013e40
ffff88007c9e9710 ffff88007c4bdc40 00000000000000b8 7fffffffffffffff
Call Trace:
[<ffffffff816a1b99>] schedule+0x29/0x70
[<ffffffff816a02d5>] schedule_timeout+0x1e5/0x250
[<ffffffffa02f3987>] ? kmem_zone_alloc+0x67/0xe0 [xfs]
[<ffffffff816798e6>] ? kmemleak_alloc+0x26/0x50
[<ffffffff816a0f1b>] __down_common+0xa0/0xf0
[<ffffffffa032f37c>] ? xfs_getsb+0x3c/0x70 [xfs]
[<ffffffff816a0fde>] __down+0x1d/0x1f
[<ffffffff81084591>] down+0x41/0x50
[<ffffffffa02dcd44>] xfs_buf_lock+0x44/0x110 [xfs]
[<ffffffffa032f37c>] xfs_getsb+0x3c/0x70 [xfs]
[<ffffffffa033b4bc>] xfs_trans_getsb+0x4c/0x140 [xfs]
[<ffffffffa032f06e>] xfs_mod_sb+0x4e/0xc0 [xfs]
[<ffffffffa02e3b24>] xfs_fs_log_dummy+0x54/0x90 [xfs]
[<ffffffffa0335bf8>] xfs_log_worker+0x48/0x50 [xfs]
[<ffffffff81077a11>] process_one_work+0x141/0x4a0
[<ffffffff810789e8>] worker_thread+0x168/0x410
[<ffffffff81078880>] ? manage_workers+0x120/0x120
[<ffffffff8107df10>] kthread+0xc0/0xd0
[<ffffffff813a3ea4>] ? acpi_get_child+0x47/0x4d
[<ffffffff813a3fb7>] ? acpi_platform_notify.part.0+0xbb/0xda
[<ffffffff8107de50>] ? flush_kthread_worker+0xb0/0xb0
[<ffffffff816ab86c>] ret_from_fork+0x7c/0xb0
[<ffffffff8107de50>] ? flush_kthread_worker+0xb0/0xb0
[3] contains some of information that wiki asks for, right before
doing the unmount in the original test. I have also re-ran the
original test with slub_debug=FZ, but it did not show up any warnings
(perhaps I am missing something on how to enable the red-zone
debugging).
Thanks,
Alex.
[3]
root@vc-00-00-1075-dev:/mnt/work/alex# uname -a
Linux vc-00-00-1075-dev 3.8.13-557-generic #1382000791 SMP Thu Oct 17
11:22:20 IST 2013 x86_64 x86_64 x86_64 GNU/Linux
this is a manually-compiled 3.8.13 kernel, in which:
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
but no code changes.
root@vc-00-00-1075-dev:/mnt/work/alex# xfs_repair -V
xfs_repair version 3.1.7
number of CPUs: 4
root@vc-00-00-1075-dev:/mnt/work/alex# cat /proc/meminfo
MemTotal: 2050940 kB
MemFree: 996376 kB
Buffers: 32836 kB
Cached: 389208 kB
SwapCached: 0 kB
Active: 490232 kB
Inactive: 239988 kB
Active(anon): 311336 kB
Inactive(anon): 1336 kB
Active(file): 178896 kB
Inactive(file): 238652 kB
Unevictable: 4632 kB
Mlocked: 4632 kB
SwapTotal: 522236 kB
SwapFree: 522236 kB
Dirty: 96 kB
Writeback: 0 kB
AnonPages: 312920 kB
Mapped: 46656 kB
Shmem: 1456 kB
Slab: 224640 kB
SReclaimable: 30040 kB
SUnreclaim: 194600 kB
KernelStack: 3632 kB
PageTables: 14584 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1547704 kB
Committed_AS: 1667900 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 30860 kB
VmallocChunk: 34359648832 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 153588 kB
DirectMap2M: 1943552 kB
root@vc-00-00-1075-dev:/mnt/work/alex# cat /proc/mounts
rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=1016840k,nr_inodes=254210,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000
0 0
tmpfs /run tmpfs rw,nosuid,relatime,size=410188k,mode=755 0 0
/dev/disk/by-uuid/c2a6ff28-cf03-4785-9ac8-60dd4ed7d3af / ext4
rw,relatime,errors=remount-ro,data=ordered 0 0
none /sys/fs/fuse/connections fusectl rw,relatime 0 0
none /sys/kernel/debug debugfs rw,relatime 0 0
none /sys/kernel/security securityfs rw,relatime 0 0
none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=10256k 0 0
none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
rpc_pipefs /run/rpc_pipefs rpc_pipefs rw,relatime 0 0
172.16.0.12:/export2/work /mnt/work nfs4
rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.16.0.35,local_lock=none,addr=172.16.0.12
0 0
172.16.0.12:/export2/share /mnt/share nfs4
rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.16.0.35,local_lock=none,addr=172.16.0.12
0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
cgroup_root /sys/fs/cgroup tmpfs rw,relatime 0 0
blkio /sys/fs/cgroup/blkio cgroup
rw,relatime,perf_event,blkio,freezer,devices,cpuacct,cpu,cpuset 0 0
/dev/mapper/vpart-1 /mnt/bootdir ext4 rw,relatime,(null),debug,data=ordered
0 0
/dev/dm-20 /btrfs/pool-00000007 btrfs
rw,noatime,nodatasum,nodatacow,notreelog,nospace_cache,clear_cache 0 0
/dev/dm-22 /export/v122_nas xfs rw,sync,noatime,wsync,attr2,inode64,noquota
0 0
root@vc-00-00-1075-dev:/mnt/work/alex# cat /proc/partitions
major minor #blocks name
254 0 52428800 vda
254 16 52428800 vdb
254 32 20971520 vdc
254 48 20971520 vdd
254 64 20971520 vde
254 80 512000 vdf
254 96 512000 vdg
254 112 512000 vdh
254 128 512000 vdi
254 144 512000 vdj
254 160 1048576 vdk
254 176 1048576 vdl
254 192 312320 vdm
254 208 312320 vdn
8 0 5242880 sda
8 1 4717568 sda1
8 2 1 sda2
8 5 522240 sda5
11 0 1048575 sr0
253 0 140737488355327 dm-0
253 1 512000 dm-1
253 2 512000 dm-2
253 3 512000 dm-3
9 1 511988 md1
253 4 511988 dm-4
253 5 1048576 dm-5
253 7 312320 dm-7
253 6 20971520 dm-6
253 8 52428800 dm-8
253 9 20971520 dm-9
253 10 1048576 dm-10
253 11 312320 dm-11
253 12 52428800 dm-12
9 5 18872320 md5
9 2 1048564 md2
9 3 312308 md3
9 4 50329600 md4
253 13 18872320 dm-13
253 14 312308 dm-14
253 15 1048564 dm-15
253 16 50329600 dm-16
253 17 49283072 dm-17
253 18 49283072 dm-18
253 19 49283072 dm-19
253 20 49283072 dm-20
253 21 127926272 dm-21
253 22 127926272 dm-22
root@vc-00-00-1075-dev:/mnt/work/alex# xfs_info /export/v122_nas/
meta-data=/dev/dm-22 isize=256 agcount=4, agsize=7995392 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=31981568, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=15616, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
On Fri, Nov 22, 2013 at 12:07 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
[cc'd the correct xfs list. PLease use xfs@xxxxxxxxxxx in future.]
On Thu, Nov 21, 2013 at 08:04:36PM +0200, Alex Lyakas wrote:
Greetings,
I am using stock XFS from kernel 3.8.13, compiled with kmemleak
enabled. I am testing a particular scenario, in which the underlying
block device returns IO errors during XFS umount. Almost in all cases
this results in kernel crashes in various places, and sometimes
kmemleak complains, and sometimes CPU soft lockup happens. It always
happens after XFS messages like:
What testing are you doing?
http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
kernel: [ 600.190509] XFS (dm-22): metadata I/O error: block
0x7600030 ("xlog_iodone") error 125 numblks 64
kernel: [ 600.192267] XFS (dm-22): xfs_do_force_shutdown(0x2) called
from line 1115 of file
/mnt/compile/linux-stable/source/fs/xfs/xfs_log.c. Return address =
0xffffffffa05cffa1
kernel: [ 600.192319] XFS (dm-22): Log I/O Error Detected. Shutting
down filesystem
kernel: [ 600.192392] XFS (dm-22): Unable to update superblock
counters. Freespace may not be correct on next mount.
kernel: [ 600.192398] XFS (dm-22): xfs_log_force: error 5 returned.
kernel: [ 600.193687] XFS (声.Z): Please umount the filesystem and
rectify the problem(s)
you can see here the garbage that XFS prints instead of the block device
name.
In [1] and [2] I am attaching more kernel log from two such crashes.
So, something is corrupting memory and stamping all over the XFS
structures.
What's error 125?
#define ECANCELED 125 /* Operation Canceled */
I can't find an driver that actually returns that error to
filesystems, which....
kernel: [ 600.227881] Modules linked in: xfs raid1 xfrm_user
xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 scst_vdisk(O)
iscsi_scst(O) scst(O) dm_zcache(O) dm_btrfs(O) btrfs(O) libcrc32c
dm_iostat(O)
.... given you have a bunch of out of tree modules loaded (and some which
are experiemental) suggests that you have a problem with your
storage...
So, something is corrupting memory across a large number of
subsystems, and the trigger is some custom code to run error
injection. Can you reproduce the problem with something like
dm-faulty?
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs