On 06/11/2015 06:58 PM, Eric Sandeen wrote:
> On 6/11/15 10:51 AM, Eric Sandeen wrote:
>> On 6/11/15 10:28 AM, Török Edwin wrote:
>>> On 06/11/2015 06:16 PM, Brian Foster wrote:
>>>> On Thu, Jun 11, 2015 at 09:23:38AM +0300, Török Edwin wrote:
>>>>> [1.] XFS on ARM corruption 'Structure needs cleaning'
>>>>> [2.] Full description of the problem/report:
>>>>>
>>>>> I have been running XFS successfully on x86-64 for years, but I'm having trouble running it on ARM.
>>>>>
>>>>> Running the testcase below [7.] reliably reproduces the filesystem corruption starting from a freshly
>>>>> created XFS filesystem: running ls after 'sxadm node --new --batch /export/dfs/a/b' shows a
>>>>> 'Structure needs cleaning' error, and dmesg shows a corruption error [6.].
>>>>> xfs_repair 3.1.9 is not able to repair the corruption: after mounting the repaired filesystem
>>>>> I still get the 'Structure needs cleaning' error.
>>>>>
>>>>> Note: using /export/dfs/a/b is important for reproducing the problem: if I use only one level of
>>>>> directories in /export/dfs, the problem does not reproduce. Likewise, if I use a tuned version of
>>>>> sxadm that creates fewer database files, the problem does not reproduce either.
>>>>>
>>>>> [3.] Keywords: filesystems, XFS corruption, ARM
>>>>> [4.] Kernel information
>>>>> [4.1.] Kernel version (from /proc/version):
>>>>> Linux hornet34 3.14.3-00088-g7651c68 #24 Thu Apr 9 16:13:46 MDT 2015 armv7l GNU/Linux
>>>>>
>>>> ...
>>>>> [5.] Most recent kernel version which did not have the bug: unknown; this is the first kernel I have tried on ARM.
>>>>>
>>>>> [6.] dmesg stacktrace
>>>>>
>>>>> [4627578.440000] XFS (sda4): Mounting Filesystem
>>>>> [4627578.510000] XFS (sda4): Ending clean mount
>>>>> [4627621.470000] dd6ee000: 58 46 53 42 00 00 10 00 00 00 00 00 37 40 21 00  XFSB........7@!.
>>>>> [4627621.480000] dd6ee010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>> [4627621.490000] dd6ee020: 5b 08 7f 79 0e 3a 46 3d 9b ea 26 ad 9d 62 17 8d  [..y.:F=..&..b..
>>>>> [4627621.490000] dd6ee030: 00 00 00 00 20 00 00 04 00 00 00 00 00 00 00 80  .... ...........
>>>>
>>>> Just a data point... the magic number here looks like a superblock magic
>>>> (XFSB) rather than one of the directory magic numbers. I'm wondering if
>>>> a buffer disk address has gone bad somehow or another.
>>>>
>>>> Does this happen to be a large block device? I don't see any partition
>>>> or xfs_info data below. If so, it would be interesting to see if this
>>>> reproduces on a smaller device. It does appear that the large block
>>>> device option is enabled in the kernel config above, however, so maybe
>>>> that's unrelated.
>>>
>>> This is mkfs.xfs /dev/sda4:
>>>
>>> meta-data=/dev/sda4              isize=256    agcount=4, agsize=231737408 blks
>>>          =                       sectsz=512   attr=2, projid32bit=0
>>> data     =                       bsize=4096   blocks=926949632, imaxpct=5
>>>          =                       sunit=0      swidth=0 blks
>>> naming   =version 2              bsize=4096   ascii-ci=0
>>> log      =internal log           bsize=4096   blocks=452612, version=2
>>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>>
>>> But it also reproduces with this small loopback file:
>>>
>>> meta-data=/tmp/xfs.test          isize=256    agcount=2, agsize=5120 blks
>>>          =                       sectsz=512   attr=2, projid32bit=0
>>> data     =                       bsize=4096   blocks=10240, imaxpct=25
>>>          =                       sunit=0      swidth=0 blks
>>> naming   =version 2              bsize=4096   ascii-ci=0
>>> log      =internal log           bsize=4096   blocks=1200, version=2
>>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>
>> OK, so not a block number overflow issue, thanks.
>>
>>> You can have a look at xfs.test here: http://vol-public.s3.indian.skylable.com:8008/armel/testcase/xfs.test.gz
>>>
>>> If I loopback mount that on an x86-64 box it doesn't show the corruption message, though...
>>
>> FWIW, this is the 2nd report we've had of something similar, both on ARMv7, both OK on x86_64.
>>
>> I'll take a look at your xfs.test; that's presumably copied after it reported the error, and you
>> unmounted it before uploading, correct? And it was mkfs'd on armv7, never mounted or manipulated
>> in any way on x86_64?

Thanks, yes it was mkfs.xfs'd on ARMv7 and unmounted.

> Oh, and what were the kernel messages when you produced the corruption with xfs.test?

It takes only a couple of minutes to reproduce the issue, so I've prepared a fresh xfs2.test and the
corresponding kernel messages to make sure it's all consistent.

Freshly created XFS from mkfs.xfs: http://vol-public.s3.indian.skylable.com:8008/armel/testcase/xfs2.test.orig.gz
The corrupted XFS: http://vol-public.s3.indian.skylable.com:8008/armel/testcase/xfs2.test.corrupted.gz

All commands below were run on ARMv7; after unmounting, the files were copied from /tmp over to
x86-64, gzipped and uploaded. They were never mounted on x86-64:

# dd if=/dev/zero of=/tmp/xfs2.test bs=1M count=40
40+0 records in
40+0 records out
41943040 bytes (42 MB) copied, 0.419997 s, 99.9 MB/s
# mkfs.xfs /tmp/xfs2.test
meta-data=/tmp/xfs2.test         isize=256    agcount=2, agsize=5120 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=10240, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=1200, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
# cp /tmp/xfs2.test /tmp/xfs2.test.orig
# umount /export/dfs
# mount -o loop -t xfs /tmp/xfs2.test /export/dfs
# mkdir /export/dfs/a
# sxadm node --new --batch /export/dfs/a/b
# ls /export/dfs/a/b
ls: reading directory /export/dfs/a/b: Structure needs cleaning
# umount /export/dfs
# cp /tmp/xfs2.test /tmp/xfs2.test.corrupted
# dmesg >/tmp/dmesg
# exit

The latest corruption message from dmesg:

[4744604.870000] XFS (loop0): Mounting Filesystem
[4744604.900000] XFS (loop0): Ending clean mount
[4745016.610000] dc61e000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 28 00  XFSB..........(.
[4745016.620000] dc61e010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[4745016.630000] dc61e020: 64 23 d2 06 32 2e 4c 20 82 6e f0 36 a7 d9 54 f9  d#..2.L .n.6..T.
[4745016.640000] dc61e030: 00 00 00 00 00 00 20 04 00 00 00 00 00 00 00 80  ...... .........
[4745016.640000] XFS (loop0): Internal error xfs_dir3_data_read_verify at line 274 of file fs/xfs/xfs_dir2_data.c.  Caller 0xc01c1528
[4745016.650000] CPU: 0 PID: 37 Comm: kworker/0:1H Not tainted 3.14.3-00088-g7651c68 #24
[4745016.650000] Workqueue: xfslogd xfs_buf_iodone_work
[4745016.650000] [<c0013948>] (unwind_backtrace) from [<c0011058>] (show_stack+0x10/0x14)
[4745016.650000] [<c0011058>] (show_stack) from [<c01c3dc4>] (xfs_corruption_error+0x54/0x70)
[4745016.650000] [<c01c3dc4>] (xfs_corruption_error) from [<c01f7854>] (xfs_dir3_data_read_verify+0x60/0xd0)
[4745016.650000] [<c01f7854>] (xfs_dir3_data_read_verify) from [<c01c1528>] (xfs_buf_iodone_work+0x7c/0x94)
[4745016.650000] [<c01c1528>] (xfs_buf_iodone_work) from [<c00309f0>] (process_one_work+0xf4/0x32c)
[4745016.650000] [<c00309f0>] (process_one_work) from [<c0030fb4>] (worker_thread+0x10c/0x388)
[4745016.650000] [<c0030fb4>] (worker_thread) from [<c0035e10>] (kthread+0xbc/0xd8)
[4745016.650000] [<c0035e10>] (kthread) from [<c000e8f8>] (ret_from_fork+0x14/0x3c)
[4745016.650000] XFS (loop0): Corruption detected. Unmount and run xfs_repair
[4745016.650000] XFS (loop0): metadata I/O error: block 0xa000 ("xfs_trans_read_buf_map") error 117 numblks 8

Best regards,
--Edwin

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
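[Editor's note: the failing address in the last log line is suggestive. XFS reports these block numbers in 512-byte basic blocks, and with the mkfs.xfs figures from the transcript (bsize=4096, agsize=5120 blks), 0xa000 converts to exactly the byte offset where allocation group 1 begins, i.e. where a backup superblock sits. That would explain the "XFSB" magic appearing where a directory data block was expected. A sketch of the arithmetic, using only numbers quoted above:]

```python
# "metadata I/O error: block 0xa000" -- XFS daddrs are 512-byte basic blocks.
daddr = 0xa000
byte_offset = daddr * 512

# From the mkfs.xfs output for the 40 MB test image:
agsize_blocks = 5120   # agsize in filesystem blocks
blocksize = 4096       # bsize
ag1_start = agsize_blocks * blocksize  # byte offset of AG 1 (backup superblock)

print(hex(byte_offset), hex(ag1_start))  # both are 0x1400000 (20 MiB)
assert byte_offset == ag1_start
```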