On 06/12/2015 03:21 PM, Brian Foster wrote:
> On Thu, Jun 11, 2015 at 07:32:04PM +0300, Török Edwin wrote:
>> On 06/11/2015 06:58 PM, Eric Sandeen wrote:
>>> On 6/11/15 10:51 AM, Eric Sandeen wrote:
>>>> On 6/11/15 10:28 AM, Török Edwin wrote:
>>>>> On 06/11/2015 06:16 PM, Brian Foster wrote:
>>>>>> On Thu, Jun 11, 2015 at 09:23:38AM +0300, Török Edwin wrote:
>>>>>>> [1.] XFS on ARM corruption 'Structure needs cleaning'
>>>>>>> [2.] Full description of the problem/report:
>>>>>>>
>>>>>>> I have been running XFS successfully on x86-64 for years, however I'm having trouble running it on ARM.
>>>>>>>
>>>>>>> Running the testcase below [7.] reliably reproduces the filesystem corruption starting from a freshly
>>>>>>> created XFS filesystem: running ls after 'sxadm node --new --batch /export/dfs/a/b' shows a 'Structure needs cleaning' error,
>>>>>>> and dmesg shows a corruption error [6.].
>>>>>>> xfs_repair 3.1.9 is not able to repair the corruption: after mounting the repaired filesystem
>>>>>>> I still get the 'Structure needs cleaning' error.
>>>>>>>
>>>>>>> Note: using /export/dfs/a/b is important for reproducing the problem: if I only use one level of directories in /export/dfs then the problem
>>>>>>> doesn't reproduce. Also if I use a tuned version of sxadm that creates fewer database files then the problem doesn't reproduce either.
>>>>>>>
>>>>>>> [3.] Keywords: filesystems, XFS corruption, ARM
>>>>>>> [4.] Kernel information
>>>>>>> [4.1.] Kernel version (from /proc/version):
>>>>>>> Linux hornet34 3.14.3-00088-g7651c68 #24 Thu Apr 9 16:13:46 MDT 2015 armv7l GNU/Linux
>>>>>>>
>>>>>> ...
>>>>>>> [5.] Most recent kernel version which did not have the bug: Unknown, first kernel I try on ARM
>>>>>>>
>>>>>>> [6.] dmesg stacktrace
>>>>>>>
>>>>>>> [4627578.440000] XFS (sda4): Mounting Filesystem
>>>>>>> [4627578.510000] XFS (sda4): Ending clean mount
>>>>>>> [4627621.470000] dd6ee000: 58 46 53 42 00 00 10 00 00 00 00 00 37 40 21 00 XFSB........7@!.
>>>>>>> [4627621.480000] dd6ee010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>>>>>>> [4627621.490000] dd6ee020: 5b 08 7f 79 0e 3a 46 3d 9b ea 26 ad 9d 62 17 8d [..y.:F=..&..b..
>>>>>>> [4627621.490000] dd6ee030: 00 00 00 00 20 00 00 04 00 00 00 00 00 00 00 80 .... ...........
>>>>>>
>>>>>> Just a data point... the magic number here looks like a superblock magic
>>>>>> (XFSB) rather than one of the directory magic numbers. I'm wondering if
>>>>>> a buffer disk address has gone bad somehow or another.
>>>>>>
>>>>>> Does this happen to be a large block device? I don't see any partition
>>>>>> or xfs_info data below. If so, it would be interesting to see if this
>>>>>> reproduces on a smaller device. It does appear that the large block
>>>>>> device option is enabled in the kernel config above, however, so maybe
>>>>>> that's unrelated.
>>>>>
>>>>> This is mkfs.xfs /dev/sda4:
>>>>> meta-data=/dev/sda4              isize=256    agcount=4, agsize=231737408 blks
>>>>>          =                       sectsz=512   attr=2, projid32bit=0
>>>>> data     =                       bsize=4096   blocks=926949632, imaxpct=5
>>>>>          =                       sunit=0      swidth=0 blks
>>>>> naming   =version 2              bsize=4096   ascii-ci=0
>>>>> log      =internal log           bsize=4096   blocks=452612, version=2
>>>>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>>>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>>>>
>>>>> But it also reproduces with this small loopback file:
>>>>> meta-data=/tmp/xfs.test          isize=256    agcount=2, agsize=5120 blks
>>>>>          =                       sectsz=512   attr=2, projid32bit=0
>>>>> data     =                       bsize=4096   blocks=10240, imaxpct=25
>>>>>          =                       sunit=0      swidth=0 blks
>>>>> naming   =version 2              bsize=4096   ascii-ci=0
>>>>> log      =internal log           bsize=4096   blocks=1200, version=2
>>>>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>>>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>>>
>>>> ok so not a block number overflow issue, thanks.
>>>>
>>>>> You can have a look at xfs.test here: http://vol-public.s3.indian.skylable.com:8008/armel/testcase/xfs.test.gz
>>>>>
>>>>> If I loopback mount that on an x86-64 box it doesn't show the corruption message though ...
>>>>
>>>> FWIW, this is the 2nd report we've had of something similar, both on Armv7, both ok on x86_64.
>>>>
>>>> I'll take a look at your xfs.test; that's presumably copied after it reported the error, and you unmounted it before uploading, correct? And it was mkfs'd on armv7, never mounted or manipulated in any way on x86_64?
>>
>> Thanks, yes it was mkfs.xfs on ARMv7 and unmounted.
>>
>>>
>>> Oh, and what were the kernel messages when you produced the corruption with xfs.txt?
>>
>> Takes only a couple of minutes to reproduce the issue so I've prepared a fresh set of xfs2.test and corresponding kernel messages to make sure it's all consistent.
>> Freshly created XFS by mkfs.xfs: http://vol-public.s3.indian.skylable.com:8008/armel/testcase/xfs2.test.orig.gz
>> The corrupted XFS: http://vol-public.s3.indian.skylable.com:8008/armel/testcase/xfs2.test.corrupted.gz
>>
>
> I managed to get an updated kernel on a beaglebone I had sitting around,
> but I don't reproduce any errors with the "corrupted" image (I think
> we've established that the image is fine on-disk and something is going
> awry at runtime):
>
> root@beaglebone:~# uname -a
> Linux beaglebone 3.14.1+ #5 SMP Thu Jun 11 20:58:02 EDT 2015 armv7l GNU/Linux
> root@beaglebone:~# mount ./xfs2.test.corrupted /mnt/
> root@beaglebone:~# ls -al /mnt/a/
> total 12
> drwxr-xr-x 3 root root 14 Jun 11 16:11 .
> drwxr-xr-x 3 root root 14 Jun 11 16:11 ..
> drwxr-x--- 2 root root 8192 Jun 11 16:11 b
> root@beaglebone:~# ls -al /mnt/a/b/
> total 17996
> drwxr-x--- 2 root root 8192 Jun 11 16:11 .
> drwxr-xr-x 3 root root 14 Jun 11 16:11 ..
> -rw-r--r-- 1 root root 12288 Jun 11 16:11 events.db
> -rw-r--r-- 1 root root 15360 Jun 11 16:11 f00000000.db
> -rw-r--r-- 1 root root 15360 Jun 11 16:11 f00000001.db
> -rw-r--r-- 1 root root 15360 Jun 11 16:11 f00000002.db
> -rw-r--r-- 1 root root 15360 Jun 11 16:11 f00000003.db
> ...
> root@beaglebone:~#
>
> I echo Dave's suggestion down thread with regard to toolchain. This
> kernel was compiled with the following cross-gcc (installed via Fedora
> package):
>
> gcc version 4.9.2 20150212 (Red Hat Cross 4.9.2-5) (GCC)
>
> Are you using something different?

/proc/version says:
Linux version 3.14.3-00088-g7651c68 (jenkins@boulder-jenkins) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #24 Thu Apr 9 16:13:46 MDT 2015

I'll get back to you when I have a new kernel running.

Best regards,
--Edwin
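
For reference, the first row of the hex dump quoted from dmesg (offset dd6ee000) can be decoded by hand to check the observation that the buffer holds a superblock rather than directory data. The sketch below is only an illustration in Python, not something run on the affected machine. It assumes the usual XFS on-disk superblock layout, where the first 16 bytes are the big-endian fields sb_magicnum (32 bit), sb_blocksize (32 bit) and sb_dblocks (64 bit); the variable names are made up for this example.

```python
import struct

# First 16 bytes of the buffer, copied from the dmesg line at dd6ee000.
dump_line = "58 46 53 42 00 00 10 00 00 00 00 00 37 40 21 00"
raw = bytes.fromhex(dump_line.replace(" ", ""))

# Assumption: XFS superblock header, all fields big-endian on disk:
#   sb_magicnum (4 bytes), sb_blocksize (u32), sb_dblocks (u64).
magic, blocksize, dblocks = struct.unpack(">4sIQ", raw)

print(magic)      # b'XFSB'
print(blocksize)  # 4096
print(dblocks)    # 926949632
```

Under that assumption the block size (4096) and block count (926949632) agree with the bsize= and blocks= values in the mkfs.xfs output for /dev/sda4 quoted above, which would be consistent with the dumped buffer being that filesystem's own superblock.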