Re: XFS File system in trouble


 



On Tue, Aug 04, 2015 at 02:52:33AM -0500, Leslie Rhorer wrote:
> 	It's failing, again.  The rsync job failed and when I attempt to untar the
> file in the image mount, it fails there, as well.  See below.  I formatted a
> 1.5T drive as xfs and mounted it under /media.  I then dumped the failing FS
> to a file on /media using xfs_metadump and used xfs_mdrestore to create an
> image of the FS.  I then mounted the image, copied over the tarball to its
> location, and ran tar to extract the files:
> 

Ok, so is this a reliable reproducer? If so, does it reproduce on your
separate hardware? If so, can you share the (compressed) metadump
somewhere?
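
FWIW, the full cycle as you describe it would be roughly the following
(just a sketch; the device name, mount point and file names are assumed
from your description, so adjust to your setup):

  xfs_metadump -g /dev/md0 /media/md0.metadump    # -g prints progress
  xfs_mdrestore /media/md0.metadump /media/md0.img
  mount -o loop,nouuid /media/md0.img /TEST

The (compressed) metadump file itself, rather than the restored image,
is the useful thing to share, e.g.:

  gzip -9 -c /media/md0.metadump > md0.metadump.gz

Note that xfs_metadump obfuscates most file names by default; -o turns
that off if the reproducer depends on the exact paths, at the cost of
exposing your directory names.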

Brian

> RAID-Server:/# mount -o nouuid /media/md0.img /TEST
> 
> RAID-Server:/# cd "/TEST/Server-Main/Equipment/Drive Controllers/HighPoint
> Adapters/Rocket 2722/Driver"/
> 
> RAID-Server:/TEST/Server-Main/Equipment/Drive Controllers/HighPoint
> Adapters/Rocket 2722/Driver# cp "/RAID/Server-Main/Equipment/Drive
> Controllers/HighPoint Adapters/Rocket 2722/Driver/RR_27xx.tar.gz" ./
> 
> RAID-Server:/TEST/Server-Main/Equipment/Drive Controllers/HighPoint
> Adapters/Rocket 2722/Driver# tar -xzvf RR_27xx.tar.gz
> DC7280/
> DC7280/Linux/
> DC7280/Linux/Opensource/
> DC7280/Linux/Opensource/DC7280-linux-src-v1.0-110621-1313.tar.gz
> DC7280/Windows/
> DC7280/Windows/Vista-Win2008-Win7/
> DC7280/Windows/Vista-Win2008-Win7/x32/
> DC7280/Windows/Vista-Win2008-Win7/x32/dc7280.cat
> DC7280/Windows/Vista-Win2008-Win7/x32/dc7280.inf
> DC7280/Windows/Vista-Win2008-Win7/x32/dc7280.sys
> DC7280/Windows/Vista-Win2008-Win7/x64/
> DC7280/Windows/Vista-Win2008-Win7/x64/dc7280.cat
> DC7280/Windows/Vista-Win2008-Win7/x64/dc7280.inf
> DC7280/Windows/Vista-Win2008-Win7/x64/dc7280.sys
> DC7280/Windows/Vista-Win2008-Win7/Readme.txt
> DC7280/.ddinfo
> R272x/
> R272x/Linux/
> R272x/Linux/Opensource/
> R272x/Linux/Opensource/partial/
> R272x/Linux/Opensource/partial/include/
> 
> ...
> 
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/pcitable
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/readme.txt
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/rhdd
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/rhel-install-step1.sh
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/rhel-install-step2.sh
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/
> tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64:
> Cannot mkdir: Structure needs cleaning
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/install.sh
> tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64:
> Cannot mkdir: Input/output error
> tar:
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/install.sh:
> Cannot open: No such file or directory
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/installmethod.py
> tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64:
> Cannot mkdir: Input/output error
> tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/installmethod.py:
> Cannot open: No such file or directory
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modinfo
> tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64:
> Cannot mkdir: Input/output error
> tar:
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modinfo:
> Cannot open: No such file or directory
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modules.alias
> tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64:
> Cannot mkdir: Input/output error
> tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modules.alias:
> Cannot open: No such file or directory
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modules.cgz
> 
> gzip: tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64:
> Cannot mkdir: Input/output errorstdin: Input/output error
> 
> tar: Unexpected EOF in archive
> tar: RR274x/Driver/Linux/RHEL_CentOS: Cannot utime: Input/output error
> tar: RR274x/Driver/Linux/RHEL_CentOS: Cannot change ownership to uid 0, gid
> 1000: Input/output error
> tar: RR274x/Driver/Linux/RHEL_CentOS: Cannot change mode to rwxr-xr-x:
> Input/output error
> tar: RR274x/Driver/Linux: Cannot utime: Input/output error
> tar: RR274x/Driver/Linux: Cannot change ownership to uid 0, gid 1000:
> Input/output error
> tar: RR274x/Driver/Linux: Cannot change mode to rwxr-xr-x: Input/output
> error
> tar: RR274x/Driver: Cannot utime: Input/output error
> tar: RR274x/Driver: Cannot change ownership to uid 0, gid 1000: Input/output
> error
> tar: RR274x/Driver: Cannot change mode to rwxr-xr-x: Input/output error
> tar: RR274x: Cannot utime: Input/output error
> tar: RR274x: Cannot change ownership to uid 0, gid 1000: Input/output error
> tar: RR274x: Cannot change mode to rwxr-xr-x: Input/output error
> tar: Error is not recoverable: exiting now
> 
> 
> dmesg:
> [131329.013475] XFS (md0): Mounting V4 Filesystem
> [131329.918438] XFS (md0): Ending clean mount
> [131499.357099] XFS (md0): Mounting V4 Filesystem
> [131499.709248] XFS (md0): Ending clean mount
> [131874.545344] loop: module loaded
> [131874.549914] XFS (loop0): Mounting V4 Filesystem
> [131874.555540] XFS (loop0): Ending clean mount
> [132020.964431] XFS (loop0): xfs_iread: validation failed for inode
> 124656869424 failed
> [132020.964435] ffff88028b078000: 49 4e 00 00 03 02 00 00 00 30 00 70 00 00
> 03 e8  IN.......0.p....
> [132020.964437] ffff88028b078010: 00 00 00 00 06 20 b0 6f 01 2e 00 00 00 00
> 00 16  ..... .o........
> [132020.964438] ffff88028b078020: 01 57 37 fd 2b 5d 22 9e 1e 0a 61 8c 00 00
> 00 20  .W7.+]"...a....
> [132020.964440] ffff88028b078030: ff ff 00 d2 1b f6 27 90 00 00 00 00 00 00
> 00 00  ......'.........
> [132020.964454] XFS (loop0): Internal error xfs_iread at line 392 of file
> /build/linux-QZaPpC/linux-3.16.7-ckt11/fs/xfs/xfs_inode_buf.c. Caller
> xfs_iget+0x24b/0x690 [xfs]
> [132020.964457] CPU: 2 PID: 21474 Comm: tar Not tainted 3.16.0-4-amd64 #1
> Debian 3.16.7-ckt11-1
> [132020.964459] Hardware name: To be filled by O.E.M. To be filled by
> O.E.M./SABERTOOTH 990FX R2.0, BIOS 1503 01/11/2013
> [132020.964460]  0000000000000001 ffffffff8150b405 ffff880424059800
> ffffffffa09115cb
> [132020.964463]  0000018800000010 ffffffffa0916f6b ffff88030f5c6c00
> ffff880424059800
> [132020.964465]  0000000000000075 ffff8800ad1afe98 ffffffffa095cb3a
> ffffffffa0916f6b
> [132020.964467] Call Trace:
> [132020.964471]  [<ffffffff8150b405>] ? dump_stack+0x41/0x51
> [132020.964478]  [<ffffffffa09115cb>] ? xfs_corruption_error+0x5b/0x80 [xfs]
> [132020.964483]  [<ffffffffa0916f6b>] ? xfs_iget+0x24b/0x690 [xfs]
> [132020.964492]  [<ffffffffa095cb3a>] ? xfs_iread+0xea/0x400 [xfs]
> [132020.964497]  [<ffffffffa0916f6b>] ? xfs_iget+0x24b/0x690 [xfs]
> [132020.964503]  [<ffffffffa0916f6b>] ? xfs_iget+0x24b/0x690 [xfs]
> [132020.964511]  [<ffffffffa0956de6>] ? xfs_ialloc+0xa6/0x500 [xfs]
> [132020.964517]  [<ffffffffa092658e>] ? kmem_zone_alloc+0x6e/0xe0 [xfs]
> [132020.964525]  [<ffffffffa09572a2>] ? xfs_dir_ialloc+0x62/0x2a0 [xfs]
> [132020.964531]  [<ffffffffa09251e5>] ? xfs_trans_reserve+0x1f5/0x200 [xfs]
> [132020.964538]  [<ffffffffa09579a9>] ? xfs_create+0x489/0x700 [xfs]
> [132020.964541]  [<ffffffff811b40ea>] ? kern_path_create+0xaa/0x190
> [132020.964548]  [<ffffffffa091c5ea>] ? xfs_generic_create+0xca/0x250 [xfs]
> [132020.964550]  [<ffffffff811b7ad0>] ? vfs_mkdir+0xb0/0x160
> [132020.964551]  [<ffffffff811b868b>] ? SyS_mkdirat+0xab/0xe0
> [132020.964554]  [<ffffffff815115cd>] ?
> system_call_fast_compare_end+0x10/0x15
> [132020.964555] XFS (loop0): Corruption detected. Unmount and run xfs_repair
> [132020.964564] XFS (loop0): Internal error xfs_trans_cancel at line 959 of
> file /build/linux-QZaPpC/linux-3.16.7-ckt11/fs/xfs/xfs_trans.c. Caller
> xfs_create+0x2b2/0x700 [xfs]
> [132020.964566] CPU: 2 PID: 21474 Comm: tar Not tainted 3.16.0-4-amd64 #1
> Debian 3.16.7-ckt11-1
> [132020.964567] Hardware name: To be filled by O.E.M. To be filled by
> O.E.M./SABERTOOTH 990FX R2.0, BIOS 1503 01/11/2013
> [132020.964568]  000000000000000c ffffffff8150b405 ffff8800ad1afe98
> ffffffffa0925e07
> [132020.964570]  ffff880002530800 ffff880079e03ec8 ffff880424059800
> ffffffffa09577d2
> [132020.964571]  0000000000000001 ffff880079e03e20 ffff880079e03e1c
> ffff880079e03eb0
> [132020.964573] Call Trace:
> [132020.964575]  [<ffffffff8150b405>] ? dump_stack+0x41/0x51
> [132020.964581]  [<ffffffffa0925e07>] ? xfs_trans_cancel+0xc7/0xf0 [xfs]
> [132020.964588]  [<ffffffffa09577d2>] ? xfs_create+0x2b2/0x700 [xfs]
> [132020.964590]  [<ffffffff811b40ea>] ? kern_path_create+0xaa/0x190
> [132020.964596]  [<ffffffffa091c5ea>] ? xfs_generic_create+0xca/0x250 [xfs]
> [132020.964598]  [<ffffffff811b7ad0>] ? vfs_mkdir+0xb0/0x160
> [132020.964600]  [<ffffffff811b868b>] ? SyS_mkdirat+0xab/0xe0
> [132020.964602]  [<ffffffff815115cd>] ?
> system_call_fast_compare_end+0x10/0x15
> [132020.964604] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 960
> of file /build/linux-QZaPpC/linux-3.16.7-ckt11/fs/xfs/xfs_trans.c. Return
> address = 0xffffffffa0925e20
> [132021.196487] XFS (loop0): Corruption of in-memory data detected. Shutting
> down filesystem
> [132021.196491] XFS (loop0): Please umount the filesystem and rectify the
> problem(s)
> [132024.791456] XFS (loop0): xfs_log_force: error 5 returned.
> [132054.854625] XFS (loop0): xfs_log_force: error 5 returned.
> [132084.917775] XFS (loop0): xfs_log_force: error 5 returned.
> [132114.980927] XFS (loop0): xfs_log_force: error 5 returned.
> [132145.044086] XFS (loop0): xfs_log_force: error 5 returned.
> [132175.107307] XFS (loop0): xfs_log_force: error 5 returned.
> [132205.170404] XFS (loop0): xfs_log_force: error 5 returned.
> [132235.233587] XFS (loop0): xfs_log_force: error 5 returned.
> 
> 
> On 8/2/2015 3:24 PM, Leslie Rhorer wrote:
> >
> >     OK, this is goofy.  It seems to be working, now.  As usual, I've
> >been doing some work on the server this weekend, but I can't think of
> >anything I have done that would fix the issue.  I did replace the
> >remaining good 4G RAM module with a pair of 8G RAM modules, but memtest
> >reported the remaining 4G module as good, and I verified the removed
> >module really was bad.  I also replaced the removable drive carrier and
> >cables that were feeding the two SSDs, one of which was reporting
> >failures as noted in the syslog.  It's hard for me to believe either of
> >those things could have been causing the issue, though.
> >
> >     I attached a 1.5T external drive to the server and formatted it as
> >XFS in preparation to continue troubleshooting.  To make sure of things,
> >I tried decompressing the tarball, again, and this time it worked all
> >the way to the end.  I then deleted the entire directory structure
> >created by the tarball and decompressed the file again twice.  I'll see
> >if the rsync process works.  That will take a couple of days.
> >
> >On 7/28/2015 5:11 PM, Brian Foster wrote:
> >>On Tue, Jul 28, 2015 at 10:13:01AM -0500, Leslie Rhorer wrote:
> >>>On 7/28/2015 7:33 AM, Brian Foster wrote:
> >>>>On Tue, Jul 28, 2015 at 02:46:45AM -0500, Leslie Rhorer wrote:
> >>>>>On 7/20/2015 6:17 AM, Brian Foster wrote:
> >>>>>>On Sat, Jul 18, 2015 at 08:02:50PM -0500, Leslie Rhorer wrote:
> >>>>>>>
> >>...
> >>>>
> >>>>>    I then copied both the tarball and the image over to the root,
> >>>>>and while
> >>>>>the system would not let me create the image on the root, it did
> >>>>>let me copy
> >>>>>the image to the root.  I then umounted the RAID array, mounted the
> >>>>>image,
> >>>>>and attempted to cd to the original directory in the image mount
> >>>>>where the
> >>>>>tarball was saved.  That failed with an I/O error:
> >>>>>
> >>>>
> >>>>It sounds a bit strange for the mdrestore to fail on root but a cp of
> >>>>the resulting image to work. Do the resulting images have the same file
> >>>>size or is the rootfs copy truncated? If the latter, you could be
> >>>>missing part of the fs and thus any of the following tests are probably
> >>>>moot.
> >>>
> >>>    Well, it can't be as large as it is reported, let's put it that way,
> >>>although the reported file size is the same.  Ls claims it to be 16T in
> >>>size, which cannot be the case on a 100G partition.  I forgot to
> >>>mention cp
> >>>does complain:
> >>>
> >>>RAID-Server:/# cp /RAID/TEST/RAIDfile.img ./
> >>>cp: cannot lseek ‘./RAIDfile.img’: Invalid argument
> >>>
> >>>    But it does the same thing on the backup server, and it works
> >>>there.  I
> >>>tried a cmp, and it seems to be hung.  It just may be taking a long
> >>>time,
> >>>however.
> >>>
> >>
> >>Yeah, you can't really trust the resulting image. It doesn't take much
> >>space to create a very large sparse file, but different filesystems have
> >>different maximum file size limits. The problem here is that some
> >>metadata near the beginning of the file might reference or depend on
> >>something near the end, and I/Os beyond the end of the file will
> >>probably result in errors.
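> >>
> >>A quick way to sanity check that (just a sketch, reusing the image name
> >>from your earlier output) is to compare the apparent size against the
> >>blocks actually allocated on disk:
> >>
> >>  ls -ls RAIDfile.img                  # first column = allocated blocks
> >>  du -h --apparent-size RAIDfile.img   # the sparse "size", e.g. 16T
> >>  du -h RAIDfile.img                   # space actually consumed
> >>
> >>If the copy on the rootfs ends up with noticeably fewer allocated blocks
> >>than the original, part of the metadata never made it across.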
> >>
> >>I'd probably try the nouuid approach since the hardware is similar as
> >>well as some of the other interesting suggestions that have been made to
> >>try and get the image on the rootfs and see what happens there too.
> >>
> >>Brian
> >>
> >>>>Brian
> >>>>
> >>>>>RAID-Server:/# cd "/media/Server-Main/Equipment/Drive
> >>>>>Controllers/HighPoint
> >>>>>Adapters/Rocket 2722/Driver/"
> >>>>>bash: cd: /media/Server-Main/Equipment/Drive Controllers/HighPoint
> >>>>>Adapters/Rocket 2722/Driver/: Input/output error
> >>>>>
> >>>>>    I changed directories to a point two directories above the
> >>>>>previous attempt
> >>>>>and did a long listing:
> >>>>>
> >>>>>RAID-Server:/# cd "/media/Server-Main/Equipment/Drive
> >>>>>Controllers/HighPoint
> >>>>>Adapters"
> >>>>>RAID-Server:/media/Server-Main/Equipment/Drive Controllers/HighPoint
> >>>>>Adapters# ll
> >>>>>ls: cannot access RocketRAID 2722: Input/output error
> >>>>>total 4
> >>>>>drwxr-xr-x 6 root lrhorer 4096 Jul 18 19:26 Rocket 2722
> >>>>>?????????? ? ?    ?          ?            ? RocketRAID 2722
> >>>>>
> >>>>>    As you can see, Rocket 2722 is still there, but RocketRAID 2722
> >>>>>is very
> >>>>>sick.  Rocket 2722 is the parent of where the tarball was, however,
> >>>>>so I did
> >>>>>a cd and an ll again:
> >>>>>
> >>>>>RAID-Server:/media/Server-Main/Equipment/Drive Controllers/HighPoint
> >>>>>Adapters# cd "Rocket 2722"/
> >>>>>RAID-Server:/media/Server-Main/Equipment/Drive Controllers/HighPoint
> >>>>>Adapters/Rocket 2722# ll
> >>>>>ls: cannot access BIOS: Input/output error
> >>>>>ls: cannot access Driver: Input/output error
> >>>>>ls: cannot access HighPoint RAID Management Software: Input/output
> >>>>>error
> >>>>>ls: cannot access Manual: Input/output error
> >>>>>total 248
> >>>>>-rwxr--r-- 1 root lrhorer 245760 Nov 20  2008 autorun.exe
> >>>>>-rwxr--r-- 1 root lrhorer     51 Mar 21  2001 autorun.inf
> >>>>>?????????? ? ?    ?            ?            ? BIOS
> >>>>>?????????? ? ?    ?            ?            ? Driver
> >>>>>?????????? ? ?    ?            ?            ? HighPoint RAID
> >>>>>Management
> >>>>>Software
> >>>>>?????????? ? ?    ?            ?            ? Manual
> >>>>>-rwxr--r-- 1 root lrhorer   1134 Feb  5  2012 readme.txt
> >>>>>
> >>>>>    So now, what?
> >>>>>
> >>>>
> >>>
> >>
> >
> 

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs



