On Tue, Aug 04, 2015 at 02:52:33AM -0500, Leslie Rhorer wrote:
> It's failing, again. The rsync job failed and when I attempt to untar the
> file in the image mount, it fails there, as well. See below. I formatted a
> 1.5T drive as xfs and mounted it under /media. I then dumped the failing FS
> to a file on /media using xfs_metadump and used xfs_mdrestore to create an
> image of the FS. I then mounted the image, copied over the tarball to its
> location, and ran tar to extract the files:
>

Ok, so is this a reliable reproducer? If so, does it reproduce on your
separate hardware? If so, can you share the (compressed) metadump
somewhere?

Brian

> RAID-Server:/# mount -o nouuid /media/md0.img /TEST
>
> RAID-Server:/# cd "/TEST/Server-Main/Equipment/Drive Controllers/HighPoint
> Adapters/Rocket 2722/Driver"/
>
> RAID-Server:/TEST/Server-Main/Equipment/Drive Controllers/HighPoint
> Adapters/Rocket 2722/Driver# cp "/RAID/Server-Main/Equipment/Drive
> Controllers/HighPoint Adapters/Rocket 2722/Driver/RR_27xx.tar.gz" ./
>
> RAID-Server:/TEST/Server-Main/Equipment/Drive Controllers/HighPoint
> Adapters/Rocket 2722/Driver# tar -xzvf RR_27xx.tar.gz
> DC7280/
> DC7280/Linux/
> DC7280/Linux/Opensource/
> DC7280/Linux/Opensource/DC7280-linux-src-v1.0-110621-1313.tar.gz
> DC7280/Windows/
> DC7280/Windows/Vista-Win2008-Win7/
> DC7280/Windows/Vista-Win2008-Win7/x32/
> DC7280/Windows/Vista-Win2008-Win7/x32/dc7280.cat
> DC7280/Windows/Vista-Win2008-Win7/x32/dc7280.inf
> DC7280/Windows/Vista-Win2008-Win7/x32/dc7280.sys
> DC7280/Windows/Vista-Win2008-Win7/x64/
> DC7280/Windows/Vista-Win2008-Win7/x64/dc7280.cat
> DC7280/Windows/Vista-Win2008-Win7/x64/dc7280.inf
> DC7280/Windows/Vista-Win2008-Win7/x64/dc7280.sys
> DC7280/Windows/Vista-Win2008-Win7/Readme.txt
> DC7280/.ddinfo
> R272x/
> R272x/Linux/
> R272x/Linux/Opensource/
> R272x/Linux/Opensource/partial/
> R272x/Linux/Opensource/partial/include/
>
> ...
>
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/pcitable
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/readme.txt
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/rhdd
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/rhel-install-step1.sh
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/rhel-install-step2.sh
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/
> tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64:
> Cannot mkdir: Structure needs cleaning
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/install.sh
> tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64:
> Cannot mkdir: Input/output error
> tar:
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/install.sh:
> Cannot open: No such file or directory
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/installmethod.py
> tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64:
> Cannot mkdir: Input/output error
> tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/installmethod.py:
> Cannot open: No such file or directory
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modinfo
> tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64:
> Cannot mkdir: Input/output error
> tar:
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modinfo:
> Cannot open: No such file or directory
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modules.alias
> tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64:
> Cannot mkdir: Input/output error
> tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modules.alias:
> Cannot open: No such file or directory
> RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modules.cgz
>
> gzip: tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64:
> Cannot mkdir: Input/output errorstdin: Input/output error
>
> tar: Unexpected EOF in archive
> tar: RR274x/Driver/Linux/RHEL_CentOS: Cannot utime: Input/output error
> tar: RR274x/Driver/Linux/RHEL_CentOS: Cannot change ownership to uid 0, gid
> 1000: Input/output error
> tar: RR274x/Driver/Linux/RHEL_CentOS: Cannot change mode to rwxr-xr-x:
> Input/output error
> tar: RR274x/Driver/Linux: Cannot utime: Input/output error
> tar: RR274x/Driver/Linux: Cannot change ownership to uid 0, gid 1000:
> Input/output error
> tar: RR274x/Driver/Linux: Cannot change mode to rwxr-xr-x: Input/output
> error
> tar: RR274x/Driver: Cannot utime: Input/output error
> tar: RR274x/Driver: Cannot change ownership to uid 0, gid 1000: Input/output
> error
> tar: RR274x/Driver: Cannot change mode to rwxr-xr-x: Input/output error
> tar: RR274x: Cannot utime: Input/output error
> tar: RR274x: Cannot change ownership to uid 0, gid 1000: Input/output error
> tar: RR274x: Cannot change mode to rwxr-xr-x: Input/output error
> tar: Error is not recoverable: exiting now
>
>
> dmesg:
> [131329.013475] XFS (md0): Mounting V4 Filesystem
> [131329.918438] XFS (md0): Ending clean mount
> [131499.357099] XFS (md0): Mounting V4 Filesystem
> [131499.709248] XFS (md0): Ending clean mount
> [131874.545344] loop: module loaded
> [131874.549914] XFS (loop0): Mounting V4 Filesystem
> [131874.555540] XFS (loop0): Ending clean mount
> [132020.964431] XFS (loop0): xfs_iread: validation failed for inode
> 124656869424 failed
> [132020.964435] ffff88028b078000: 49 4e 00 00 03 02 00 00 00 30 00 70 00 00
> 03 e8 IN.......0.p....
> [132020.964437] ffff88028b078010: 00 00 00 00 06 20 b0 6f 01 2e 00 00 00 00
> 00 16 ..... .o........
> [132020.964438] ffff88028b078020: 01 57 37 fd 2b 5d 22 9e 1e 0a 61 8c 00 00
> 00 20 .W7.+]"...a....
> [132020.964440] ffff88028b078030: ff ff 00 d2 1b f6 27 90 00 00 00 00 00 00
> 00 00 ......'.........
> [132020.964454] XFS (loop0): Internal error xfs_iread at line 392 of file
> /build/linux-QZaPpC/linux-3.16.7-ckt11/fs/xfs/xfs_inode_buf.c. Caller
> xfs_iget+0x24b/0x690 [xfs]
> [132020.964457] CPU: 2 PID: 21474 Comm: tar Not tainted 3.16.0-4-amd64 #1
> Debian 3.16.7-ckt11-1
> [132020.964459] Hardware name: To be filled by O.E.M. To be filled by
> O.E.M./SABERTOOTH 990FX R2.0, BIOS 1503 01/11/2013
> [132020.964460] 0000000000000001 ffffffff8150b405 ffff880424059800
> ffffffffa09115cb
> [132020.964463] 0000018800000010 ffffffffa0916f6b ffff88030f5c6c00
> ffff880424059800
> [132020.964465] 0000000000000075 ffff8800ad1afe98 ffffffffa095cb3a
> ffffffffa0916f6b
> [132020.964467] Call Trace:
> [132020.964471] [<ffffffff8150b405>] ? dump_stack+0x41/0x51
> [132020.964478] [<ffffffffa09115cb>] ? xfs_corruption_error+0x5b/0x80 [xfs]
> [132020.964483] [<ffffffffa0916f6b>] ? xfs_iget+0x24b/0x690 [xfs]
> [132020.964492] [<ffffffffa095cb3a>] ? xfs_iread+0xea/0x400 [xfs]
> [132020.964497] [<ffffffffa0916f6b>] ? xfs_iget+0x24b/0x690 [xfs]
> [132020.964503] [<ffffffffa0916f6b>] ? xfs_iget+0x24b/0x690 [xfs]
> [132020.964511] [<ffffffffa0956de6>] ? xfs_ialloc+0xa6/0x500 [xfs]
> [132020.964517] [<ffffffffa092658e>] ? kmem_zone_alloc+0x6e/0xe0 [xfs]
> [132020.964525] [<ffffffffa09572a2>] ? xfs_dir_ialloc+0x62/0x2a0 [xfs]
> [132020.964531] [<ffffffffa09251e5>] ? xfs_trans_reserve+0x1f5/0x200 [xfs]
> [132020.964538] [<ffffffffa09579a9>] ? xfs_create+0x489/0x700 [xfs]
> [132020.964541] [<ffffffff811b40ea>] ? kern_path_create+0xaa/0x190
> [132020.964548] [<ffffffffa091c5ea>] ? xfs_generic_create+0xca/0x250 [xfs]
> [132020.964550] [<ffffffff811b7ad0>] ? vfs_mkdir+0xb0/0x160
> [132020.964551] [<ffffffff811b868b>] ? SyS_mkdirat+0xab/0xe0
> [132020.964554] [<ffffffff815115cd>] ?
> system_call_fast_compare_end+0x10/0x15
> [132020.964555] XFS (loop0): Corruption detected. Unmount and run xfs_repair
> [132020.964564] XFS (loop0): Internal error xfs_trans_cancel at line 959 of
> file /build/linux-QZaPpC/linux-3.16.7-ckt11/fs/xfs/xfs_trans.c. Caller
> xfs_create+0x2b2/0x700 [xfs]
> [132020.964566] CPU: 2 PID: 21474 Comm: tar Not tainted 3.16.0-4-amd64 #1
> Debian 3.16.7-ckt11-1
> [132020.964567] Hardware name: To be filled by O.E.M. To be filled by
> O.E.M./SABERTOOTH 990FX R2.0, BIOS 1503 01/11/2013
> [132020.964568] 000000000000000c ffffffff8150b405 ffff8800ad1afe98
> ffffffffa0925e07
> [132020.964570] ffff880002530800 ffff880079e03ec8 ffff880424059800
> ffffffffa09577d2
> [132020.964571] 0000000000000001 ffff880079e03e20 ffff880079e03e1c
> ffff880079e03eb0
> [132020.964573] Call Trace:
> [132020.964575] [<ffffffff8150b405>] ? dump_stack+0x41/0x51
> [132020.964581] [<ffffffffa0925e07>] ? xfs_trans_cancel+0xc7/0xf0 [xfs]
> [132020.964588] [<ffffffffa09577d2>] ? xfs_create+0x2b2/0x700 [xfs]
> [132020.964590] [<ffffffff811b40ea>] ? kern_path_create+0xaa/0x190
> [132020.964596] [<ffffffffa091c5ea>] ? xfs_generic_create+0xca/0x250 [xfs]
> [132020.964598] [<ffffffff811b7ad0>] ? vfs_mkdir+0xb0/0x160
> [132020.964600] [<ffffffff811b868b>] ? SyS_mkdirat+0xab/0xe0
> [132020.964602] [<ffffffff815115cd>] ?
> system_call_fast_compare_end+0x10/0x15
> [132020.964604] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 960
> of file /build/linux-QZaPpC/linux-3.16.7-ckt11/fs/xfs/xfs_trans.c. Return
> address = 0xffffffffa0925e20
> [132021.196487] XFS (loop0): Corruption of in-memory data detected. Shutting
> down filesystem
> [132021.196491] XFS (loop0): Please umount the filesystem and rectify the
> problem(s)
> [132024.791456] XFS (loop0): xfs_log_force: error 5 returned.
> [132054.854625] XFS (loop0): xfs_log_force: error 5 returned.
> [132084.917775] XFS (loop0): xfs_log_force: error 5 returned.
> [132114.980927] XFS (loop0): xfs_log_force: error 5 returned.
> [132145.044086] XFS (loop0): xfs_log_force: error 5 returned.
> [132175.107307] XFS (loop0): xfs_log_force: error 5 returned.
> [132205.170404] XFS (loop0): xfs_log_force: error 5 returned.
> [132235.233587] XFS (loop0): xfs_log_force: error 5 returned.
>
>
> On 8/2/2015 3:24 PM, Leslie Rhorer wrote:
> >
> > OK, this is goofy. It seems to be working, now. As usual, I've
> >been doing some work on the server this weekend, but I can't think of
> >anything I have done that would fix the issue. I did replace the
> >remaining good 4G RAM module with a pair of 8G RAM modules, but memtest
> >reported the remaining 4G module as good, and I verified the removed
> >module really was bad. I also replaced the removable drive carrier and
> >cables that were feeding the two SSDs, one of which was reporting
> >failures as noted in the syslog. It's hard for me to believe either of
> >those things could have been causing the issue, though.
> >
> > I attached a 1.5T external drive to the server and formatted it as
> >XFS in preparation to continue troubleshooting. To make sure of things,
> >I tried decompressing the tarball, again, and this time it worked all
> >the way to the end. I then deleted the entire directory structure
> >created by the tarball and decompressed the file again twice. I'll see
> >if the rsync process works. That will take a couple of days.
> >
> >On 7/28/2015 5:11 PM, Brian Foster wrote:
> >>On Tue, Jul 28, 2015 at 10:13:01AM -0500, Leslie Rhorer wrote:
> >>>On 7/28/2015 7:33 AM, Brian Foster wrote:
> >>>>On Tue, Jul 28, 2015 at 02:46:45AM -0500, Leslie Rhorer wrote:
> >>>>>On 7/20/2015 6:17 AM, Brian Foster wrote:
> >>>>>>On Sat, Jul 18, 2015 at 08:02:50PM -0500, Leslie Rhorer wrote:
> >>>>>>>
> >>...
> >>>>
> >>>>> I then copied both the tarball and the image over to the root,
> >>>>>and while
> >>>>>the system would not let me create the image on the root, it did
> >>>>>let me copy
> >>>>>the image to the root. I then umounted the RAID array, mounted the
> >>>>>image,
> >>>>>and attempted to cd to the original directory in the image mount
> >>>>>where the
> >>>>>tarball was saved. That failed with an I/O error:
> >>>>>
> >>>>
> >>>>It sounds a bit strange for the mdrestore to fail on root but a cp of
> >>>>the resulting image to work. Do the resulting images have the same file
> >>>>size or is the rootfs copy truncated? If the latter, you could be
> >>>>missing part of the fs and thus any of the following tests are probably
> >>>>moot.
> >>>
> >>> Well, it can't be as large as it is reported, let's put it that way,
> >>>although the reported file size is the same. Ls claims it to be 16T in
> >>>size, which cannot be the case on a 100G partition. I forgot to
> >>>mention cp
> >>>does complain:
> >>>
> >>>RAID-Server:/# cp /RAID/TEST/RAIDfile.img ./
> >>>cp: cannot lseek ‘./RAIDfile.img’: Invalid argument
> >>>
> >>> But it does the same thing on the backup server, and it works
> >>>there. I
> >>>tried a cmp, and it seems to be hung. It just may be taking a long
> >>>time,
> >>>however.
> >>>
> >>
> >>Yeah, you can't really trust the resulting image. It doesn't take much
> >>space to create a very large sparse file, but different filesystems have
> >>different maximum file size limits. The problem here is that some
> >>metadata near the beginning of the file might reference or depend on
> >>something near the end, and I/Os beyond the end of the file will
> >>probably result in errors.
> >>
> >>I'd probably try the nouuid approach since the hardware is similar as
> >>well as some of the other interesting suggestions that have been made to
> >>try and get the image on the rootfs and see what happens there too.
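The sparse-file point above can be checked directly. What follows is a minimal Python sketch, assuming a Linux host where `st_blocks` counts 512-byte units; the helper names `make_sparse_file` and `apparent_and_allocated` are hypothetical and not part of any XFS tool. It creates a file whose apparent size (what `ls -l` reports) is large while it occupies almost no disk space, which is how a 16T image can appear to sit on a 100G partition. A matching `ls` size is therefore not, on its own, evidence that a copied image is complete, and whether such a file can be created at all still depends on the target filesystem's maximum file size.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (apparent size in bytes, bytes actually allocated on disk).

    Assumption: st_blocks is counted in 512-byte units (true on Linux,
    independent of the filesystem block size).
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def make_sparse_file(path, apparent_size):
    """Hypothetical helper: create a file of apparent_size bytes whose only
    written data is a single byte at the very end; the rest is a hole."""
    with open(path, "wb") as f:
        f.seek(apparent_size - 1)  # seeking past EOF allocates nothing
        f.write(b"\0")             # one real byte sets the file length


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        img = os.path.join(d, "sparse.img")
        make_sparse_file(img, 1 << 30)  # 1 GiB apparent size
        size, alloc = apparent_and_allocated(img)
        print(f"apparent={size} allocated={alloc}")
```

On a filesystem that supports holes, the allocated figure is typically a single block while the apparent size is the full 1 GiB. Comparing both numbers for the source image and the copy is a cheap first check before trusting the copy.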
> >>
> >>Brian
> >>
> >>>>Brian
> >>>>
> >>>>>RAID-Server:/# cd "/media/Server-Main/Equipment/Drive
> >>>>>Controllers/HighPoint
> >>>>>Adapters/Rocket 2722/Driver/"
> >>>>>bash: cd: /media/Server-Main/Equipment/Drive Controllers/HighPoint
> >>>>>Adapters/Rocket 2722/Driver/: Input/output error
> >>>>>
> >>>>> I changed directories to a point two directories above the
> >>>>>previous attempt
> >>>>>and did a long listing:
> >>>>>
> >>>>>RAID-Server:/# cd "/media/Server-Main/Equipment/Drive
> >>>>>Controllers/HighPoint
> >>>>>Adapters"
> >>>>>RAID-Server:/media/Server-Main/Equipment/Drive Controllers/HighPoint
> >>>>>Adapters# ll
> >>>>>ls: cannot access RocketRAID 2722: Input/output error
> >>>>>total 4
> >>>>>drwxr-xr-x 6 root lrhorer 4096 Jul 18 19:26 Rocket 2722
> >>>>>?????????? ? ? ? ? ? RocketRAID 2722
> >>>>>
> >>>>> As you can see, Rocket 2722 is still there, but RocketRAID 2722
> >>>>>is very
> >>>>>sick. Rocket 2722 is the parent of where the tarball was, however,
> >>>>>so I did
> >>>>>a cd and an ll again:
> >>>>>
> >>>>>RAID-Server:/media/Server-Main/Equipment/Drive Controllers/HighPoint
> >>>>>Adapters# cd "Rocket 2722"/
> >>>>>RAID-Server:/media/Server-Main/Equipment/Drive Controllers/HighPoint
> >>>>>Adapters/Rocket 2722# ll
> >>>>>ls: cannot access BIOS: Input/output error
> >>>>>ls: cannot access Driver: Input/output error
> >>>>>ls: cannot access HighPoint RAID Management Software: Input/output
> >>>>>error
> >>>>>ls: cannot access Manual: Input/output error
> >>>>>total 248
> >>>>>-rwxr--r-- 1 root lrhorer 245760 Nov 20 2008 autorun.exe
> >>>>>-rwxr--r-- 1 root lrhorer 51 Mar 21 2001 autorun.inf
> >>>>>?????????? ? ? ? ? ? BIOS
> >>>>>?????????? ? ? ? ? ? Driver
> >>>>>?????????? ? ? ? ? ? HighPoint RAID
> >>>>>Management
> >>>>>Software
> >>>>>?????????? ? ? ? ? ? Manual
> >>>>>-rwxr--r-- 1 root lrhorer 1134 Feb 5 2012 readme.txt
> >>>>>
> >>>>> So now, what?
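The `?????????? ? ? ? ? ?` lines reflect how `ls -l` works: the entry names are read from the directory itself, but each entry's attributes come from a separate `lstat()` of its inode. When that inode cannot be read, `ls` prints "cannot access NAME" and fills the columns with question marks. Below is a minimal Python sketch of the same per-entry check, which could list every damaged entry in a directory of the image mount. `probe_entries` is a hypothetical helper, not an existing tool. On Linux, EIO is errno 5, the same number as in the `xfs_log_force: error 5 returned` messages, and "Structure needs cleaning" is the message text for EUCLEAN.

```python
import errno
import os


def probe_entries(dirpath, names=None):
    """Hypothetical helper: lstat() each entry of dirpath, collect failures.

    Names come from the directory listing (readdir), which only reads the
    directory itself; lstat() then has to read each entry's inode, which is
    the step that fails for the "??????????" entries in ls -l output.

    Returns {name: (errno symbol, strerror text)} for entries that fail.
    `names` optionally restricts the probe to specific entries.
    """
    if names is None:
        names = os.listdir(dirpath)
    failures = {}
    for name in sorted(names):
        try:
            os.lstat(os.path.join(dirpath, name))
        except OSError as e:
            # e.g. ("EIO", "Input/output error") for an unreadable inode
            sym = errno.errorcode.get(e.errno, str(e.errno))
            failures[name] = (sym, os.strerror(e.errno))
    return failures
```

Pointed at the `Rocket 2722` directory in the image mount, this would presumably flag `BIOS`, `Driver`, `HighPoint RAID Management Software` and `Manual` with `EIO`, the same four entries `ls` could not access, while a healthy directory returns an empty dict.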
> >>>>>
> >>>>>_______________________________________________
> >>>>>xfs mailing list
> >>>>>xfs@xxxxxxxxxxx
> >>>>>http://oss.sgi.com/mailman/listinfo/xfs
> >>>>
> >>>
> >>
> >
>