Re: XFS File system in trouble

On Tue, Jul 28, 2015 at 02:46:45AM -0500, Leslie Rhorer wrote:
> On 7/20/2015 6:17 AM, Brian Foster wrote:
> >On Sat, Jul 18, 2015 at 08:02:50PM -0500, Leslie Rhorer wrote:
> >>
> >>	I found the problem with md5sum (and probably nfs, as well).  One of the
> >>memory modules in the server was bad.  The problem with XFS persists.  Every
> >>time tar tried to create the directory:
> >>
> >>/RAID/Server-Main/Equipment/Drive Controllers/HighPoint Adapters/Rocket 2722/Driver/RR276x/Driver/Linux/openSUSE/rr276x-suse-11.2-i386/linux/suse/i386-11.1
> >>
> >>	It would begin spitting out errors, starting with "Cannot mkdir: Structure
> >>needs cleaning".  At that point, XFS had shut down.  I went into
> >>/RAID/Server-Main/Equipment/Drive Controllers/HighPoint Adapters/Rocket
> >>2722/Driver/RR276x/Driver/Linux/openSUSE/rr276x-suse-11.2-i386/linux/suse/
> >>and created the i386-11.1 directory by hand, and tar no longer starts
> >>spitting out errors at that point, but it does start up again at
> >>RR2782/Windows/Vista-Win2008-Win7-legacy_single/x64.
> >>
> >
> >So is this untar problem a reliable reproducer? If so, here's what I
> 
> 	The processes I was running this weekend ran longer than expected, and in
> fact were still running just a couple of hours ago.  I was doing an rsync
> with CRC check from the backup system to the one with the problem.  There
> were a few corrupt files, but not a huge number.  Although slower than I
> hoped, everything was running fine until a short time ago, when rsync
> encountered the very same issue I keep having with tar, which is to say it
> tried to create a directory and the file system crashed with precisely the
> same symptoms as when tar was failing.
> 
> >would try to hopefully isolate a filesystem problem from something
> >underneath:
> >
> >xfs_metadump -go /dev/md0 /somewhere/on/rootfs/md0.metadump
> >xfs_mdrestore -g /somewhere/on/rootfs/md0.metadump /.../fileonrootfs.img
> >mount /.../fileonrootfs.img /mnt/
> 
> 	I tried to do the xfs_mdrestore to the root file system, but it fails:
> 
> RAID-Server:/TEST# xfs_mdrestore -g md0.metadump RAIDfile.img
> xfs_mdrestore: cannot set filesystem image size: File too large
> 

Hmm, I guess the file size exceeds the capabilities of the root fs, even
if there might ultimately be enough space to restore the metadump.
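
One way to check this without rerunning xfs_mdrestore is to ask the target
fs for a sparse file of the same apparent size. The following is only a
sketch, assuming GNU coreutils (truncate, mktemp); the function name
can_hold_sparse is made up for illustration and is not part of xfsprogs.

```shell
# Hypothetical helper: succeed if the filesystem holding DIR accepts a
# sparse file with an apparent size of SIZE bytes. truncate only sets the
# file length, so no data blocks are written.
# Returns 0 if accepted, 1 if rejected, 2 if no probe file could be made.
can_hold_sparse() {
    dir=$1
    size=$2
    probe=$(mktemp "$dir/sparse-probe.XXXXXX") || return 2
    if truncate -s "$size" "$probe" 2>/dev/null; then
        rc=0
    else
        rc=1
    fi
    rm -f "$probe"
    return $rc
}
```

SIZE would be the apparent size of an image that restored successfully
elsewhere (for example from stat -c %s on the copy restored over nfs). A
return of 1 on the root fs would be consistent with a per-file size limit
rather than a lack of free space.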

> 	So then I did the same thing to a directory on an nfs mount from another
> machine.  That worked.  I then went to the other machine, mounted the image
> on /media, copied the tarball to the location on the mount where the tarball
> resides on the real array, and ran the tar job.  It completed without errors.
> 

That's interesting. It tells us the fs apparently isn't fundamentally
broken, but the separate machine potentially introduces a different
kernel. Is that the case here? What else is different between these
systems?
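
A quick way to answer the kernel question is to print the same summary on
both machines and compare. A minimal sketch; fs_env_summary is a
hypothetical name, and it only assumes uname and, if installed, xfs_repair
(whose -V option prints the xfsprogs version).

```shell
# Hypothetical helper: print the running kernel release and the xfsprogs
# version so the output from two machines can be compared side by side.
fs_env_summary() {
    echo "kernel=$(uname -r)"
    if command -v xfs_repair >/dev/null 2>&1; then
        echo "xfsprogs=$(xfs_repair -V 2>&1)"
    else
        echo "xfsprogs=not installed"
    fi
}
```

Running fs_env_summary on each box and diffing the two outputs covers only
the software side; controller, disks and memory still have to be compared
by hand.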

> 	I then created the image on the array where the tasks are failing and
> attempted to mount it to /media on the problematic machine.  That fails
> with:
> 
> RAID-Server:/TEST# mount /RAID/TEST/RAIDfile.img /media/
> mount: wrong fs type, bad option, bad superblock on /dev/loop0,
>        missing codepage or helper program, or other error
> 
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so.
> 
> 	The problem is this (from syslog):
> Jul 28 01:53:48 RAID-Server kernel: [431155.847523] loop: module loaded
> Jul 28 01:53:48 RAID-Server kernel: [431155.927238] XFS (loop0): Filesystem
> has duplicate UUID 228cfaa7-ae6b-44fc-b703-1c32385231c0 - can't mount
> Jul 28 01:55:51 RAID-Server kernel: [431278.916490] XFS (loop0): Filesystem
> has duplicate UUID 228cfaa7-ae6b-44fc-b703-1c32385231c0 - can't mount
> 
> 	Presumably it has the same UUID as the RAID array because it is expected to
> do so.  I can't mount it unless I umount the RAID array, but if I do that, I
> can't get to the file to mount the dump image, since it is on the array.
> 

Ok, somebody already replied with how to get around this. That said, it
sounds like you've restored the metadump to an image file on the
problematic fs. I'm not sure how useful a test that is since we're
testing on the same hardware. I suppose it could be interesting if the
storage hardware is similar to that of the alternate machine referenced
above. For example, if you restore here and the test does not fail, the
test on the separate machine is probably less informative.
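
For reference, the usual way around the duplicate-UUID refusal is the XFS
nouuid mount option, which is presumably what the earlier reply suggested.
A sketch only: the wrapper name mount_image_nouuid and the DRY_RUN switch
are made up here so the command can be inspected before running it as root.

```shell
# Hypothetical wrapper: loop-mount a restored metadump image whose UUID
# duplicates that of an already mounted XFS filesystem. nouuid tells XFS
# to skip its duplicate-UUID check.
mount_image_nouuid() {
    img=$1
    mnt=$2
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and
    # non-empty, so the command is printed instead of executed.
    ${DRY_RUN:+echo} mount -o loop,nouuid "$img" "$mnt"
}
```

With the paths from the failed attempt above, that would be
mount_image_nouuid /RAID/TEST/RAIDfile.img /media, with the array left
mounted.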

> 	I then copied both the tarball and the image over to the root, and while
> the system would not let me create the image on the root, it did let me copy
> the image to the root.  I then umounted the RAID array, mounted the image,
> and attempted to cd to the original directory in the image mount where the
> tarball was saved.  That failed with an I/O error:
> 

It sounds a bit strange for the mdrestore to fail on root but a cp of
the resulting image to work. Do the resulting images have the same file
size or is the rootfs copy truncated? If the latter, you could be
missing part of the fs and thus any of the following tests are probably
moot.
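
Something like the following would answer the size question. It is a
sketch assuming GNU stat (-c %s prints the apparent size in bytes); the
function name compare_image_sizes is made up for illustration.

```shell
# Hypothetical helper: report whether two files have the same apparent
# size. Prints "same N" or "differ N1 N2"; returns 2 if either file
# cannot be examined.
compare_image_sizes() {
    a=$(stat -c %s "$1") || return 2
    b=$(stat -c %s "$2") || return 2
    if [ "$a" -eq "$b" ]; then
        echo "same $a"
    else
        echo "differ $a $b"
    fi
}
```

Call it with the image on the array and the copy on the root fs as the two
arguments. A "differ" result with a smaller second number would mean the
rootfs copy was truncated.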

Brian

> RAID-Server:/# cd "/media/Server-Main/Equipment/Drive Controllers/HighPoint
> Adapters/Rocket 2722/Driver/"
> bash: cd: /media/Server-Main/Equipment/Drive Controllers/HighPoint
> Adapters/Rocket 2722/Driver/: Input/output error
> 
> 	I changed directories to a point two directories above the previous attempt
> and did a long listing:
> 
> RAID-Server:/# cd "/media/Server-Main/Equipment/Drive Controllers/HighPoint
> Adapters"
> RAID-Server:/media/Server-Main/Equipment/Drive Controllers/HighPoint
> Adapters# ll
> ls: cannot access RocketRAID 2722: Input/output error
> total 4
> drwxr-xr-x 6 root lrhorer 4096 Jul 18 19:26 Rocket 2722
> ?????????? ? ?    ?          ?            ? RocketRAID 2722
> 
> 	As you can see, Rocket 2722 is still there, but RocketRAID 2722 is very
> sick.  Rocket 2722 is the parent of where the tarball was, however, so I did
> a cd and an ll again:
> 
> RAID-Server:/media/Server-Main/Equipment/Drive Controllers/HighPoint
> Adapters# cd "Rocket 2722"/
> RAID-Server:/media/Server-Main/Equipment/Drive Controllers/HighPoint
> Adapters/Rocket 2722# ll
> ls: cannot access BIOS: Input/output error
> ls: cannot access Driver: Input/output error
> ls: cannot access HighPoint RAID Management Software: Input/output error
> ls: cannot access Manual: Input/output error
> total 248
> -rwxr--r-- 1 root lrhorer 245760 Nov 20  2008 autorun.exe
> -rwxr--r-- 1 root lrhorer     51 Mar 21  2001 autorun.inf
> ?????????? ? ?    ?            ?            ? BIOS
> ?????????? ? ?    ?            ?            ? Driver
> ?????????? ? ?    ?            ?            ? HighPoint RAID Management
> Software
> ?????????? ? ?    ?            ?            ? Manual
> -rwxr--r-- 1 root lrhorer   1134 Feb  5  2012 readme.txt
> 
> 	So now, what?
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs



