Re: xfs_repair force_geometry

"Michael L. Semon" <mlsemon35@xxxxxxxxx> · Tue, 14 May 2013 13:54:53 -0400

On 05/14/2013 08:35 AM, Stan Hoeppner wrote:
On 5/14/2013 3:56 AM, Benedikt Schmidt wrote:
...
I see, I should have mentioned this earlier. I already tried xfs_repair
and it failed to find the second superblock. Because I am still able to
mount the original disk and read most of it, I guessed that xfs_repair
is confused by the different disk geometries. Naturally, I have also
already tried to copy everything with, for example, cp or xfs_copy, but
both failed because of filesystem errors. The only program that didn't
fail to copy the data was dd_rescue, which can handle the errors. That
is why I used it; as far as I can see, it was my only option.

You are able to mount the XFS on the original disk which means the
superblocks are apparently intact and the log section isn't corrupt.
But when you attempt to copy files from that XFS to another
disk/filesystem you get what you describe as filesystem errors.  How far
did the cp/xfs_copy progress before you received the filesystem errors?
What is the result of running xfs_repair -n on the original filesystem?

The point of these questions is to reveal whether the original disk
simply has media surface errors toward the end of the disk where you
wrote those few most recent files, *or* if the problem with the disk is
electrical or mechanical.

The fact that cp/xfs_copy fail, yet ddrescue completes by retrying
(though possibly while ignoring some sectors due to a retry limit of 1),
suggests the problem is electrical or mechanical, not platter surface
defects.  From what you've described so far, it sounds like the more
load you put on the drive, the more errors it throws.  This is typical
when the internal power supply circuit on a drive is failing.

While the drive is idle, I would suggest you use xfs_db on the original
XFS to locate the positions of those few files that are not backed up.
Unmount the XFS and use dd with skip/seek to copy only these files to
another location.  Do one file at a time to put as little load on the
drive as possible.  Give it some resting time between dd operations.  If
this works, it eliminates the need to expand your RAID5 or attempt more
full partition copies to the new 2TB drive.  If this doesn't work, it
also eliminates the need for either of these steps, as it will
demonstrate that it's simply not possible to recover the data.
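
As a rough illustration of the dd skip/seek arithmetic (a sketch under
stated assumptions, not a tested recovery procedure): xfs_db's bmap
command lists a file's extents, and each extent can be expressed as an
allocation group number, a block number within that group, and a length
in filesystem blocks.  Assuming the superblock values agblocks and
blocksize have already been read with xfs_db, the linear block to skip
to on the partition is agno * agblocks + agbno.  The Extent type, the
dd_command function and its parameter names are hypothetical helpers
made up for this example.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """One extent of a file, in filesystem blocks (hypothetical helper type)."""
    file_offset: int  # offset of this extent within the file, in fs blocks
    agno: int         # allocation group number
    agbno: int        # block number inside that allocation group
    count: int        # extent length, in fs blocks


def dd_command(ext, agblocks, blocksize, device, outfile):
    """Build a dd command line that copies one extent off the partition.

    Assumes `device` is the partition that holds the XFS (so no partition
    offset is added) and that the file data is on the main data device.
    """
    skip = ext.agno * agblocks + ext.agbno  # linear fs block on the device
    args = [
        "dd",
        f"if={device}",
        f"of={outfile}",
        f"bs={blocksize}",
        f"skip={skip}",              # input position, in units of bs
        f"seek={ext.file_offset}",   # output position, in units of bs
        f"count={ext.count}",
        "conv=notrunc",              # do not truncate outfile between extents
    ]
    return " ".join(shlex.quote(a) for a in args)
```

Calling dd_command once per extent, with a pause between runs, matches
the one-file-at-a-time approach described above.  The device and output
paths are placeholders to be filled in by hand.  Since extents are whole
blocks, the last one may run past the end of the file, so the recovered
copy may need trimming to the file's real size.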

I've been hesitant to suggest using smartmontools to aid in this
quest.  In the event of surface errors, `smartctl -a /dev/sdd` may or
may not show the exact error locations.  The read error rate numbers
might be helpful, too.  However, smartctl has extra features that might
cause the drive to remap sectors that could otherwise be read one last
time.  `smartctl --test=long /dev/sdd` should be a no-no at this point.
At any rate, I wouldn't want that SMART initialization clunk noise to be
the drive's last dying gasp.  Thoughts?
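
If reading the attributes is judged acceptable at all, one low-impact
option is to save the smartctl output once and pick the counters out of
the saved text, rather than querying the drive repeatedly.  A minimal
sketch, assuming the usual ten-column ATA attribute table; both function
names are hypothetical, and the drive may report different attribute
names than the ones mentioned below.

```python
def parse_smart_attributes(text):
    """Return {attribute_name: raw_value_string} from saved smartctl output.

    Assumes the ten-column ATA attribute table (ID#, ATTRIBUTE_NAME, FLAG,
    VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE).
    Lines that do not look like table rows are skipped.
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # A table row starts with a numeric ID and has at least ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        # RAW_VALUE may contain spaces, so rejoin everything from column 10 on.
        attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def pick(attrs, names):
    """Select only the named attributes that are actually present."""
    return {n: attrs[n] for n in names if n in attrs}
```

For example, pick(parse_smart_attributes(saved_text),
["Raw_Read_Error_Rate", "Reallocated_Sector_Ct"]) would return just
those two counters, if the drive reports them under those names.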

Michael

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs