Thanks Dave! We had what I think was a power fluctuation, and several
more drives went offline in my JBOD. I had to power-cycle the JBOD to
make them show "online" again. I unmounted the arrays first, though.

After doing the "echo w > /proc/sysrq-trigger" I was able to mount the
problematic filesystem directly, without having to read the dmesg
output. Whether that was because the power cycling forced the logical
volumes back to "optimal" (online), I don't know.

I was able to run xfs_repair on both filesystems, and have tons of
files in lost+found to parse now, but at least I have most of my data
back.

Thanks!

Bart

---
Bart Brashers
3039 NW 62nd St
Seattle WA 98107
206-789-1120 Home
425-412-1812 Work
206-550-2606 Mobile

On Sun, Mar 8, 2020 at 3:26 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Sun, Mar 08, 2020 at 12:43:29PM -0700, Bart Brashers wrote:
> > An update:
> >
> > Mounting the degraded xfs filesystem still hangs, so I can't replay
> > the journal, so I don't yet want to run xfs_repair.
>
> echo w > /proc/sysrq-trigger
>
> and dump dmesg to find where it is hung. If it is not hung and is
> instead stuck in a loop, use 'echo l > /proc/sysrq-trigger'.
>
> > I can mount the degraded xfs filesystem like this:
> >
> > $ mount -t xfs -o ro,norecovery,inode64,logdev=/dev/md/nvme2
> > /dev/volgrp4TB/lvol4TB /export/lvol4TB/
> >
> > If I do a "du" on the contents, I see 3822 files with either
> > "Structure needs cleaning" or "No such file or directory".
>
> To be expected - you mounted an inconsistent filesystem image and
> it's falling off the end of structures that are incomplete and
> require recovery to make consistent.
>
> > Is what I mounted what I would get if I used the xfs_repair -L option,
> > and discarded the journal? Or would there be more corruption, e.g. to
> > the directory structure?
>
> Maybe. Maybe more, maybe less. Maybe.
>
> > Some of the instances of "No such file or directory" are for files
> > that are not in their correct directory - I can tell by the filetype
> > and the directory name. Does that by itself imply directory
> > corruption?
>
> Maybe.
>
> It also may imply log recovery has not been run and so things
> like renames are not complete on disk, and recovery would fix that.
>
> But keep in mind your array had a triple disk failure, so there is
> going to be -something- lost and not recoverable. That may well be
> in the journal, at which point repair is your only option...
>
> > At this point, can I do a backup, either using rsync or xfsdump or
> > xfs_copy?
>
> Do it any way you want.
>
> > I have a separate RAID array on the same server where I
> > could put the 7.8 TB of data, though the destination already has data
> > on it - so I don't think xfs_copy is right. Is xfsdump to a directory
> > faster/better than rsync? Or would it be best to use something like
> >
> > $ tar cf - /export/lvol4TB/directory | (cd /export/lvol6TB/ ; tar xfp -)
>
> Do it however you are confident the data gets copied reliably in
> the face of filesystem traversal errors.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx