Thanks Dave! We had what I think was a power fluctuation, and several
more drives went offline in my JBOD. I had to power-cycle the JBOD to
make them show "online" again. I unmounted the arrays first, though.

After doing the "echo w > /proc/sysrq-trigger" I was able to mount the
problematic filesystem directly, without having to read the dmesg
output. Whether that was because the power cycling forced the logical
volumes back to "optimal" (online), I don't know.

I was able to run xfs_repair on both filesystems, and have tons of
files in lost+found to parse now, but at least I have most of my data
back.

Thanks!

Bart

---
Bart Brashers
3039 NW 62nd St
Seattle WA 98107
206-789-1120 Home
425-412-1812 Work
206-550-2606 Mobile

On Sun, Mar 8, 2020 at 3:26 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Sun, Mar 08, 2020 at 12:43:29PM -0700, Bart Brashers wrote:
> > An update:
> >
> > Mounting the degraded xfs filesystem still hangs, so I can't replay
> > the journal, so I don't yet want to run xfs_repair.
>
> echo w > /proc/sysrq-trigger
>
> and dump dmesg to find where it is hung. If it is not hung and is
> instead stuck in a loop, use 'echo l > /proc/sysrq-trigger'.
>
> > I can mount the degraded xfs filesystem like this:
> >
> > $ mount -t xfs -o ro,norecovery,inode64,logdev=/dev/md/nvme2
> > /dev/volgrp4TB/lvol4TB /export/lvol4TB/
> >
> > If I do a "du" on the contents, I see 3822 files with either
> > "Structure needs cleaning" or "No such file or directory".
>
> To be expected - you mounted an inconsistent filesystem image and
> it's falling off the end of structures that are incomplete and
> require recovery to make consistent.
>
> > Is what I mounted what I would get if I used the xfs_repair -L option,
> > and discarded the journal? Or would there be more corruption, e.g. to
> > the directory structure?
>
> Maybe. Maybe more, maybe less. Maybe.
>
> > Some of the instances of "No such file or directory" are for files
> > that are not in their correct directory - I can tell by the filetype
> > and the directory name. Does that by itself imply directory
> > corruption?
>
> Maybe.
>
> It also may imply log recovery has not been run and so things
> like renames are not complete on disk, and recovery would fix that.
>
> But keep in mind your array had a triple disk failure, so there is
> going to be -something- lost and not recoverable. That may well be
> in the journal, at which point repair is your only option...
>
> > At this point, can I do a backup, either using rsync or xfsdump or
> > xfs_copy?
>
> Do it any way you want.
>
> > I have a separate RAID array on the same server where I
> > could put the 7.8 TB of data, though the destination already has data
> > on it - so I don't think xfs_copy is right. Is xfsdump to a directory
> > faster/better than rsync? Or would it be best to use something like
> >
> > $ tar cf - /export/lvol4TB/directory | (cd /export/lvol6TB/ ; tar xfp -)
>
> Do it however you are confident the data gets copied reliably in
> the face of filesystem traversal errors.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx