Re: Repair Metadata corruption detected at xfs_inode_buf_verify on CentOS 7 virtual machine

brian <a001@xxxxxxxxxxxxxxx> · Thu, 9 Sep 2021 15:11:45 -0400

I switched over to ubuntu and got the error "Device or resource busy".  
How do I get around this error?
Here is what I did:
---------------------------------------------------------
sudo apt-get install qemu
sudo apt install qemu-utils
sudo apt-get install xfsprogs

sudo modprobe nbd max_part=8

sudo qemu-nbd --connect=/dev/nbd0 
/media/sf_virtual_machine_share/centoOS7Python3p9_tmp-disk1.vdi
sudo xfs_repair /dev/nbd0p2
---------------------------------------------------------

I got the error:
---------------------------------------------------------
xfs_repair: cannot open /dev/nbd0p2: Device or resource busy
---------------------------------------------------------

When I tried xfs_repair with the -n flag:
---------------------------------------------------------
sudo xfs_repair -n /dev/nbd0p2
---------------------------------------------------------

I got (I replaced some content with ...  Most of the content was ...):
---------------------------------------------------------
Phase 1 - find and verify superblock...
bad primary superblock - bad magic number !!!
...
attempting to find secondary superblock...
Sorry, could not find valid secondary superblock
---------------------------------------------------------

Here are some things I ran while trying to debug.
cmd:
---------------------------------------------------------
sudo partprobe /dev/nbd0p2
---------------------------------------------------------

Result:
---------------------------------------------------------
Error: Partition(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64 on /dev/nbd0p2 have 
been written, but we have been unable to inform the kernel of the 
change, probably because it/they are in use.  As a result, the old 
partition(s) will remain in use. You should reboot now before making 
further changes.
---------------------------------------------------------

cmd:
---------------------------------------------------------
sudo fdisk -l /dev/nbd0p2
---------------------------------------------------------

Result:
---------------------------------------------------------
Disk /dev/nbd0p2: 101.72 GiB, 109207093248 bytes, 213295104 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
brian@brian-VirtualBox:~/Desktop$ sudo lvmdiskscan
  /dev/loop0  [     219.00 MiB]
  /dev/loop1  [     <55.44 MiB]
  /dev/sda1   [     512.00 MiB]
  /dev/nbd0p1 [       1.00 GiB]
  /dev/loop2  [     <65.10 MiB]
  /dev/nbd0p2 [    <101.71 GiB] LVM physical volume
  /dev/loop3  [     <50.96 MiB]
  /dev/loop4  [     <32.30 MiB]
  /dev/loop5  [      32.30 MiB]
  /dev/sda5   [      <2.00 TiB]
  0 disks
  9 partitions
  0 LVM physical volume whole disks
  1 LVM physical volume
---------------------------------------------------------

cmd:
---------------------------------------------------------
sudo lvscan
---------------------------------------------------------

Result:
---------------------------------------------------------
  ACTIVE            '/dev/centos/swap' [2.00 GiB] inherit
  ACTIVE            '/dev/centos/home' [<49.70 GiB] inherit
  ACTIVE            '/dev/centos/root' [50.00 GiB] inherit
---------------------------------------------------------

cmd:
---------------------------------------------------------
sudo lvm pvscan
---------------------------------------------------------

Result:
---------------------------------------------------------
  PV /dev/nbd0p2   VG centos          lvm2 [101.70 GiB / 4.00 MiB free]
  Total: 1 [101.70 GiB] / in use: 1 [101.70 GiB] / in no VG: 0 [0   ]
---------------------------------------------------------

cmd:
---------------------------------------------------------
sudo lvm lvs
---------------------------------------------------------

result:
---------------------------------------------------------
  LV   VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log 
Cpy%Sync Convert
  home centos -wi-a----- <49.70g
  root centos -wi-a----- 50.00g
  swap centos -wi-a-----   2.00g
---------------------------------------------------------

cmd:
---------------------------------------------------------
sudo e2fsck /dev/nbd0p2
---------------------------------------------------------

result:
---------------------------------------------------------
e2fsck 1.45.5 (07-Jan-2020)
/dev/nbd0p2 is in use.
e2fsck: Cannot continue, aborting.
---------------------------------------------------------

cmd:
---------------------------------------------------------
fsck -N /dev/nbd0p2
---------------------------------------------------------

result:
---------------------------------------------------------
fsck from util-linux 2.34
[/usr/sbin/fsck.ext2 (1) -- /dev/nbd0p2] fsck.ext2 /dev/nbd0p2
---------------------------------------------------------

cmd:
---------------------------------------------------------
sudo stat -f /dev/nbd0p2
---------------------------------------------------------

result:
---------------------------------------------------------
  File: "/dev/nbd0p2"
    ID: 0        Namelen: 255     Type: tmpfs
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 495622     Free: 495622     Available: 495622
Inodes: Total: 495622     Free: 495101
---------------------------------------------------------

On 09/08/2021 10:54 PM, Dave Chinner wrote:
On Wed, Sep 08, 2021 at 07:08:54PM -0400, brian wrote:
Here is an update of the error.  If something looks off, let me know and
I'll double check I transcribed correct.  Let me know if there is a way to
post a screenshot.

----------------------------------------------------------------------------

XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x14d/0x160
[xfs] xfs_inode block 0x1b5c1c0 xfs_inode_buf_verify
XFS (dm-2): Unmount and run xfs_repair
XFS (dm-2): First 128 bytes of corrupted metadata buffer:
ffffae6842564000: 16 49 77 8a 32 7e 72 52 14 bb 51 98 7c a5 2c 9a
.Iw.2~rR..Q.|.,.
ffffae6842564010: dd 28 4d 94 88 03 2b 8c 99 33 67 ca 6a d5 aa c9
.(M...+..3g.j...
ffffae6842564020: f8 f8 78 c7 90 fc f5 af 7f 95 03 07 0e 0c 4a 37
..x...........J7
ffffae6842564030: 7c f8 70 c7 09 14 c7 81 a5 a7 a7 cb a8 a8 28 b9
|.p...........(.
ffffae6842564040: 68 d1 a2 a0 73 f5 ae d5 73 94 23 3d 3d 5d 46 46
h...s...s.#--]FF
ffffae6842564050: 46 ca f9 f3 e7 3b 69 4c d3 94 6d db b6 75 14 3d
F....;iL..m..u.=
ffffae6842564060: 30 68 77 ec d8 e1 a4 d9 b7 6f 9f 0c 0b 0b 0b 32
)hw......o.....2
ffffae6842564070: 8e 9d 29 aa 03 4b 4c 4c 0c 52 66 29 a5 7c f4 d1
..)..KLL.Rf).|..
XFS (dm-2): metadata I/O error in “xlog_recov_do..(read#2)” at daddr 0x1b32
error 117
Ok, that doesn't contain inodes. It's not even recognisable
as filesystem metadata at all.

----------------------------------------------------------------------------

My computer is an older CentOS 6 box (on my todo list to replace).  I have
an array of VirtualBox VMs.  All my data including VMs is in /home/brian
directory.  I run rsync to backup /home/brian directory to a USB drive.
Rsync treats the files of the VM just like any other file.  I noticed that
the VMs can get corrupted if I leave them running during back up with rsync
so I normally shut them down.  For this instance, I didn't shut down the VM
when I launched rsync.

@Dave
Regarding your image file question, I don't believe the above is an image
file.  Rsync copies file by file.  And I can navigate to the USB like my
internal hdd.

Regarding the freeze files system, no I did not freeze the guest.

Regarding the older backups, I just have the rsync backup on a USB drive.
I've been working with a copy from the USB to try at get this machine back,
thus the unmodified USB rsync that was taken while the machine was running
exists.

While rsync was running, I had Selenium running web app tests in Chrome.
All the data from my program was being written to a shared folder.  Chrome
has it's own internal cache that could be changing but overall, I wouldn't
expect the file system to be changing too much especially the booting part.
Log recovery covers anything that changed in the filesystem - it has
nothing to do with what is being accessed by boot operations.

My ultimate goal is to get to the data on this VM.  I've tried to mount this
.vdi file in another instance of CentOS but have not been successful.  I may
spin up an Ubuntu instance to try to get to the data.  The data on this VM
can be recreated but I prefer not to have to redo the work.
'mount -o ro,norecovery <dev>' will allow you to skip log recovery
and hopefully let you navigate the filesystem. It will not be error
free, though, because log recovery is needed for that to occur.

If the filesystem looks mostly intact, copy all the data off it you
can. Then unmount it is and run xfs_repair -L <dev> on it to blow
away the log and repair any corruption found. This will cause data
loss, so don't do this unless you are prepared for it of have a
complete copy of the broken filesystem.

I'd highly recommend doing this with a recent xfs_repair, not the
tools that came with CentOS 6.

You could also do dry-run testing by taking a metadump of the broken
filesystem and restoring that to an image file and running reapir on
that image file. That will at least tell you how much of the
directory structure will be trashed and wht you'll be able to access
after repair.

It may be that mounting w/ norecovery is your best bet of recovering
the contents of the bad filesystems, so do that first...

Cheers,

Dave.