On Sat, 16 Nov 2019 at 18:29, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
>
> On 11/15/19 9:33 PM, Patrick Rynhart wrote:
> > Hi all,
> >
> > A small number of our CentOS VMs (about 4 out of a fleet of 200) are
> > experiencing ongoing, regular XFS corruption - and I'm not sure how to
> > troubleshoot the problem. They are all CentOS 7.7 VMs and are using
> > VMWare Paravirtual SCSI. The version of xfsprogs being used is
> > 4.5.0-20.el7.x86_64, and the kernel is 3.10.0-1062.1.2.el7.x86_64.
> > The VMWare version is ESXi, 6.5.0, 14320405.
> >
> > When the fault happens - the VMs will go into single user mode with
> > the following text displayed on the console:
> >
> > sd 0:0:0:0: [sda] Assuming drive cache: write through
> > XFS (dm-0): Internal error XFS_WANT_CORRUPTED_GOTO at line 1664 of file fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_extent+0xaa/0x140 [xfs]
> > XFS (dm-0): Internal error xfs_trans_cancel at line 984 of file fs/xfs/xfs_trans.c. Caller xfs_efi_recover+0x17d/0x1a0 [xfs]
> > XFS (dm-0): Corruption of in-memory data detected. Shutting down filesystem
> > XFS (dm-0): Please umount the filesystem and rectify the problem(s)
> > XFS (dm-0): Failed to recover intents
>
> Seems like this is not the whole relevant log; "Failed to recover intents"
> indicates it was in log replay but we don't see that starting. Did you
> cut out other interesting bits?

Thank you for the reply. When the problem happens, the system ends up in
the EL7 dracut emergency shell. Here's a picture of what the console
looks like right now (I haven't rebooted yet):

https://pasteboard.co/IGUpPiN.png

How can I get some debug information about the (attempted?) log replay
for analysis?

> -Eric