On Mon, Feb 18, 2019 at 03:22:03PM +0100, Carlos Maiolino wrote:
> Hi.
>
> > Dear XFS folks,
> >
> > [ 25.506600] XFS (sdd): Mounting V5 Filesystem
> > [ 25.629621] XFS (sdd): Starting recovery (logdev: internal)
> > [ 25.685100] NFSD: starting 90-second grace period (net f0000098)
> > [ 26.433828] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 368 of file fs/xfs/xfs_trans.c. Return address = 00000000cfa623e1
> > [ 26.433834] XFS (sdd): Corruption of in-memory data detected. Shutting down filesystem
> > [ 26.433835] XFS (sdd): Please umount the filesystem and rectify the problem(s)
> > [ 26.433857] XFS (sdd): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
>
> Ok, the filesystem shut itself down, likely because blocks allocated in the
> transaction exceeded the reservation.
>
> Could you please post the whole dmesg?
>
> > We mounted it with an overlay files,
>
> I'm not sure what you meant here; could you please specify what you meant by
> 'overlay files'? Are you using this XFS filesystem as an upper/lower FS for
> overlayfs?
>
> > and the xfs_repair shows the
> > summary below.
> >
> > ```
> > # xfs_repair -vv /dev/mapper/sddovl
> > - block cache size set to 4201400 entries
> > Phase 2 - using internal log
> > - zero log...
> > zero_log: head block 3930112 tail block 3929088
> > ERROR: The filesystem has valuable metadata changes in a log which needs to
> > be replayed. Mount the filesystem to replay the log, and unmount it before
> > re-running xfs_repair. If you are unable to mount the filesystem, then use
> > the -L option to destroy the log and attempt a repair.
> > Note that destroying the log may cause corruption -- please attempt a mount
>
> Have you tried to mount/umount the filesystem before zeroing the log? Zeroing
> the log is supposed to be a last resort.
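
Just to spell out the sequence that error message is asking for, here is a
rough sketch; /mnt stands in for whatever mount point you use, and the device
path is taken from the xfs_repair invocation quoted above:

```
# replay the log by mounting, then unmount cleanly
mount /dev/mapper/sddovl /mnt
umount /mnt
# dry run: report problems without modifying the filesystem
xfs_repair -n /dev/mapper/sddovl
# actual repair
xfs_repair /dev/mapper/sddovl
# only if the mount itself fails and there is no other way forward:
# xfs_repair -L /dev/mapper/sddovl
```

The -n pass only reports what repair would do without writing anything, so
you can gauge the damage before committing to it.
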
> >
> > The directory `lost+found` contains almost five million files
> >
> > # find lost+found | wc
> > 4859687 4859687 110985720
>
> We have neither the whole xfs_repair output nor more information about the
> filesystem itself, but it looks like you had huge directory updates in your
> log which were not replayed, and all the orphaned inodes ended up in
> lost+found =/
>
> > We saved the output of `xfs_repair`, but it’s over 500 MB in size, so we
> > cannot attach it.
> >
> > `sudo xfs_metadump -go /dev/sdd sdd-metadump.dump` takes over 15 minutes
> > and the dump file is 8.8 GB in size.
>
> At this point, xfs_metadump won't help much, since you have already repaired
> the filesystem.
> That said, why are you getting a metadump from /dev/sdd when the fs you tried
> to repair is a device-mapper device? Are you facing this issue on more than
> one filesystem?
>
> > It’d be great if you could give hints on debugging this issue further, and
> > comment on whether you think it is possible to recover the files, that is,
> > to fix the log so that it can be cleanly applied.
>
> Unfortunately, you already got rid of the log, so you can't recover it
> anymore, but all the recovered files will be in lost+found, with their inode
> numbers as file names.
>
>
> Ok, so below is the dmesg; thanks for attaching it.
>
> One thing is that there are two failing devices, sdd and dm-0. So my question
> again: is this the same filesystem, or are they two separate filesystems
> showing exactly the same issue? The filesystem has found corrupted inodes in
> the AG's unlinked bucket, but this shouldn't affect log recovery.
>
> If they are two separate devices, did you run xfs_repair on both of them?
> After you repaired the filesystem(s), do you still see the memory corruption
> issue?
>
> At this point, there is not much we can do regarding the filesystem metadata,
> since you already forced an xfs_repair that zeroed the log.
>
> So, could you please tell us the current state of the filesystem (or
> filesystems, if there is more than one)? Are you still seeing the same memory
> corruption error even after running xfs_repair on it?
>

FWIW, if you do still have an original copy of the fs, we could see about
whether bypassing the shutdown allows us to trade log recovery failure for a
space accounting error. This would still require a subsequent repair, but that
may be less invasive than zapping the log and dealing with the aftermath of
that.

Brian

> And for completeness, please provide us with as much information as possible
> about this(these) filesystem(s):
>
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
>
> Cheers.
>
> > [ 1380.869451] XFS (sdd): Mounting V5 Filesystem
> > [ 1380.912559] XFS (sdd): Starting recovery (logdev: internal)
> > [ 1381.030780] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 368 of file fs/xfs/xfs_trans.c. Return address = 00000000cfa623e1
> > [ 1381.030785] XFS (sdd): Corruption of in-memory data detected. Shutting down filesystem
> > [ 1381.030786] XFS (sdd): Please umount the filesystem and rectify the problem(s)
> > [ 1381.031086] XFS (sdd): xlog_recover_clear_agi_bucket: failed to clear agi 0. Continuing.
> > [ 1381.031088] XFS (sdd): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> > [ 1381.031090] XFS (sdd): xlog_recover_clear_agi_bucket: failed to clear agi 0. Continuing.
> > [ 1381.031093] XFS (sdd): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> > [ 1381.031095] XFS (sdd): xlog_recover_clear_agi_bucket: failed to clear agi 0. Continuing.
> <...>
> > [ 1381.031113] XFS (sdd): Ending recovery (logdev: internal)
> > [ 1381.031490] XFS (sdd): Error -5 reserving per-AG metadata reserve pool.
> > [ 1381.031492] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 548 of file fs/xfs/xfs_fsops.c. Return address = 00000000217dbba5
> >
> > [ 2795.123228] XFS (dm-0): Ending recovery (logdev: internal)
> > [ 2795.231020] XFS (dm-0): Error -5 reserving per-AG metadata reserve pool.
> > [ 2795.231023] XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 548 of file fs/xfs/xfs_fsops.c. Return address = 00000000217dbba5
> >
> > [10944.023429] XFS (dm-0): Mounting V5 Filesystem
> > [10944.035260] XFS (dm-0): Ending clean mount
> > [11664.862376] XFS (dm-0): Unmounting Filesystem
> > [11689.260213] XFS (dm-0): Mounting V5 Filesystem
> > [11689.338187] XFS (dm-0): Ending clean mount
> >
>
> --
> Carlos