Dear Brian, dear Darrick,


On 02/18/19 18:57, Brian Foster wrote:
> On Mon, Feb 18, 2019 at 09:32:51AM -0800, Darrick J. Wong wrote:
>> On Mon, Feb 18, 2019 at 11:17:46AM -0500, Brian Foster wrote:
>>> On Mon, Feb 18, 2019 at 04:08:00PM +0100, Paul Menzel wrote:
>>>> On 02/18/19 15:34, Brian Foster wrote:
>>>>> On Mon, Feb 18, 2019 at 03:22:03PM +0100, Carlos Maiolino wrote:
>>>>
>>>>>>> [   25.506600] XFS (sdd): Mounting V5 Filesystem
>>>>>>> [   25.629621] XFS (sdd): Starting recovery (logdev: internal)
>>>>>>> [   25.685100] NFSD: starting 90-second grace period (net f0000098)
>>>>>>> [   26.433828] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 368 of file fs/xfs/xfs_trans.c. Return address = 00000000cfa623e1
>>>>>>> [   26.433834] XFS (sdd): Corruption of in-memory data detected. Shutting down filesystem
>>>>>>> [   26.433835] XFS (sdd): Please umount the filesystem and rectify the problem(s)
>>>>>>> [   26.433857] XFS (sdd): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
>>>>>>>
>>>>>> Ok, the filesystem shut itself down, likely because blocks allocated in the
>>>>>> transaction exceeded the reservation.
>>>>
>>>>>>> We mounted it with an overlay files,
>>>>>>
>>>>>> I'm not sure what you meant here, could you please specify what you meant by
>>>>>> 'overlay files'? Are you using this XFS filesystem as an upper/lower FS for
>>>>>> overlayfs?
>>>>
>>>> To not touch the original data on `/dev/sdd` after the error, we created an overlay
>>>> FS to test xfs_repair and analyze the effect.
>>>>
>>>> ```
>>>> # more /scratch/local3/mkoverlay.sh
>>>> dev=/dev/sdd
>>>> ovl=/scratch/local3/X4042-overlay.file
>>>> newdevname=sddovl
>>>> size=$(blockdev --getsz "$dev")
>>>>
>>>> loop=$(losetup -f --show -- "$ovl")
>>>> printf '%s\n' "0 $size snapshot $dev $loop P 8"
>>>> printf '%s\n' "0 $size snapshot $dev $loop P 8" | dmsetup create "$newdevname"
>>>> # lsblk
>>>> NAME       MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>>>> loop0        7:0    0    10G  0 loop
>>>> loop1        7:1    0    10G  0 loop
>>>> └─sddovl   252:0    0    51T  0 dm
>>>> sda          8:0    0 931.5G  0 disk
>>>> ├─sda1       8:1    0   256G  0 part /
>>>> └─sda2       8:2    0 675.5G  0 part /amd/tinytina/0
>>>> sdb          8:16   0 931.5G  0 disk /amd/tinytina/1
>>>> sdc          8:32   0  27.3T  0 disk /amd/tinytina/2
>>>> sdd          8:48   0    51T  0 disk
>>>> └─sddovl   252:0    0    51T  0 dm
>>>> sde          8:64   0    51T  0 disk /amd/tinytina/X/X4075
>>>> sdf          8:80   0    51T  0 disk /amd/tinytina/X/X4076
>>>> ```
>>>>
>>>>>>> and the xfs_repair shows the summary below.
>>>>>>>
>>>>>>> ```
>>>>>>> # xfs_repair -vv /dev/mapper/sddovl
>>>>>>>         - block cache size set to 4201400 entries
>>>>>>> Phase 2 - using internal log
>>>>>>>         - zero log...
>>>>>>> zero_log: head block 3930112 tail block 3929088
>>>>>>> ERROR: The filesystem has valuable metadata changes in a log which needs to
>>>>>>> be replayed. Mount the filesystem to replay the log, and unmount it before
>>>>>>> re-running xfs_repair. If you are unable to mount the filesystem, then use
>>>>>>> the -L option to destroy the log and attempt a repair.
>>>>>>> Note that destroying the log may cause corruption -- please attempt a mount
>>>>>>
>>>>>> Have you tried to mount/umount the filesystem before zeroing the log? This is
>>>>>> supposed to be used as a last resort. Zeroing out the log, I mean.
>>>>
>>>> Yes, we tried that, and were told to run `xfs_repair`.
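
For reference, the `0 $size snapshot $dev $loop P 8` line in the script above is a
device-mapper snapshot table: the whole device range, origin `$dev`, copy-on-write
store `$loop`, persistent exceptions, 8-sector chunk size. After testing, the overlay
can be torn down again without ever having written to `/dev/sdd`; a rough sketch,
reusing the names from the script:

```
# Remove the snapshot target and its copy-on-write loop device after testing;
# everything the test xfs_repair wrote is discarded with the overlay file.
dmsetup remove sddovl
losetup -d "$loop"                      # the loop device printed by losetup -f --show above
rm /scratch/local3/X4042-overlay.file
```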
>>>>
>>>>>>> The directory `lost+found` contains almost five million files
>>>>>>>
>>>>>>> # find lost+found | wc
>>>>>>> 4859687 4859687 110985720
>>>>>>
>>>>>> We have neither the whole xfs_repair output nor more information about the
>>>>>> filesystem itself, but it looks like you had huge directory update(s) in your log
>>>>>> which were not replayed, and all orphan inodes ended up in the lost+found =/
>>>>
>>>> I’ll see if I can share the 500 MB xfs_repair output.
>>>>
>>>> ```
>>>> $ lsblk /dev/sdd
>>>> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
>>>> sdd    8:48   0  51T  0 disk
>>>> $ sudo xfs_info /dev/sdd
>>>> Password (for pmenzel) :
>>>> meta-data=/dev/sdd               isize=512    agcount=51, agsize=268435392 blks
>>>>          =                       sectsz=512   attr=2, projid32bit=1
>>>>          =                       crc=1        finobt=1, sparse=0, rmapbt=0
>>>>          =                       reflink=0
>>>> data     =                       bsize=4096   blocks=13671838208, imaxpct=1
>>>>          =                       sunit=64     swidth=896 blks
>>>> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
>>>> log      =internal log           bsize=4096   blocks=521728, version=2
>>>>          =                       sectsz=512   sunit=64 blks, lazy-count=1
>>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>>> ```
>>>>
>>>>>>> We saved the output of `xfs_repair`, but it’s over 500 MB in size, so we
>>>>>>> cannot attach it.
>>>>>>>
>>>>>>> `sudo xfs_metadump -go /dev/sdd sdd-metadump.dump` takes over 15 minutes
>>>>>>> and the dump file is 8.8 GB in size.
>>>>>>
>>>>>> At this point, xfs_metadump won't help much once you already repaired the
>>>>>> filesystem.
>>>>
>>>> As written, we only ran the repair on the overlay.
>>>>
>>>>>> Although, why are you getting a metadump from /dev/sdd, when the fs you tried to
>>>>>> repair is a device-mapper device? Are you facing this issue in more than one
>>>>>> filesystem?
>>>>
>>>> As written above, it was to not touch `/dev/sdd`. So the metadata dump should be
>>>> usable.
>>>>
>>>>>>> It’d be great if you could give hints on debugging this issue further,
>>>>>>> and comment on whether you think it is possible to recover the files, that is,
>>>>>>> to fix the log so that it can be cleanly applied.
>>>>>>
>>>>>> Unfortunately, you already got rid of the log, so you can't recover it anymore,
>>>>>> but all the recovered files will be in lost+found, with their inode numbers as
>>>>>> file name.
>>>>>>
>>>>>> Ok, so below is the dmesg, thanks for having attached it.
>>>>>>
>>>>>> One thing is there are two devices failing, sdd and dm-0. So my question again: is
>>>>>> this the same filesystem, or are they two separate filesystems showing exactly the
>>>>>> same issue? The filesystem has found corrupted inodes in the AG's unlinked
>>>>>> bucket, but this shouldn't affect log recovery.
>>>>>>
>>>>>> If they are two separate devices, did you xfs_repair both of them? After you
>>>>>> repaired the filesystem(s), do you still see the memory corruption issue?
>>>>
>>>> No, we do not see the corruption issue anymore, and mounting works as expected.
>>>>
>>>>>> At this point, there is not much we can do regarding the filesystem metadata,
>>>>>> once you already forced an xfs_repair zeroing the log.
>>>>>>
>>>>>> So, could you please tell us the current state of the filesystem (or filesystems if
>>>>>> there is more than one)? Are you still seeing the same memory corruption error
>>>>>> even after repairing it?
>>>>
>>>> The user wants to continue using the device, and does not need the recovered files,
>>>> so we had to go ahead and repair `/dev/sde` now too. We are only interested in
>>>> having this bug fixed in XFS, not in data recovery.
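
Regarding the 8.8 GB metadump mentioned above: compressed with LZ4 it shrinks to
roughly 2.1 GB (see [1] below). The commands were roughly the following; `-g` shows
progress and `-o` disables file name obfuscation, see xfs_metadump(8):

```
# Dump only the filesystem metadata (data blocks are not copied) and
# compress the result for sharing.
sudo xfs_metadump -g -o /dev/sdd sdd-metadump.dump
lz4 sdd-metadump.dump sdd-metadump.dump.lz4
```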
>>>>
>>>
>>> Well, if the metadump is still available and is based on the original
>>> (broken) fs then it may be able to reproduce the original problem. How
>>> large is it compressed? Would you be able to upload it somewhere?
>>>
>>> Brian
>>>
>>>>> FWIW, if you do still have an original copy of the fs, we could see
>>>>> about whether bypassing the shutdown allows us to trade log recovery
>>>>> failure for a space accounting error. This would still require a
>>>>> subsequent repair, but that may be less invasive than zapping the log
>>>>> and dealing with the aftermath of that.
>>>>
>>>> Unfortunately, it’s 50 TB, so we do not have a copy anymore.
>>>>
>>>>>> And for completeness, please provide us with as much information as possible from
>>>>>> this(these) filesystem(s):
>>>>>>
>>>>>> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
>>>>
>>>> /dev/sdd /amd/tinytina/X/X4074 xfs rw,relatime,attr2,inode64,logbsize=256k,sunit=512,swidth=7168,noquota 0 0
>>>>
>>>>>>> [ 1380.869451] XFS (sdd): Mounting V5 Filesystem
>>>>>>> [ 1380.912559] XFS (sdd): Starting recovery (logdev: internal)
>>>>>>> [ 1381.030780] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 368 of file fs/xfs/xfs_trans.c. Return address = 00000000cfa623e1
>>
>> <groan> Funny, I saw this exact same thing a week ago. We really ought
>> to record a stack trace when we cancel dirty transactions.
>>
>> Hmm, so leaking millions of nfs file handles ... were they temporary or
>> deleted files?
>>
>> I bet I know exactly what caused this -- freeing unlinked inodes that
>> required a free inode btree expansion, but m_inotbt_nores was
>> incorrectly set during log recovery so we didn't reserve any blocks and
>> blew up.
>
> Interesting. I suppose rather than upload it, the user could try to
> restore the metadump somewhere, bump /proc/sys/fs/xfs/error_level to >=5
> and reproduce the mount failure to try and generate a stack trace.

Please find the 2.1 GB LZ4-compressed metadata dump at [1]. Please delete it
after you are done using it for analysis.

>> Relevant patches that might fix this:
>>
>> "xfs: rename m_inotbt_nores to m_finobt_nores"
>> "xfs: reserve blocks for ifree transaction during log recovery"
>>
>> (As mentioned earlier, there's not much we can do once the log has been
>> zeroed.)
>>
>> --D
>>
>>>>>>> [ 1381.030785] XFS (sdd): Corruption of in-memory data detected. Shutting down filesystem
>>>>>>> [ 1381.030786] XFS (sdd): Please umount the filesystem and rectify the problem(s)
>>>>>>> [ 1381.031086] XFS (sdd): xlog_recover_clear_agi_bucket: failed to clear agi 0. Continuing.
>>>>>>> [ 1381.031088] XFS (sdd): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
>>>>>>> [ 1381.031090] XFS (sdd): xlog_recover_clear_agi_bucket: failed to clear agi 0. Continuing.
>>>>>>> [ 1381.031093] XFS (sdd): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
>>>>>>> [ 1381.031095] XFS (sdd): xlog_recover_clear_agi_bucket: failed to clear agi 0. Continuing.
>>>>>> <...>
>>>>>>> [ 1381.031113] XFS (sdd): Ending recovery (logdev: internal)
>>>>>>> [ 1381.031490] XFS (sdd): Error -5 reserving per-AG metadata reserve pool.
>>>>>>> [ 1381.031492] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 548 of file fs/xfs/xfs_fsops.c. Return address = 00000000217dbba5
>>>>>>
>>>>>>> [ 2795.123228] XFS (dm-0): Ending recovery (logdev: internal)
>>>>>>> [ 2795.231020] XFS (dm-0): Error -5 reserving per-AG metadata reserve pool.
>>>>>>> [ 2795.231023] XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 548 of file fs/xfs/xfs_fsops.c. Return address = 00000000217dbba5
>>>>>>
>>>>>>> [10944.023429] XFS (dm-0): Mounting V5 Filesystem
>>>>>>> [10944.035260] XFS (dm-0): Ending clean mount
>>>>>>> [11664.862376] XFS (dm-0): Unmounting Filesystem
>>>>>>> [11689.260213] XFS (dm-0): Mounting V5 Filesystem
>>>>>>> [11689.338187] XFS (dm-0): Ending clean mount
>>>>
>>>> As written, data recovery is not needed anymore, but if it’d help the driver,
>>>> we could share the logs we have (after making sure they are obfuscated).
>>>>
>>>> Please tell us if you are interested, and what dumps, xfs_repair output and
>>>> so on would be needed.


Kind regards,

Paul


[1]: https://www.molgen.mpg.de/~pmenzel/sdd-metadump.dump.lz4
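
P.S. In case it helps, here is a rough sketch of how the dump at [1] could be
restored and the failing mount reproduced with a stack trace, following Brian's
suggestion above (the target image path is only an example):

```
# Restore the metadump to a sparse image file and mount it; with
# /proc/sys/fs/xfs/error_level >= 5 the shutdown during log recovery
# should also produce a stack trace in dmesg.
lz4 -d sdd-metadump.dump.lz4 sdd-metadump.dump
xfs_mdrestore -g sdd-metadump.dump /tmp/sdd.img   # example target path
echo 5 > /proc/sys/fs/xfs/error_level
mount -o loop /tmp/sdd.img /mnt
```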