On Wednesday, 23.01.2019 at 13:58 -0500, Brian Foster wrote:
> On Wed, Jan 23, 2019 at 08:03:07AM -0500, Brian Foster wrote:
> > On Wed, Jan 23, 2019 at 07:11:53AM -0500, Brian Foster wrote:
> > > On Wed, Jan 23, 2019 at 12:14:17PM +0100, Lucas Stach wrote:
> > > > On Tuesday, 22.01.2019 at 08:02 -0500, Brian Foster wrote:
> > > > > On Tue, Jan 22, 2019 at 11:39:53AM +0100, Lucas Stach wrote:
> > > > > > Hi Brian,
> > > > > > 
> > > > > > On Monday, 21.01.2019 at 13:11 -0500, Brian Foster wrote:
> > > > > > [...]
> > > > > > > > So for the moment, here's the output of the above sequence.
> > > > > > > > 
> > > > > > > > xfs_db> convert agno 5 agbno 7831662 fsb
> > > > > > > > 0x5077806e (1350008942)
> > > > > > > > xfs_db> fsb 0x5077806e
> > > > > > > > xfs_db> type finobt
> > > > > > > > xfs_db> print
> > > > > > > > magic = 0x49414233
> > > > > > > > level = 1
> > > > > > > > numrecs = 335
> > > > > > > > leftsib = 7810856
> > > > > > > > rightsib = null
> > > > > > > > bno = 7387612016
> > > > > > > > lsn = 0x6671003d9700
> > > > > > > > uuid = 026711cc-25c7-44b9-89aa-0aac496edfec
> > > > > > > > owner = 5
> > > > > > > > crc = 0xe12b19b2 (correct)
> > > > > > > 
> > > > > > > As expected, we have the inobt magic. Interesting that this is
> > > > > > > a fairly full intermediate (level > 0) node. There is no right
> > > > > > > sibling, which means we're at the far right end of the tree. I
> > > > > > > wouldn't mind poking around a bit more at the tree, but that
> > > > > > > might be easier with access to the metadump. I also think that
> > > > > > > xfs_repair would have complained were something more
> > > > > > > significant wrong with the tree.
> > > > > > > 
> > > > > > > Hmm, I wonder if the (lightly tested) diff below would help us
> > > > > > > catch anything. It basically just splits up the currently
> > > > > > > combined inobt and finobt I/O verifiers to expect the
> > > > > > > appropriate magic number (rather than accepting either magic
> > > > > > > for both trees). Could you give that a try? Unless we're doing
> > > > > > > something like using the wrong type of cursor for a particular
> > > > > > > tree, I'd think this would catch wherever we happen to put a
> > > > > > > bad magic on disk. Note that this assumes the underlying
> > > > > > > filesystem has been repaired, so that we detect the next time
> > > > > > > an on-disk corruption is introduced.
> > > > > > > 
> > > > > > > You'll also need to turn up the XFS error level to make sure
> > > > > > > this prints out a stack trace if/when a verifier failure
> > > > > > > triggers:
> > > > > > > 
> > > > > > >   echo 5 > /proc/sys/fs/xfs/error_level
> > > > > > > 
> > > > > > > I guess we also shouldn't rule out hardware issues or whatnot.
> > > > > > > I did notice you have a strange kernel version:
> > > > > > > 4.19.4-holodeck10. Is that a distro kernel? Has it been
> > > > > > > modified from upstream in any way? If so, I'd strongly suggest
> > > > > > > trying to confirm whether this is reproducible with an
> > > > > > > upstream kernel.
> > > > > > 
> > > > > > With the finobt verifier changes applied we are unable to mount
> > > > > > the FS, even after running xfs_repair.
> > > > > > 
> > > > > > xfs_repair had found "bad magic # 0x49414233 in inobt block
> > > > > > 5/2631703", which would be daddr 0x1b5db40b8 according to
> > > > > > xfs_db. The mount trips over a buffer at a different daddr
> > > > > > though:
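(Both the mount splat and Brian's actual diff are snipped from the
quote above. The idea of the diff is roughly the following sketch,
which is my reconstruction rather than the patch itself:
xfs_ino_btree_verify() is an invented helper standing in for the
structural checks both trees already share, and each tree's
xfs_buf_ops read/write verifiers would additionally need to be wired
up to the matching check.)

static xfs_failaddr_t
xfs_ino_btree_verify(struct xfs_buf *bp, __be32 magic, __be32 crc_magic)
{
	struct xfs_mount	*mp = bp->b_target->bt_mount;
	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
	xfs_failaddr_t		fa;
	unsigned int		level;

	if (block->bb_magic == crc_magic) {
		/* v5 blocks also carry uuid/owner/lsn that must check out */
		fa = xfs_btree_sblock_v5hdr_verify(bp);
		if (fa)
			return fa;
	} else if (block->bb_magic != magic) {
		/* the other tree's magic is now rejected, not accepted */
		return __this_address;
	}

	level = be16_to_cpu(block->bb_level);
	if (level >= mp->m_in_maxlevels)
		return __this_address;

	return xfs_btree_sblock_verify(bp, mp->m_inobt_mxr[level != 0] * 2);
}

static xfs_failaddr_t
xfs_inobt_verify(struct xfs_buf *bp)
{
	/* inobt buffers must carry the IABT/IAB3 magic ... */
	return xfs_ino_btree_verify(bp, cpu_to_be32(XFS_IBT_MAGIC),
			cpu_to_be32(XFS_IBT_CRC_MAGIC));
}

static xfs_failaddr_t
xfs_finobt_verify(struct xfs_buf *bp)
{
	/* ... while finobt buffers must carry FIBT/FIB3 */
	return xfs_ino_btree_verify(bp, cpu_to_be32(XFS_FIBT_MAGIC),
			cpu_to_be32(XFS_FIBT_CRC_MAGIC));
}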
> > > > > So the mount failed, you ran repair and discovered the bad
> > > > > magic..? That suggests there was still an issue with the fs
> > > > > on-disk. Could you run 'xfs_repair -n' after the actual
> > > > > xfs_repair to confirm the fs is free of errors before it is
> > > > > mounted? Note that xfs_repair unconditionally regenerates certain
> > > > > metadata structures (like the finobt) from scratch, so there is
> > > > > always the possibility that xfs_repair itself is introducing some
> > > > > problem in the fs.
> > > > 
> > > > The sequence was:
> > > > 1. Trying to mount with the debug kernel, which didn't work due to
> > > >    the finobt verifier rejecting a bad btree node.
> > > > 2. Full run of mutating xfs_repair (which did find a finobt node
> > > >    with bad magic).
> > > > 3. Trying to mount again, which still failed due to the finobt
> > > >    verifier rejecting a bad btree node, but at a different daddr
> > > >    than what was flagged by xfs_repair.
> > > > 4. xfs_repair -n finds a bad-magic finobt node at the same
> > > >    position as the mount splat.
> > > > 
> > > > This leads me to wonder if xfs_repair was the poison rather than
> > > > the cure in this case. After digging into the xfs_repair code I
> > > > found the following, which looks suspicious, but I don't know
> > > > enough about XFS internals to tell if this can actually happen:
> > > 
> > > That does imply xfs_repair is the culprit. If you haven't, could you
> > > try the xfs_repair -> xfs_repair -n sequence without a mount attempt
> > > in the picture, just to be sure? That may also explain why the block
> > > seems to shift around. It wouldn't surprise me if it's the same
> > > relative block in the tree, but IIRC the tree itself can be shifted
> > > around run to run.
> > > 
> > > > build_ino_tree(), which is shared between XFS_BTNUM_INO and
> > > > XFS_BTNUM_FINO btree rebuilds, contains the following in one of
> > > > the loops:
> > > > 
> > > > 	if (lptr->num_recs_pb > 0)
> > > > 		prop_ino_cursor(mp, agno, btree_curs,
> > > > 				ino_rec->ino_startnum, 0);
> > > > 
> > > > prop_ino_cursor() calls libxfs_btree_init_block() with the btnum
> > > > parameter being a fixed XFS_BTNUM_INO.
> > > 
> > > It's been a while since I've looked at the guts of this code, but
> > > that definitely does not look right. build_ino_tree() is where we
> > > abstract construction of the different tree types. Nice catch!
> > > 
> > > If this is the problem, I'll have to take a closer look to figure
> > > out why it isn't more prevalent...
> > 
> > Hmm... so it looks like this is the code responsible for _growing_
> > the interior nodes of the tree (i.e., level > 0). If I'm following
> > the code correctly, the high-level algorithm is to first set up a
> > block for each level of the tree. This occurs early in
> > build_ino_tree() and uses the correct btnum. We then start populating
> > the leaf nodes with the finobt records that have previously been
> > built up in-core. For each leaf block, we populate the higher-level
> > block with the key based on the first record in the leaf, then fill
> > in the remaining records in the leaf. This repeats until the block at
> > the next level up is full, at which point we grab a new level > 0
> > block and initialize it with the inobt magic.
> > 
> > I haven't confirmed it, but I think this basically means that we'd
> > initialize every intermediate block after the left-most block at each
> > level of the tree with the inobt magic. I think Dave already
> > recognized in the other subthread that you have a two-block level 1
> > tree. I suspect the only reasons we haven't run into this until now
> > are that the finobt is intended to remain fairly small (it only
> > tracks free inode records and allocation prioritizes those records,
> > removing them from the tree), and that this is limited to users who
> > repair a filesystem in such a state.
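Assuming this is indeed the bug, a minimal sketch of a fix (my
illustration, not necessarily the patch Brian ends up posting) would be
to thread the tree type through, since build_ino_tree() is the function
that abstracts over both tree types and so has the btnum at hand:

 	if (lptr->num_recs_pb > 0)
-		prop_ino_cursor(mp, agno, btree_curs,
+		prop_ino_cursor(mp, agno, btree_curs, btnum,
 				ino_rec->ino_startnum, 0);

with prop_ino_cursor() gaining a btnum parameter and passing it through
to libxfs_btree_init_block() in place of the hardcoded XFS_BTNUM_INO,
so that interior finobt blocks grown this way get stamped with the
finobt magic.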
> I created a large finobt filesystem, confirmed that xfs_repair stamps
> every node with the inobt magic as noted above, and verified that the
> patch below fixes the problem. I'll post it to the list shortly.
> Thanks again.

I can confirm that with the patch applied, xfs_repair is able to fix
the corruption. A second xfs_repair run no longer finds any corrupt
btree nodes.

I can also confirm that manually stamping the right magic into the
corrupt finobt block via xfs_db brought our FS back to life without an
xfs_repair run, roughly as sketched below.
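For the archives, the xfs_db sequence was roughly the following. This
is an illustrative reconstruction from the values in this thread rather
than a pasted transcript: <device> is a placeholder, the fsb is the one
from the quoted xfs_db output above, 0x46494233 is the v5 finobt magic
("FIB3") as opposed to the bogus inobt magic 0x49414233 ("IAB3"), and
the exact write syntax should be double-checked against xfs_db(8).

  # xfs_db -x <device>
  xfs_db> fsb 0x5077806e
  xfs_db> type finobt
  xfs_db> write magic 0x46494233

Regards,
Lucas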