Re: Metadata corruption detected at xfs_agf block

Eric Sandeen <sandeen@xxxxxxxxxxx> · Mon, 18 Jul 2016 11:55:01 -0700

On 7/18/16 4:25 AM, Eryu Guan wrote:
> Hi,
> 
> I hit metadata corruption reported by xfs_repair after running fsstress
> on the test XFS.
> 
> # xfs_repair -n /dev/mapper/testvg-testlv
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
> Metadata corruption detected at xfs_agf block 0x59fa001/0x200
> flfirst 118 in agf 3 too large (max = 118)
          ^^^                           ^^^

FWIW, this confusing output was fixed by:

6aa32b4 xfs_repair: fix agf limit error messages

so today it would say:

flfirst 118 in agf 3 too large (max = 117)

> agf 118 freelist blocks bad, skipping freelist scan
> sb_fdblocks 15716842, counted 15716838
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan (but don't clear) agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - agno = 0
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> No modify flag set, skipping filesystem flush and exiting.
> 
> Kernel is 4.7-rc7, xfsprogs is v4.3.0 (v4.5.0/v4.7-rc1 reported no
> corruption, I think that's because of commit 96f859d ("libxfs: pack the
> agfl header structure so XFS_AGFL_SIZE is correct"))

Hm, this does seem related.

> This is similar to this thread:
> 
> new fs, xfs_admin new label, metadata corruption detected
> http://oss.sgi.com/archives/xfs/2016-03/msg00297.html

That one did have a growfs step, which you don't have, right?

> which ended up a new patch in growfs code, commit ad747e3b2996 ("xfs:
> Don't wrap growfs AGFL indexes"), so I think I'd better report this
> similar issue anyway, though I'm not sure if it's really a bug.

Ok, interesting, I thought growfs was the only path to this.

/*
 * Size of the AGFL.  For CRC-enabled filesystems we steal a couple of
 * slots in the beginning of the block for a proper header with the
 * location information and CRC.
 */
#define XFS_AGFL_SIZE(mp) \
        (((mp)->m_sb.sb_sectsize - \
         (xfs_sb_version_hascrc(&((mp)->m_sb)) ? \
                sizeof(struct xfs_agfl) : 0)) / \
          sizeof(xfs_agblock_t))

so the packed version of struct xfs_agfl is smaller (36 vs 40 bytes),
which yields a larger XFS_AGFL_SIZE (119 vs 118 with the 512-byte
sectors in this case) and thus a larger maximum index (118 vs 117).
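
To make the arithmetic concrete, here is a minimal userspace sketch, not the kernel code itself.  The struct and function names are hypothetical; the field layout is my assumption of what produces the 36-byte header, and the unpacked size of 40 depends on an ABI that aligns 64-bit fields to 8 bytes (typical on x86_64).  The packed attribute is the GCC/Clang spelling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the AGFL header without packing.
 * Fields total 36 bytes; an 8-byte-aligned ABI pads this to 40. */
struct agfl_hdr_unpacked {
	uint32_t magic;
	uint32_t seqno;
	uint8_t  uuid[16];
	uint64_t lsn;
	uint32_t crc;
};

/* Same fields, packed: no tail padding, so exactly 36 bytes. */
struct agfl_hdr_packed {
	uint32_t magic;
	uint32_t seqno;
	uint8_t  uuid[16];
	uint64_t lsn;
	uint32_t crc;
} __attribute__((packed));

/* Mirrors the shape of XFS_AGFL_SIZE for a CRC-enabled filesystem:
 * (sector size - header size) / size of one 32-bit block number. */
static inline unsigned int agfl_slots(unsigned int sectsize, size_t hdr_size)
{
	assert(hdr_size <= sectsize);
	return (unsigned int)((sectsize - hdr_size) / sizeof(uint32_t));
}

/* Largest valid index for a given number of slots. */
static inline unsigned int agfl_max_index(unsigned int slots)
{
	assert(slots > 0);
	return slots - 1;
}
```

With 512-byte sectors, agfl_slots(512, 40) gives 118 slots (max index 117) and agfl_slots(512, 36) gives 119 slots (max index 118), which is the one-slot disagreement described above.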

The (older) repair code you ran thinks 117 is the max index, but the
(newer) kernel created 118.  So this is newer kernel + older userspace;
that all makes sense so far.

xfs_alloc_get_freelist():

        be32_add_cpu(&agf->agf_flfirst, 1);
        xfs_trans_brelse(tp, agflbp);
        if (be32_to_cpu(agf->agf_flfirst) == XFS_AGFL_SIZE(mp)) // 119
                agf->agf_flfirst = 0;

so I guess this is the non-growfs case that can hit this as well:
we can end up with agf_flfirst == 118 when the repair code thinks
117 is the max permissible.  It's just less likely than the growfs
case.  Now, how to fix this one for all combinations...  :(
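
A rough sketch of how the wrap logic gets there, as a standalone model rather than the real kernel or repair code.  All function names are hypothetical, and the "old repair" check is my assumption of a simple index-versus-size comparison, inferred from the error message quoted earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Advance a freelist index the way the snippet above does:
 * increment, then wrap to 0 once it equals the AGFL size. */
static inline uint32_t agfl_advance(uint32_t idx, uint32_t agfl_size)
{
	idx += 1;
	if (idx == agfl_size)
		idx = 0;
	return idx;
}

/* Walk one full cycle starting at 0 and report the largest index seen. */
static inline uint32_t agfl_max_reachable(uint32_t agfl_size)
{
	uint32_t idx = 0, max = 0;

	for (uint32_t i = 0; i < agfl_size; i++) {
		idx = agfl_advance(idx, agfl_size);
		if (idx > max)
			max = idx;
	}
	return max;
}

/* Assumed model of the userspace check: an index is accepted only
 * if it is below the AGFL size that this userspace computes. */
static inline int repair_accepts(uint32_t flfirst, uint32_t repair_agfl_size)
{
	return flfirst < repair_agfl_size;
}
```

With a kernel-side size of 119, agfl_max_reachable(119) is 118 through ordinary increments alone, with no growfs involved.  A repair that computes 118 slots then rejects that index (repair_accepts(118, 118) is 0), while one that computes 119 accepts it.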

-Eric 
