On Fri, May 31, 2024 at 03:44:56PM +0000, Wengang Wang wrote:
> Hi Dave,
>
> Do you have further comments and/or suggestions? Or give a RB pls :D

Sorry, LSFMM intervened and I didn't notice your comment until now.

> > On May 13, 2024, at 10:06 AM, Wengang Wang <wen.gang.wang@xxxxxxxxxx> wrote:
> >
> > The thing is that we have a metadump; looking at the fdblocks in
> > superblock 0, it is good.
> >
> > $ xfs_db -c "sb 0" -c "p" cust.img | egrep "dblocks|ifree|icount"
> > dblocks = 26214400
> > icount = 512
> > ifree = 337
> > fdblocks = 25997100
> >
> > And when looking at the log, we have the following:
> >
> > $ egrep -a "fdblocks|icount|ifree" cust.log | tail
> > sb_fdblocks 37
> > sb_icount 1056
> > sb_ifree 87
> > sb_fdblocks 37
> > sb_icount 1056
> > sb_ifree 87
> > sb_fdblocks 37
> > sb_icount 1056
> > sb_ifree 87
> > sb_fdblocks 18446744073709551604
> >
> > # cust.log is the output of my script which tries to parse the log buffer.
> >
> > 18446744073709551604ULL == 0xfffffffffffffff4 or -12LL
> >
> > With an upstream kernel (6.7.0-rc3), when I tried to mount (log
> > recover) the metadump, I got the following in dmesg:
> >
> > [ 52.927796] XFS (loop0): SB summary counter sanity check failed
> > [ 52.928889] XFS (loop0): Metadata corruption detected at xfs_sb_write_verify+0x60/0x110 [xfs], xfs_sb block 0x0
> > [ 52.930890] XFS (loop0): Unmount and run xfs_repair
> > [ 52.931797] XFS (loop0): First 128 bytes of corrupted metadata buffer:
> > [ 52.932954] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 01 90 00 00  XFSB............
> > [ 52.934333] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > [ 52.935733] 00000020: c9 c1 ed ae 84 ed 46 b9 a1 f0 09 57 4a a9 98 42  ......F....WJ..B
> > [ 52.937120] 00000030: 00 00 00 00 01 00 00 06 00 00 00 00 00 00 00 80  ................
> > [ 52.938515] 00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82  ................
> > [ 52.939919] 00000050: 00 00 00 01 00 64 00 00 00 00 00 04 00 00 00 00  .....d..........
> > [ 52.941293] 00000060: 00 00 64 00 b4 a5 02 00 02 00 00 08 00 00 00 00  ..d.............
> > [ 52.942661] 00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 17 00 00 19  ................
> > [ 52.944046] XFS (loop0): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x38b/0x3a0 [xfs] (fs/xfs/xfs_buf.c:1559). Shutting down filesystem.
> > [ 52.946710] XFS (loop0): Please unmount the filesystem and rectify the problem(s)
> > [ 52.948099] XFS (loop0): log mount/recovery failed: error -117
> > [ 52.949810] XFS (loop0): log mount failed

And that's what should be in the commit message, as it explains exactly
how the problem occurred, the symptom that was seen, and why the change
is necessary. It also means that anyone else who sees a similar problem
and is grepping the commit history will see this and recognise it,
thereby knowing that this is the fix they need...

> > Looking at the corresponding code:
> >
> > xfs_validate_sb_write(
> >         struct xfs_mount        *mp,
> >         struct xfs_buf          *bp,
> >         struct xfs_sb           *sbp)
> > {
> >         /*
> >          * Carry out additional sb summary counter sanity checks when we write
> >          * the superblock. We skip this in the read validator because there
> >          * could be newer superblocks in the log and if the values are garbage
> >          * even after replay we'll recalculate them at the end of log mount.
> >          *
> >          * mkfs has traditionally written zeroed counters to inprogress and
> >          * secondary superblocks, so allow this usage to continue because
> >          * we never read counters from such superblocks.
> >          */
> >         if (xfs_buf_daddr(bp) == XFS_SB_DADDR && !sbp->sb_inprogress &&
> >             (sbp->sb_fdblocks > sbp->sb_dblocks ||
> >              !xfs_verify_icount(mp, sbp->sb_icount) ||
> >              sbp->sb_ifree > sbp->sb_icount)) {
> >                 xfs_warn(mp, "SB summary counter sanity check failed");
> >                 return -EFSCORRUPTED;
> >         }
> >
> > From the dmesg output and the code, we know the check failure was due
> > to either a bad sb_ifree vs sb_icount or a bad sb_fdblocks vs
> > sb_dblocks.
> >
> > Looking at the superblock dump and the log dump, we know ifree and
> > icount are good; what's bad is sb_fdblocks, and that sb_fdblocks comes
> > from the log.
> > # I verified that sb_fdblocks is 0xfffffffffffffff4 with a UEK debug
> > # kernel (though not 6.7.0-rc3)
> >
> > So the bad sb_fdblocks is copied from the log to the in-core
> > superblock and trips the check on the xfs_log_sb() ->
> > xfs_validate_sb_write() path, even though it may be recalculated from
> > the AGs later.
> >
> > The fix aims to make xfs_validate_sb_write() happy.

What about the sb_icount and sb_ifree counters? They are also percpu
counters, and they can return transient negative numbers, too, right?
If they end up in the log, the same as this transient negative
sb_fdblocks count, won't that also cause exactly the same issue?

i.e. if we need to fix the sb_fdblocks sum to always be positive, then
we need to do the same thing with the other lazy superblock per-cpu
counters so they don't trip over the same transient underflow issue...
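Something like the completely untested sketch below is what I mean:
clamp all three lazy counters at the point xfs_log_sb() samples them,
so a transient negative sum can never reach the journal or the on-disk
superblock. The exact placement is illustrative only;
percpu_counter_sum_positive() is the existing helper that folds a
percpu counter and clamps the result at zero:

        /*
         * The lazy SB counters are summed from percpu counters, and
         * concurrent modifications can make those sums transiently
         * negative. Never log such a value - clamp it at zero.
         */
        if (xfs_has_lazysbcount(mp)) {
                mp->m_sb.sb_icount =
                        percpu_counter_sum_positive(&mp->m_icount);
                mp->m_sb.sb_ifree = min_t(uint64_t,
                        percpu_counter_sum_positive(&mp->m_ifree),
                        mp->m_sb.sb_icount);
                mp->m_sb.sb_fdblocks =
                        percpu_counter_sum_positive(&mp->m_fdblocks);
        }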
-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx