> On Jun 2, 2024, at 4:37 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Fri, May 31, 2024 at 03:44:56PM +0000, Wengang Wang wrote:
>> Hi Dave,
>>
>> Do you have further comments and/or suggestions? Or give an RB please :D
>
> Sorry, LSFMM intervened and I didn't notice your comment until now.
>

No worries!

>>> On May 13, 2024, at 10:06 AM, Wengang Wang <wen.gang.wang@xxxxxxxxxx> wrote:
>>>
>>> The thing is that we have a metadump; looking at fdblocks in superblock 0, it is good.
>>>
>>> $ xfs_db -c "sb 0" -c "p" cust.img | egrep "dblocks|ifree|icount"
>>> dblocks = 26214400
>>> icount = 512
>>> ifree = 337
>>> fdblocks = 25997100
>>>
>>> And when looking at the log, we have the following:
>>>
>>> $ egrep -a "fdblocks|icount|ifree" cust.log | tail
>>> sb_fdblocks 37
>>> sb_icount 1056
>>> sb_ifree 87
>>> sb_fdblocks 37
>>> sb_icount 1056
>>> sb_ifree 87
>>> sb_fdblocks 37
>>> sb_icount 1056
>>> sb_ifree 87
>>> sb_fdblocks 18446744073709551604
>>>
>>> # cust.log is the output of my script which tries to parse the log buffer.
>>>
>>> 18446744073709551604ULL == 0xfffffffffffffff4 or -12LL
>>>
>>> With an upstream kernel (6.7.0-rc3), when I tried to mount (log recover) the metadump,
>>> I got the following in dmesg:
>>>
>>> [ 52.927796] XFS (loop0): SB summary counter sanity check failed
>>> [ 52.928889] XFS (loop0): Metadata corruption detected at xfs_sb_write_verify+0x60/0x110 [xfs], xfs_sb block 0x0
>>> [ 52.930890] XFS (loop0): Unmount and run xfs_repair
>>> [ 52.931797] XFS (loop0): First 128 bytes of corrupted metadata buffer:
>>> [ 52.932954] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 01 90 00 00  XFSB............
>>> [ 52.934333] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>> [ 52.935733] 00000020: c9 c1 ed ae 84 ed 46 b9 a1 f0 09 57 4a a9 98 42  ......F....WJ..B
>>> [ 52.937120] 00000030: 00 00 00 00 01 00 00 06 00 00 00 00 00 00 00 80  ................
>>> [ 52.938515] 00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82  ................
>>> [ 52.939919] 00000050: 00 00 00 01 00 64 00 00 00 00 00 04 00 00 00 00  .....d..........
>>> [ 52.941293] 00000060: 00 00 64 00 b4 a5 02 00 02 00 00 08 00 00 00 00  ..d.............
>>> [ 52.942661] 00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 17 00 00 19  ................
>>> [ 52.944046] XFS (loop0): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x38b/0x3a0 [xfs] (fs/xfs/xfs_buf.c:1559). Shutting down filesystem.
>>> [ 52.946710] XFS (loop0): Please unmount the filesystem and rectify the problem(s)
>>> [ 52.948099] XFS (loop0): log mount/recovery failed: error -117
>>> [ 52.949810] XFS (loop0): log mount failed
>
> And that's what should be in the commit message, as it explains
> exactly how the problem occurred, the symptom that was seen, and
> why the change is necessary. It also means that anyone else who sees
> a similar problem and is grepping the commit history will see this
> and recognise it, thereby knowing that this is the fix they need...
>

OK, got it.

>>> Looking at the corresponding code:
>>>
>>> 231 xfs_validate_sb_write(
>>> 232         struct xfs_mount        *mp,
>>> 233         struct xfs_buf          *bp,
>>> 234         struct xfs_sb           *sbp)
>>> 235 {
>>> 236         /*
>>> 237          * Carry out additional sb summary counter sanity checks when we write
>>> 238          * the superblock. We skip this in the read validator because there
>>> 239          * could be newer superblocks in the log and if the values are garbage
>>> 240          * even after replay we'll recalculate them at the end of log mount.
>>> 241          *
>>> 242          * mkfs has traditionally written zeroed counters to inprogress and
>>> 243          * secondary superblocks, so allow this usage to continue because
>>> 244          * we never read counters from such superblocks.
>>> 245          */
>>> 246         if (xfs_buf_daddr(bp) == XFS_SB_DADDR && !sbp->sb_inprogress &&
>>> 247             (sbp->sb_fdblocks > sbp->sb_dblocks ||
>>> 248              !xfs_verify_icount(mp, sbp->sb_icount) ||
>>> 249              sbp->sb_ifree > sbp->sb_icount)) {
>>> 250                 xfs_warn(mp, "SB summary counter sanity check failed");
>>> 251                 return -EFSCORRUPTED;
>>> 252         }
>>>
>>> From the dmesg output and the code, we know the check failed because of either a bad
>>> sb_ifree vs sb_icount or a bad sb_fdblocks vs sb_dblocks.
>>>
>>> Looking at the superblock dump and the log dump, we know ifree and icount are good;
>>> what's bad is sb_fdblocks, and that sb_fdblocks comes from the log.
>>> # I verified that sb_fdblocks is 0xfffffffffffffff4 with a UEK debug kernel (though not 6.7.0-rc3)
>>>
>>> So sb_fdblocks is updated from the log into the in-core superblock and reaches
>>> xfs_validate_sb_write() via the xfs_log_sb() path, even though it may be recalculated
>>> from the AGs later.
>>>
>>> The fix aims to make xfs_validate_sb_write() happy.
>
> What about the sb_icount and sb_ifree counters? They are also percpu
> counters, and they can return transient negative numbers, too,
> right? If they end up in the log, the same as this transient
> negative sb_fdblocks count, won't that also cause exactly the same
> issue?
>

Yes, sb_icount and sb_ifree are also percpu counters. They have been
addressed by commit 59f6ab40fd8735c9a1a15401610a31cc06a0bbd6, right?

> i.e. if we need to fix the sb_fdblocks sum to always be positive,
> then we need to do the same thing with the other lazy superblock
> per-cpu counters so they don't trip over the same transient
> underflow issue...
>

Agreed. That said, I think we have no further percpu counter problems
after this patch.

Will send a new patch with proper line breaks.

Thanks,
Wengang

> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
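As a side note on the arithmetic above: the bad on-disk value is simply a
transient negative per-cpu counter sum stored into an unsigned 64-bit
superblock field. The standalone C sketch below is only illustrative (it is
not the kernel patch under review; the variable names and the clamp are
assumptions) and shows how -12 becomes 18446744073709551604 and why clamping
the sum to zero before it is written would keep the xfs_validate_sb_write()
check happy.

/*
 * Illustrative userspace sketch, not the actual XFS fix: a per-cpu counter
 * sum is a signed 64-bit value that can be transiently negative, while the
 * on-disk sb_fdblocks field is unsigned 64-bit.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
        int64_t  counter_sum = -12;             /* transient negative percpu sum */
        uint64_t sb_dblocks  = 26214400;        /* from the "sb 0" dump above */

        /* Unclamped: the value that ended up in the log in this report. */
        uint64_t sb_fdblocks = (uint64_t)counter_sum;
        printf("raw     sb_fdblocks = %" PRIu64 " (0x%" PRIx64 "), fails sb_fdblocks > sb_dblocks: %s\n",
               sb_fdblocks, sb_fdblocks,
               sb_fdblocks > sb_dblocks ? "yes" : "no");

        /* Clamped to zero: the kind of fix being discussed for the lazy sb counters. */
        uint64_t clamped = counter_sum > 0 ? (uint64_t)counter_sum : 0;
        printf("clamped sb_fdblocks = %" PRIu64 ", fails sb_fdblocks > sb_dblocks: %s\n",
               clamped, clamped > sb_dblocks ? "yes" : "no");

        return 0;
}

Compiled with a standard C compiler, the first line prints the same
18446744073709551604 (0xfffffffffffffff4) value seen in the log dump above.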