Hi Dave, Please see inlines, > On May 10, 2024, at 6:17 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > On Fri, May 10, 2024 at 05:34:26PM -0700, Wengang Wang wrote: >> when writting super block to disk (in xfs_log_sb), sb_fdblocks is fetched from >> m_fdblocks without any lock. As m_fdblocks can experience a positive -> negativ >> -> positive changing when the FS reaches fullness (see xfs_mod_fdblocks) >> So there is a chance that sb_fdblocks is negative, and because sb_fdblocks is >> type of unsigned long long, it reads super big. And sb_fdblocks being bigger >> than sb_dblocks is a problem during log recovery, xfs_validate_sb_write() >> complains. >> >> Fix: >> As sb_fdblocks will be re-calculated during mount when lazysbcount is enabled, >> We just need to make xfs_validate_sb_write() happy -- make sure sb_fdblocks is >> not genative. > > Ok, I have no problems with the change being made, but I'm unclear > on why we care if these values get logged as large positive numbers? > > The comment above this code explains that these counts are known to > be inaccurate and so are not trusted. i.e. they will be corrected > post-log recovery if they are recovered from the log: > > * Lazy sb counters don't update the in-core superblock so do that now. > * If this is at unmount, the counters will be exactly correct, but at > * any other time they will only be ballpark correct because of > * reservations that have been taken out percpu counters. If we have an > * unclean shutdown, this will be corrected by log recovery rebuilding > * the counters from the AGF block counts. > Things is that we have a metadump, looking at the fdblocks from super block 0, it is good. $ xfs_db -c "sb 0" -c "p" cust.img |egrep "dblocks|ifree|icount" dblocks = 26214400 icount = 512 ifree = 337 fdblocks = 25997100 And when looking at the log, we have the following: $ egrep -a "fdblocks|icount|ifree" cust.log |tail sb_fdblocks 37 sb_icount 1056 sb_ifree 87 sb_fdblocks 37 sb_icount 1056 sb_ifree 87 sb_fdblocks 37 sb_icount 1056 sb_ifree 87 sb_fdblocks 18446744073709551604 # cust.log is output of my script which tries to parse the log buffer. 18446744073709551604ULL == 0xfffffffffffffff4 or -12LL With upstream kernel (6.7.0-rc3), when I tried to mount (log recover) the metadump, I got the following in dmesg: [ 52.927796] XFS (loop0): SB summary counter sanity check failed [ 52.928889] XFS (loop0): Metadata corruption detected at xfs_sb_write_verify+0x60/0x110 [xfs], xfs_sb block 0x0 [ 52.930890] XFS (loop0): Unmount and run xfs_repair [ 52.931797] XFS (loop0): First 128 bytes of corrupted metadata buffer: [ 52.932954] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 01 90 00 00 XFSB............ [ 52.934333] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 52.935733] 00000020: c9 c1 ed ae 84 ed 46 b9 a1 f0 09 57 4a a9 98 42 ......F....WJ..B [ 52.937120] 00000030: 00 00 00 00 01 00 00 06 00 00 00 00 00 00 00 80 ................ [ 52.938515] 00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82 ................ [ 52.939919] 00000050: 00 00 00 01 00 64 00 00 00 00 00 04 00 00 00 00 .....d.......... [ 52.941293] 00000060: 00 00 64 00 b4 a5 02 00 02 00 00 08 00 00 00 00 ..d............. [ 52.942661] 00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 17 00 00 19 ................ [ 52.944046] XFS (loop0): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x38b/0x3a0 [xfs] (fs/xfs/xfs_buf.c:1559). Shutting down filesystem. [ 52.946710] XFS (loop0): Please unmount the filesystem and rectify the problem(s) [ 52.948099] XFS (loop0): log mount/recovery failed: error -117 [ 52.949810] XFS (loop0): log mount failed Looking at corresponding code: 231 xfs_validate_sb_write( 232 struct xfs_mount *mp, 233 struct xfs_buf *bp, 234 struct xfs_sb *sbp) 235 { 236 /* 237 * Carry out additional sb summary counter sanity checks when we write 238 * the superblock. We skip this in the read validator because there 239 * could be newer superblocks in the log and if the values are garbage 240 * even after replay we'll recalculate them at the end of log mount. 241 * 242 * mkfs has traditionally written zeroed counters to inprogress and 243 * secondary superblocks, so allow this usage to continue because 244 * we never read counters from such superblocks. 245 */ 246 if (xfs_buf_daddr(bp) == XFS_SB_DADDR && !sbp->sb_inprogress && 247 (sbp->sb_fdblocks > sbp->sb_dblocks || 248 !xfs_verify_icount(mp, sbp->sb_icount) || 249 sbp->sb_ifree > sbp->sb_icount)) { 250 xfs_warn(mp, "SB summary counter sanity check failed"); 251 return -EFSCORRUPTED; 252 } >From dmesg and code, we know the check failure was due to bad sb_ifree vs sb_icount or bad sb_fdblocks vs sb_dblocks. Looking at the super block dump and log dump, We know ifree and icount are good, what’s bad is sb_fdblocks. And that sb_fdblocks is from log. # I verified that sb_fdblocks is 0xfffffffffffffff4 with a UEK debug kernel (though not 6.7.0-rc3) So the sb_fdblocks is updated from log to incore at xfs_log_sb() -> xfs_validate_sb_write() path though Should be may re-calculated from AGs. The fix aims to make xfs_validate_sb_write() happy. Thanks, Wengang > IOWs journal recovery doesn't actually care what these values are, > so what actually goes wrong if this sum returns a negative value? > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx