On Fri, Apr 16, 2021 at 05:10:23PM +0800, Gao Xiang wrote:
> There are many paths which could trigger xfs_log_sb(), e.g.
>   xfs_bmap_add_attrfork()
>     -> xfs_log_sb()
> , which overrided on-disk fdblocks by in-core per-CPU fdblocks.
>
> However, for !lazysbcount cases, on-disk fdblocks is actually updated
> by xfs_trans_apply_sb_deltas(), and generally it isn't equal to
> in-core fdblocks due to xfs_reserve_block() or whatever, see the
> comment in xfs_unmountfs().
>
> It could be observed by the following steps reported by Zorro [1]:
>
> 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
> 2. mount $dev $mnt
> 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load)
> 4. umount $mnt
> 5. xfs_repair -n $dev
>
> yet due to commit f46e5a174655("xfs: fold sbcount quiesce logging
> into log covering"), xfs_sync_sb() will be triggered even !lazysbcount
> but xfs_log_need_covered() case when xfs_unmountfs(), so hard to
> reproduce on kernel 5.12+.

Um, I can't understand this(?), possibly because I can't get to RHBZ and
therefore have very little context to start from. :(

Are you saying that because the f46e commit removed the xfs_sync_sb
calls from unmountfs for !lazysb filesystems, we no longer log the
summary counters at unmount?  Which means that we no longer write the
incore percpu fdblocks count to disk at unmount after we've torn down
all the incore space reservations (when sb_fdblocks == m_fdblocks)?

So that means that for !lazysb fses, the only time we log the sb
counters is during transactions, and when we do log the counters we
actually log the wrong value, since the incore reservations should never
escape to disk?

Hence the fix below?

And then by extension, is the reason that nobody noticed before that we
always used to log the correct value at unmount, so fses with clean logs
always have the correct value, and fses with dirty logs will recompute
fdblocks after log recovery by summing the AGF free blocks counts?
(Or possibly nobody uses !lazysb filesystems anymore?)

I /think/ the code change looks ok, but as you might surmise from the
large quantity of questions, I'm not ready to RVB this yet.  The commit
message seems like a good place to answer those questions.

> After this patch, I've seen no strange so far on older kernels
> for the testcase above without lazysbcount.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1949515

This strangely <cough> doesn't seem to be accessible to the public at
large, since <cough> someone at RedHat decided to block all Oracle IPs
<cough>.

--D

>
> Reported-by: Zorro Lang <zlang@xxxxxxxxxx>
> Signed-off-by: Gao Xiang <hsiangkao@xxxxxxxxxx>
> ---
>  fs/xfs/libxfs/xfs_sb.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index 60e6d255e5e2..423dada3f64c 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -928,7 +928,13 @@ xfs_log_sb(
>  
>  	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
>  	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> +	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> +		struct xfs_dsb	*dsb = bp->b_addr;
> +
> +		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
> +	} else {
> +		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> +	}
>  
>  	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
>  	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
> -- 
> 2.27.0
>
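
For anyone trying to follow the questions above, here is a toy userspace
model of the scenario as I understand it from the commit message.  It is
only a sketch: every toy_* name is hypothetical, nothing here is kernel
code, and it assumes that in-core reservations lower only the in-core
count while committed deltas reach the on-disk count on !lazysbcount
filesystems.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not the kernel's structures. */
struct toy_mount {
	bool     lazysbcount;     /* feature bit: lazy superblock counters */
	uint64_t incore_fdblocks; /* stands in for the summed per-CPU m_fdblocks */
	uint64_t sb_fdblocks;     /* stands in for mp->m_sb.sb_fdblocks */
	uint64_t disk_fdblocks;   /* stands in for the on-disk superblock buffer */
};

/* An in-core reservation lowers only the in-core count (assumption). */
static void toy_reserve(struct toy_mount *mp, uint64_t blocks)
{
	assert(blocks <= mp->incore_fdblocks);
	mp->incore_fdblocks -= blocks;
}

/*
 * Net effect of a committed allocation: the in-core count drops, and on
 * !lazysbcount the same delta is applied to the on-disk value (assumption,
 * standing in for what the commit message attributes to
 * xfs_trans_apply_sb_deltas()).
 */
static void toy_commit_alloc(struct toy_mount *mp, uint64_t blocks)
{
	assert(blocks <= mp->incore_fdblocks);
	mp->incore_fdblocks -= blocks;
	if (!mp->lazysbcount) {
		assert(blocks <= mp->disk_fdblocks);
		mp->disk_fdblocks -= blocks;
	}
}

/* Pre-patch behaviour in this model: always log the in-core sum. */
static uint64_t toy_log_sb_old(struct toy_mount *mp)
{
	mp->sb_fdblocks = mp->incore_fdblocks;
	mp->disk_fdblocks = mp->sb_fdblocks;
	return mp->disk_fdblocks;
}

/* Patched behaviour in this model: keep the on-disk value for !lazysbcount. */
static uint64_t toy_log_sb_patched(struct toy_mount *mp)
{
	if (!mp->lazysbcount)
		mp->sb_fdblocks = mp->disk_fdblocks;
	else
		mp->sb_fdblocks = mp->incore_fdblocks;
	mp->disk_fdblocks = mp->sb_fdblocks;
	return mp->disk_fdblocks;
}
```

In this model, starting from 1000 free blocks on a !lazysbcount mount,
committing a 100-block allocation and then holding a 50-block
reservation makes toy_log_sb_old() write 850, so the reservation leaks
to disk, while toy_log_sb_patched() writes 900.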