Re: [PATCH] xfs: don't use in-core per-cpu fdblocks for !lazysbcount

Dave Chinner <david@xxxxxxxxxxxxx> · Sun, 18 Apr 2021 08:32:01 +1000

On Sat, Apr 17, 2021 at 10:20:13AM +0800, Gao Xiang wrote:
> Hi Darrick and Dave,
> 
> On Sat, Apr 17, 2021 at 11:57:02AM +1000, Dave Chinner wrote:
> > On Fri, Apr 16, 2021 at 05:19:41PM -0700, Darrick J. Wong wrote:
> > > On Sat, Apr 17, 2021 at 05:13:20AM +0800, Gao Xiang wrote:
> 
> ...
> 
> > 
> > Nor is it necessary to fix the problem.
> > 
> > > > > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> > > > > > index 60e6d255e5e2..423dada3f64c 100644
> > > > > > --- a/fs/xfs/libxfs/xfs_sb.c
> > > > > > +++ b/fs/xfs/libxfs/xfs_sb.c
> > > > > > @@ -928,7 +928,13 @@ xfs_log_sb(
> > > > > >  
> > > > > >  	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> > > > > >  	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > > > > > -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > > > > > +	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > > > > > +		struct xfs_dsb	*dsb = bp->b_addr;
> > > > > > +
> > > > > > +		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
> > > 
> > > Hmm... is this really needed?  I thought in !lazysbcount mode,
> > > xfs_trans_apply_sb_deltas updates the ondisk super buffer directly.
> > > So aren't all three of these updates unnecessary?
> > 
> > Yup, now I understand the issue, the fix is simply to avoid these
> > updates for !lazysb. i.e. it should just be:
> > 
> > 	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > 		mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> > 		mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > 		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > 	}
> > 	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
> 
> I did as this because xfs_sb_to_disk() will override them, see:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/libxfs/xfs_sb.c#n629
> 
> ...
> 	to->sb_icount = cpu_to_be64(from->sb_icount);
> 	to->sb_ifree = cpu_to_be64(from->sb_ifree);
> 	to->sb_fdblocks = cpu_to_be64(from->sb_fdblocks);

> As an alternative, I was once to wrap it as:
> 
> xfs_sb_to_disk() {
> ...
> 	if (xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> 		to->sb_icount = cpu_to_be64(from->sb_icount);
> 		to->sb_ifree = cpu_to_be64(from->sb_ifree);
> 		to->sb_fdblocks = cpu_to_be64(from->sb_fdblocks);
> 	}
> ...
> }

This goes back to a commit in 2015 dropping the fields parameter from
xfs_sb_to_disk(). Originally, we only formatted the requested
parameters to the on-disk buffer from the in-memory superblock amd
this was removed in 2015 by commit 4d11a4023940 ("xfs: remove bitfield
based superblock updates") which meant all superblock modification
calls updated the entire on-disk log.

Up to that point, only xfs_log_sbcount() updated the on-disk
counters in the superblock buffer, and only for lazy-count enabled
filesystems. And xfs_bmap_add_attrfork() would only update the
features fields in the superblock, and nothing else. Now every
modification to the sueprblock updates everythign from the in-memory
state.

However, there are two sets of in-memory state for the superblock
accounting - the superblock fields and the per-cpu coutners. The
per-cpu counters are the ones we apply reservations to and the ones
we use for space tracking. The counters in the mp->m_sb are updated
in the same manner as the on-disk counters.

That is, xfs_trans_apply_sb_deltas() only applies deltas to the
directly to the in-memory superblock in the case of !lazy-count, so
these counters are actually a correct representation of the on-disk
value of the accounting when lazy-count=0.

Hence we should always be able to write the counters in mp->m_sb
directly to the on-disk superblock buffer in the case of
lazy-count=0 and the values should be correct. lazy-count=1 only
updates the mp->m_sb counters from the per-cpu counters so that the
on-disk counters aren't wildly inaccruate, and so that when we
unmount/freeze/etc the counters are actually correct.

Long story short, I think xfs_sb_to_disk() always updating the
on-disk superblock from mp->m_sb is safe to do as the counters in
mp->m_sb are updated in the same manner during transaction commit as
the superblock buffer counters for lazy-count=0....

> Yet after I observed the other callers of xfs_sb_to_disk() (e.g. growfs
> and online repair), I think a better modification is the way I proposed
> here, so no need to update xfs_sb_to_disk() and the other callers (since
> !lazysbcount is not recommended at all.)

Yup that's the original reason for having a fields flag to do
condition update of the on-disk buffer from the in-memory state.
Different code has diferrent requirements, but it looked like this
didn't matter for lazy-count filesystems because other checks
avoided the update of m_sb fields. What was missed in that
optimisation was the fact lazy-count=0 never updated the counters
directly.

/me is now wondering why we even bother with !lazy-count anymore.

WE've updated the agr btree block accounting unconditionally since
lazy-count was added, and scrub will always report a mismatch in
counts if they exist regardless of lazy-count. So why don't we just
start ignoring the on-disk value and always use lazy-count based
updates?

We only added it as mkfs option/feature bit because of the recovery
issue with not being able to account for btree blocks properly at
mount time, but now we have mechanisms for counting blocks in btrees
so even that has gone away. So we could actually just turn
on lazy-count at mount time, and we could get rid of this whole
set of subtle conditional behaviours we clearly aren't able to
exercise effectively...

> It's easier to backport and less conflict, and btw !lazysbcount also need
> to be warned out and deprecated from now.

You have to use -m crc=0 to turn off lazycount, and the deprecation
warning should come from -m crc=0...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx