Re: [PATCH] xfs: don't use in-core per-cpu fdblocks for !lazysbcount

Gao Xiang <hsiangkao@xxxxxxxxxx> · Sat, 17 Apr 2021 05:13:20 +0800

Hi Darrick,

On Fri, Apr 16, 2021 at 09:00:13AM -0700, Darrick J. Wong wrote:
> On Fri, Apr 16, 2021 at 05:10:23PM +0800, Gao Xiang wrote:
> > There are many paths which could trigger xfs_log_sb(), e.g.
> >   xfs_bmap_add_attrfork()
> >     -> xfs_log_sb()
> > , which overwrote on-disk fdblocks with in-core per-CPU fdblocks.
> > 
> > However, for !lazysbcount cases, on-disk fdblocks is actually updated
> > by xfs_trans_apply_sb_deltas(), and generally it isn't equal to
> > in-core fdblocks due to xfs_reserve_block() or whatever, see the
> > comment in xfs_unmountfs().
> > 
> > It could be observed by the following steps reported by Zorro [1]:
> > 
> > 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
> > 2. mount $dev $mnt
> > 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load)
> > 4. umount $mnt
> > 5. xfs_repair -n $dev
> > 
> > However, since commit f46e5a174655 ("xfs: fold sbcount quiesce logging
> > into log covering"), xfs_unmountfs() triggers xfs_sync_sb() even for
> > !lazysbcount whenever xfs_log_need_covered() is true, so this is hard
> > to reproduce on kernel 5.12+.
> 
> Um, I can't understand this(?), possibly because I can't get to RHBZ and
> therefore have very little context to start from. :(

Very sorry about that. Only after sending out the patch did I realize that
the bug isn't accessible at all without special permission. :(

> 
> Are you saying that because the f46e commit removed the xfs_sync_sb
> calls from unmountfs for !lazysb filesystems, we no longer log the
> summary counters at unmount?  Which means that we no longer write the
> incore percpu fdblocks count to disk at unmount after we've torn down
> all the incore space reservations (when sb_fdblocks == m_fdblocks)?

Er.. I think it's the other way around: before commit f46e, we didn't log
the summary counters at unmount, due to
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/xfs_mount.c?h=v5.11#n1177
  xfs_unmountfs
    -> xfs_log_sbcount
      -> !xfs_sb_version_haslazysbcount
        -> return 0 (xfs_sync_sb bypassed).

So the only time we updated the on-disk fdblocks was during transactions,
but xfs_log_sb() corrupted it (and the wrong value stuck, since no summary
counters were logged at unmount).

After f46e, it became
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/xfs_log.c?h=v5.12-rc2#n982
  xfs_unmountfs
    -> xfs_log_unmount
      -> xfs_log_clean
        -> xfs_log_cover

So if xfs_log_need_covered(mp) == true and
!xfs_sb_version_haslazysbcount(&mp->m_sb),
xfs_sync_sb() will be triggered to cover the log, so
it's hard to reproduce on the current kernel (at least on my side).

But I have no idea whether xfs_log_need_covered(mp) is always true at that
point, and that patchset seems a bit large and (possibly) hard to backport...

> 
> So that means that for !lazysb fses, the only time we log the sb
> counters is during transactions, and when we do log the counters we
> actually log the wrong value, since the incore reservations should never
> escape to disk?  Hence the fix below?

Yes

> 
> And then by extension, is the reason that nobody noticed before is that
> we always used to log the correct value at unmount, so fses with clean
> logs always have the correct value, and fses with dirty logs will
> recompute fdblocks after log recovery by summing the AGF free blocks
> counts?

Nope, prior to 5.12-rc1, I think it was broken for a very long time...

> 
> (Or possibly nobody uses !lazysb filesystems anymore?)
> 

Zorro found this a few days ago on the RHEL 8 kernel (4.18; maybe he's
writing some new test cases to cover this), and I think it has been broken
for a very long time (I don't know which version broke it first). Maybe
it has had little impact since it's just a free space block counter.

So I think it should be backported to many stable kernel versions (?)
But I have no idea when it was broken...

> I /think/ the code change looks ok, but as you might surmise from the
> large quantity of questions, I'm not ready to RVB this yet.  The commit
> message seems like a good place to answer those questions.
> 
> > After this patch, I've seen nothing strange so far on older kernels
> > for the testcase above without lazysbcount.
> > 
> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1949515
> 
> This strangely <cough> doesn't seem to be accessible to the public at
> large, since <cough> someone at RedHat decided to block all Oracle IPs
> <cough>.

<cough> will get rid of it the next time...

Thanks,
Gao Xiang

> 
> --D
> 
> > 
> > Reported-by: Zorro Lang <zlang@xxxxxxxxxx>
> > Signed-off-by: Gao Xiang <hsiangkao@xxxxxxxxxx>
> > ---
> >  fs/xfs/libxfs/xfs_sb.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> > index 60e6d255e5e2..423dada3f64c 100644
> > --- a/fs/xfs/libxfs/xfs_sb.c
> > +++ b/fs/xfs/libxfs/xfs_sb.c
> > @@ -928,7 +928,13 @@ xfs_log_sb(
> >  
> >  	mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
> >  	mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
> > -	mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > +	if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
> > +		struct xfs_dsb	*dsb = bp->b_addr;
> > +
> > +		mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
> > +	} else {
> > +		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
> > +	}
> >  
> >  	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
> >  	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
> > -- 
> > 2.27.0
> > 
>