Re: [PATCH 3/3] xfs: terminate perag iteration reliably on end agno

Brian Foster <bfoster@xxxxxxxxxx> · Fri, 8 Oct 2021 10:45:13 -0400

On Fri, Oct 08, 2021 at 10:02:59AM +1100, Dave Chinner wrote:
> On Thu, Oct 07, 2021 at 08:50:53AM -0400, Brian Foster wrote:
> > The for_each_perag*() set of macros are hacky in that some (i.e. those
> > based on sb_agcount) rely on the assumption that perag iteration
> > terminates naturally with a NULL perag at the specified end agno. Others
> > allow for the final AG to have a valid perag and require the calling
> > function to clean up any potential leftover xfs_perag reference on
> > termination of the loop.
> > 
> > Aside from providing a subtly inconsistent interface, the former variant
> > is racy with a potential growfs in progress because growfs can create
> > discoverable post-eofs perags before the final superblock update that
> > completes the grow operation and increases sb_agcount. This leads to
> > unexpected assert failures (reproduced by xfs/104) such as the following
> > in the superblock buffer write verifier path:
> > 
> >  XFS: Assertion failed: agno < mp->m_sb.sb_agcount, file: fs/xfs/libxfs/xfs_types.c, line: 22
> 
> Yeah, that's a bad assert. It's not valid in the context of grow or
> shrink or any of the future advanced per-ag management things we
> want to do.
> 

I think it depends on the context. I don't think it's unreasonable to
expect certain paths to not want to process post-eofs perags. I don't
really like the placement of this assert tbh (with it being in a generic
helper) and the sb write verifier is probably not ideal context for the
check in a generic sense, but it does flag unexpected behavior when you
consider the higher level iteration is based on sb_agcount.

> I'm ok with the change being proposed as a expedient bug fix, but
> I'll note that the approach taken to fix it is not compatible with
> future plans for managing shrink and perag operations. I'll comment
> on the patch first, then the rest of the email is commentary about
> how xfs_perag_get() is intended to be used...
> 
> > diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> > index d05c9217c3af..edcdd4fbc225 100644
> > --- a/fs/xfs/libxfs/xfs_ag.h
> > +++ b/fs/xfs/libxfs/xfs_ag.h
> > @@ -116,34 +116,30 @@ void xfs_perag_put(struct xfs_perag *pag);
> >  
> >  /*
> >   * Perag iteration APIs
> > - *
> > - * XXX: for_each_perag_range() usage really needs an iterator to clean up when
> > - * we terminate at end_agno because we may have taken a reference to the perag
> > - * beyond end_agno. Right now callers have to be careful to catch and clean that
> > - * up themselves. This is not necessary for the callers of for_each_perag() and
> > - * for_each_perag_from() because they terminate at sb_agcount where there are
> > - * no perag structures in tree beyond end_agno.
> 
> We still really need an iterator for the range iterations so that we
> can have a consistent set of behaviours for all iterations and
> don't need a special case just for the "mid walk break" where the
> code keeps the active reference to the perag for itself...
> 

Ok, but what exactly are you referring to by "an iterator" beyond what
we have here to this point? A walker function with a callback or
something? And why wouldn't we have done that in the first place instead
of introducing the API wart documented above?

> >   */
> >  static inline
> >  struct xfs_perag *xfs_perag_next(
> >  	struct xfs_perag	*pag,
> > -	xfs_agnumber_t		*agno)
> > +	xfs_agnumber_t		*agno,
> > +	xfs_agnumber_t		end_agno)
> >  {
> >  	struct xfs_mount	*mp = pag->pag_mount;
> >  
> >  	*agno = pag->pag_agno + 1;
> >  	xfs_perag_put(pag);
> > -	pag = xfs_perag_get(mp, *agno);
> > +	pag = NULL;
> > +	if (*agno <= end_agno)
> > +		pag = xfs_perag_get(mp, *agno);
> >  	return pag;
> 
> 	*agno = pag->pag_agno + 1;
> 	xfs_perag_put(pag);
> 	if (*agno > end_agno)
> 		return NULL;
> 	return xfs_perag_get(mp, *agno);
> 

Will fix.

> >  }
> >  
> >  #define for_each_perag_range(mp, agno, end_agno, pag) \
> >  	for ((pag) = xfs_perag_get((mp), (agno)); \
> > -		(pag) != NULL && (agno) <= (end_agno); \
> > -		(pag) = xfs_perag_next((pag), &(agno)))
> > +		(pag) != NULL; \
> > +		(pag) = xfs_perag_next((pag), &(agno), (end_agno)))
> >  
> >  #define for_each_perag_from(mp, agno, pag) \
> > -	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount, (pag))
> > +	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount - 1, (pag))
> 
> Isn't this one line the entire bug fix right here? i.e. the
> factoring is largely unnecessary, the grow race bug is fixed by just
> this one-liner?
> 

No, the reference count problems can still occur regardless of this
particular change.

...
> 
> > The following assert failure occasionally triggers during the xfs_perag
> > free path on unmount, presumably because one of the many
> > for_each_perag() loops in the code that is expected to terminate with a
> > NULL pag raced with a growfs and actually terminated with a non-NULL
> > reference to post-eofs (at the time) perag.
> > 
> >  XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/libxfs/xfs_ag.c, line: 195
> > 
> > Rework the lower level perag iteration logic to explicitly terminate
> > on the specified end agno, not implicitly rely on pag == NULL as a
> > termination clause and thus avoid these problems.
> 
> IMO, this just hides the symptom that results from code that isn't
> handling unexpected adverse loop termination correctly. The
> iterators are going to get more complex in the near future, so we
> really need them to have a robust iterator API that does all the
> cleanup work correctly, rather than try to hide it all in a
> increasingly complex for loop construct.
> 

Any loop that uses one of the sb_agcount based iteration macros in a
context that can race with growfs and doesn't check pag != NULL post
loop is not handling loop termination correctly. The sb_agcount check
effectively builds in an early termination vector to every such usage,
because we can't guarantee that pag == NULL when the sb_agcount check
causes loop termination.

IOW, the following usage pattern documented in the comment above is not
universally correct/safe:

	for_each_perag(...) {
		/* no early termination */
	}
	/* no perag check because no early termination */

... because for_each_perag() is not implemented correctly to guarantee
that pag == NULL on exit of the loop. This is a simple logic bug with a
simple fix.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
>