On Fri, Oct 08, 2021 at 10:02:59AM +1100, Dave Chinner wrote: > On Thu, Oct 07, 2021 at 08:50:53AM -0400, Brian Foster wrote: > > The for_each_perag*() set of macros are hacky in that some (i.e. those > > based on sb_agcount) rely on the assumption that perag iteration > > terminates naturally with a NULL perag at the specified end agno. Others > > allow for the final AG to have a valid perag and require the calling > > function to clean up any potential leftover xfs_perag reference on > > termination of the loop. > > > > Aside from providing a subtly inconsistent interface, the former variant > > is racy with a potential growfs in progress because growfs can create > > discoverable post-eofs perags before the final superblock update that > > completes the grow operation and increases sb_agcount. This leads to > > unexpected assert failures (reproduced by xfs/104) such as the following > > in the superblock buffer write verifier path: > > > > XFS: Assertion failed: agno < mp->m_sb.sb_agcount, file: fs/xfs/libxfs/xfs_types.c, line: 22 > > Yeah, that's a bad assert. It's not valid in the context of grow or > shrink or any of the future advanced per-ag management things we > want to do. > I think it depends on the context. I don't think it's unreasonable to expect certain paths to not want to process post-eofs perags. I don't really like the placement of this assert tbh (with it being in a generic helper) and the sb write verifier is probably not ideal context for the check in a generic sense, but it does flag unexpected behavior when you consider the higher level iteration is based on sb_agcount. > I'm ok with the change being proposed as a expedient bug fix, but > I'll note that the approach taken to fix it is not compatible with > future plans for managing shrink and perag operations. I'll comment > on the patch first, then the rest of the email is commentary about > how xfs_perag_get() is intended to be used... > > > diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h > > index d05c9217c3af..edcdd4fbc225 100644 > > --- a/fs/xfs/libxfs/xfs_ag.h > > +++ b/fs/xfs/libxfs/xfs_ag.h > > @@ -116,34 +116,30 @@ void xfs_perag_put(struct xfs_perag *pag); > > > > /* > > * Perag iteration APIs > > - * > > - * XXX: for_each_perag_range() usage really needs an iterator to clean up when > > - * we terminate at end_agno because we may have taken a reference to the perag > > - * beyond end_agno. Right now callers have to be careful to catch and clean that > > - * up themselves. This is not necessary for the callers of for_each_perag() and > > - * for_each_perag_from() because they terminate at sb_agcount where there are > > - * no perag structures in tree beyond end_agno. > > We still really need an iterator for the range iterations so that we > can have a consistent set of behaviours for all iterations and > don't need a special case just for the "mid walk break" where the > code keeps the active reference to the perag for itself... > Ok, but what exactly are you referring to by "an iterator" beyond what we have here to this point? A walker function with a callback or something? And why wouldn't we have done that in the first place instead of introducing the API wart documented above? > > */ > > static inline > > struct xfs_perag *xfs_perag_next( > > struct xfs_perag *pag, > > - xfs_agnumber_t *agno) > > + xfs_agnumber_t *agno, > > + xfs_agnumber_t end_agno) > > { > > struct xfs_mount *mp = pag->pag_mount; > > > > *agno = pag->pag_agno + 1; > > xfs_perag_put(pag); > > - pag = xfs_perag_get(mp, *agno); > > + pag = NULL; > > + if (*agno <= end_agno) > > + pag = xfs_perag_get(mp, *agno); > > return pag; > > *agno = pag->pag_agno + 1; > xfs_perag_put(pag); > if (*agno > end_agno) > return NULL; > return xfs_perag_get(mp, *agno); > Will fix. > > } > > > > #define for_each_perag_range(mp, agno, end_agno, pag) \ > > for ((pag) = xfs_perag_get((mp), (agno)); \ > > - (pag) != NULL && (agno) <= (end_agno); \ > > - (pag) = xfs_perag_next((pag), &(agno))) > > + (pag) != NULL; \ > > + (pag) = xfs_perag_next((pag), &(agno), (end_agno))) > > > > #define for_each_perag_from(mp, agno, pag) \ > > - for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount, (pag)) > > + for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount - 1, (pag)) > > Isn't this one line the entire bug fix right here? i.e. the > factoring is largely unnecessary, the grow race bug is fixed by just > this one-liner? > No, the reference count problems can still occur regardless of this particular change. ... > > > The following assert failure occasionally triggers during the xfs_perag > > free path on unmount, presumably because one of the many > > for_each_perag() loops in the code that is expected to terminate with a > > NULL pag raced with a growfs and actually terminated with a non-NULL > > reference to post-eofs (at the time) perag. > > > > XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/libxfs/xfs_ag.c, line: 195 > > > > Rework the lower level perag iteration logic to explicitly terminate > > on the specified end agno, not implicitly rely on pag == NULL as a > > termination clause and thus avoid these problems. > > IMO, this just hides the symptom that results from code that isn't > handling unexpected adverse loop termination correctly. The > iterators are going to get more complex in the near future, so we > really need them to have a robust iterator API that does all the > cleanup work correctly, rather than try to hide it all in a > increasingly complex for loop construct. > Any loop that uses one of the sb_agcount based iteration macros in a context that can race with growfs and doesn't check pag != NULL post loop is not handling loop termination correctly. The sb_agcount check effectively builds in an early termination vector to every such usage, because we can't guarantee that pag == NULL when the sb_agcount check causes loop termination. IOW, the following usage pattern documented in the comment above is not universally correct/safe: for_each_perag(...) { /* no early termination */ } /* no perag check because no early termination */ ... because for_each_perag() is not implemented correctly to guarantee that pag == NULL on exit of the loop. This is a simple logic bug with a simple fix. Brian > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx >