On Thu, 2010-07-15 at 10:38 +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@xxxxxxxxxx> > > https://bugzilla.kernel.org/show_bug.cgi?id=16348 > > When the filesystem grows to a large number of allocation groups, > the summing of recalimable inodes gets expensive. In many cases, > most AGs won't have any reclaimable inodes and so we are wasting CPU > time aggregating over these AGs. This is particularly important for > the inode shrinker that gets called frequently under memory > pressure. > > To avoid the overhead, track AGs with reclaimable inodes in the > per-ag radix tree so that we can find all the AGs with reclaimable > inodes via a simple gang tag lookup. This involves setting the tag > when the first reclaimable inode is tracked in the AG, and removing > the tag when the last reclaimable inode is removed from the tree. > Then the summation process becomes a loop walking the radix tree > summing AGs with the reclaim tag set. > > This significantly reduces the overhead of scanning - a 6400 AG > filesystea now only uses about 25% of a cpu in kswapd while slab reclaim > progresses instead of being permanently stuck at 100% CPU and making little > progress. Clean filesystems filesystems will see no overhead and the > overhead only increases linearly with the number of dirty AGs. Using the same tag represent a perag with reclaimable inodes as the tag representing an inode is reclaimable is nicely consistent... I have a few comments below for your consideration but they are minor (and don't even require a response). Overall this looks good. Reviewed-by: Alex Elder <aelder@xxxxxxx> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > --- > fs/xfs/linux-2.6/xfs_sync.c | 71 +++++++++++++++++++++++++++++++++++++---- > fs/xfs/linux-2.6/xfs_trace.h | 3 ++ > 2 files changed, 67 insertions(+), 7 deletions(-) > > diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c > index 56fed91..51da595 100644 > --- a/fs/xfs/linux-2.6/xfs_sync.c > +++ b/fs/xfs/linux-2.6/xfs_sync.c > @@ -133,6 +133,41 @@ restart: > return last_error; > } > > +/* > + * Select the next per-ag structure to iterate during the walk. The reclaim > + * walk is optimised only to walk AGs with reclaimable inodes in them. > + */ > +static struct xfs_perag * > +xfs_inode_ag_iter_next_pag( > + struct xfs_mount *mp, > + xfs_agnumber_t *first, > + int tag) > +{ > + struct xfs_perag *pag = NULL; > + > + if (tag == XFS_ICI_RECLAIM_TAG) { > + int found; > + int ref; > + > + spin_lock(&mp->m_perag_lock); > + found = radix_tree_gang_lookup_tag(&mp->m_perag_tree, > + (void **)&pag, *first, 1, tag); > + if (found <= 0) { > + spin_unlock(&mp->m_perag_lock); > + return NULL; > + } > + *first = pag->pag_agno + 1; Maybe move this below, just before the return. I.e.: if (pag) *first = pag->pag_agno + 1; To me it's slightly clearer that we're just setting up to search next time with the perag following the one returned. > + /* open coded pag reference increment */ By open-coding here you miss the assertions in xfs_perag_get(). (Though a common inline encapsulating them would have to be in a header because the two functions reside in different files.) > + ref = atomic_inc_return(&pag->pag_ref); > + spin_unlock(&mp->m_perag_lock); > + trace_xfs_perag_get_reclaim(mp, pag->pag_agno, ref, _RET_IP_); > + } else { > + pag = xfs_perag_get(mp, *first); > + (*first)++; > + } > + return pag; > +} > + > int > xfs_inode_ag_iterator( > struct xfs_mount *mp, . . . _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs