On Thu, Jul 15, 2010 at 01:01:40PM -0500, Alex Elder wrote: > On Thu, 2010-07-15 at 10:38 +1000, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=16348 > > > > When the filesystem grows to a large number of allocation groups, > > the summing of recalimable inodes gets expensive. In many cases, > > most AGs won't have any reclaimable inodes and so we are wasting CPU > > time aggregating over these AGs. This is particularly important for > > the inode shrinker that gets called frequently under memory > > pressure. > > > > To avoid the overhead, track AGs with reclaimable inodes in the > > per-ag radix tree so that we can find all the AGs with reclaimable > > inodes via a simple gang tag lookup. This involves setting the tag > > when the first reclaimable inode is tracked in the AG, and removing > > the tag when the last reclaimable inode is removed from the tree. > > Then the summation process becomes a loop walking the radix tree > > summing AGs with the reclaim tag set. > > > > This significantly reduces the overhead of scanning - a 6400 AG > > filesystea now only uses about 25% of a cpu in kswapd while slab reclaim > > progresses instead of being permanently stuck at 100% CPU and making little > > progress. Clean filesystems filesystems will see no overhead and the > > overhead only increases linearly with the number of dirty AGs. > > Using the same tag represent a perag with reclaimable > inodes as the tag representing an inode is reclaimable > is nicely consistent... > > I have a few comments below for your consideration > but they are minor (and don't even require a response). > > Overall this looks good. > > Reviewed-by: Alex Elder <aelder@xxxxxxx> > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > > --- > > fs/xfs/linux-2.6/xfs_sync.c | 71 +++++++++++++++++++++++++++++++++++++---- > > fs/xfs/linux-2.6/xfs_trace.h | 3 ++ > > 2 files changed, 67 insertions(+), 7 deletions(-) > > > > diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c > > index 56fed91..51da595 100644 > > --- a/fs/xfs/linux-2.6/xfs_sync.c > > +++ b/fs/xfs/linux-2.6/xfs_sync.c > > @@ -133,6 +133,41 @@ restart: > > return last_error; > > } > > > > +/* > > + * Select the next per-ag structure to iterate during the walk. The reclaim > > + * walk is optimised only to walk AGs with reclaimable inodes in them. > > + */ > > +static struct xfs_perag * > > +xfs_inode_ag_iter_next_pag( > > + struct xfs_mount *mp, > > + xfs_agnumber_t *first, > > + int tag) > > +{ > > + struct xfs_perag *pag = NULL; > > + > > + if (tag == XFS_ICI_RECLAIM_TAG) { > > + int found; > > + int ref; > > + > > + spin_lock(&mp->m_perag_lock); > > + found = radix_tree_gang_lookup_tag(&mp->m_perag_tree, > > + (void **)&pag, *first, 1, tag); > > + if (found <= 0) { > > + spin_unlock(&mp->m_perag_lock); > > + return NULL; > > + } > > + *first = pag->pag_agno + 1; > > Maybe move this below, just before the return. I.e.: > if (pag) > *first = pag->pag_agno + 1; > > To me it's slightly clearer that we're just setting > up to search next time with the perag following the > one returned. If found <=0, we didn't find anything in the tree. I don't know what the value of pag is going to be in this case, so we have to check the return value to determine if pag is valid or not. The other gang lookups in XFS use the same convention... > > > + /* open coded pag reference increment */ > > By open-coding here you miss the assertions in xfs_perag_get(). > (Though a common inline encapsulating them would have to be > in a header because the two functions reside in different files.) Yes, it misses them, but we do sufficient calls to xfs_perag_get() elsewhere that I don't think it makes much difference - we'll still get the ASSERT()s tripping, and the tracing will still point out where the problem is... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs