On Wed 26-04-17 10:04:26, Dave Chinner wrote: > On Tue, Apr 25, 2017 at 10:25:03AM +0200, Luis R. Rodriguez wrote: > > I've been staring at xfs_reclaim_inodes_ag() for a bit now to determine > > if and how its deterministic it is. Fortunately the code makes it relatively > > clear it chugs on 32 xfs_inodes at at a time before cond_resched()'ing. I was > > concerned originally if we cond_resched() even if we goto restart but it seems > > we do. > > > > While reviewing this however I ran into the following question based on a > > comment on the routine. Is such a thing as the below change needed ? > > > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c > > index 3531f8f72fa5..05f3c79b4f11 100644 > > --- a/fs/xfs/xfs_icache.c > > +++ b/fs/xfs/xfs_icache.c > > @@ -1157,7 +1157,9 @@ xfs_reclaim_inodes_ag( > > for (i = 0; i < nr_found; i++) { > > struct xfs_inode *ip = batch[i]; > > > > - if (done || xfs_reclaim_inode_grab(ip, flags)) > > + if (done || > > + XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno || > > + xfs_reclaim_inode_grab(ip, flags)) > > batch[i] = NULL; > > If a new check is required here, it belongs in > xfs_reclaim_inode_grab(), not as separate logic in the if statement > here as it needs to be serialised with other checks the grab does. > > As it is, the grab code first checks for a zero ip->ino, which will > catch all cases where the inode is still in the RCU grace period and > a lookup has raced. If the inode is not under reclaim (i.e the inode > number is non-zero) then the grab code checks and sets XFS_IRECLAIM > under a spinlock, thereby prevent all further inode cache > lookup+grab operations from succeeding until reclaim completes and > the RCU grace period expires and frees the inode. i.e. if the > XFS_IRECLAIM flag is already set, then the inode is skipped > regardless of the inode number because it's already on it's way to > being freed by RCU. > > Hence, IMO, adding this check here doesn't change lookup/reclaim > processing correctness. Agreed, the code looks correct as is to me. > > This is what the code looks like before but pay special attention to the > > comment and a race with RCU: > > > > if (done || xfs_reclaim_inode_grab(ip, flags)) > > batch[i] = NULL; > > > > /* > > * Update the index for the next lookup. > > * Catch overflows into the next AG range > > * which can occur if we have inodes in the > > * last block of the AG and we are > > * currently pointing to the last inode. > > * > > * Because we may see inodes that are from > > * the wrong AG due to RCU freeing and > > * reallocation, only update the index if > > * it lies in this AG. It was a race that > > * lead us to see this inode, so another > > * lookup from > > * the same index will not find it again. > > */ > > if (XFS_INO_TO_AGNO(mp, ip->i_ino) != > > pag->pag_agno) > > continue; > > It's stale code left over from development of the RCU freeing of > inodes where ..... > > > I checked with Jan Kara and he believes the current code is correct but that > > its the comment that that may be misleading. As per Jan the race is between > > getting an inode reclaimed and grabbing it. Ie, XFS frees the inodes by RCU. > > However it doesn't actually *reuse* the inode until RCU period passes > > (unlike inodes allocated from slab with SLAB_RCU can be). So it can happen > > ..... I initially tried using SLAB_DESTROY_BY_RCU which meant the > RCU grace period did not prevent reallocation of inodes that had > been freed. Hence this check was (once) necessary to prevent the > reclaim index going whacky on a reallocated inode. > > However, I dropped the SLAB_DESTROY_BY_RCU behaviour because I > couldn't get it to work reliably and race free, and there was no > measurable improvement in performance just using call_rcu() t delay > the freeing of individual inodes to the end of the RCU grace period. Yeah, so SLAB_DESTROY_BY_RCU is beneficial only if you have a workload where getting CPU cache-hot objects on allocation improves performance measurably. I remember quite a few years ago Nick Piggin had a workload where this was advantageous for inodes (like opening & closing tons of files so that inodes where flowing in and out of cache, I don't remember the details anymore) but it was mostly a corner case. > In hindsight, I forgot to remove the second comment hunk and the > check that was eventually committed as 1a3e8f3da09c ("xfs: convert > inode cache lookups to use RCU locking"). It also appears the commit > message was not updated, too. It says: > > To avoid the read vs write contention, change the cache to use > RCU locking on the read side. To avoid needing to RCU free every > single inode, use the built in slab RCU freeing mechanism. This > requires us to be able to detect lookups of freed inodes, [...] > > So, really, the comment+check you're confused about can simply be > removed.... Thanks for confirmation and reference! Looking at that commit there seem to be more comments there speaking about reallocation which probably need updating - e.g. the one in xfs_reclaim_inode_grab(). Luis, will you take care of that? Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html