On Thu, Sep 19, 2013 at 08:43:48AM -0400, Brian Foster wrote:
> On 09/18/2013 06:51 PM, Dave Chinner wrote:
> > On Wed, Sep 18, 2013 at 02:06:26PM -0400, Brian Foster wrote:
> >> I just want to call out one thing here in case it isn't noticed on
> >> review... the safety of this is something I was curious about.
> >> Specifically, note that I've removed the inode locking from
> >> xfs_inactive(), which previously covered xfs_inactive_symlink() (for
> >> xfs_idata_realloc()), down into xfs_inactive_symlink_rmt().
> >
> > See my comments about idata_realloc() in the previous email. It
> > might be safe, but it's better not to leave a landmine if we add
> > some other caller to the function in the future.
> >
> >> My assumption was that this is currently ok since at this point we
> >> have an inode with di_nlink == 0.
> >
> > It's not an entirely correct assumption. The end result is likely
> > the same, but di_nlink has no influence here. i.e. the inode
> > lifecycle is rather complex and there is an interesting condition
> > that covers inodes going through xfs_inactive().
> >
> > xfs_inactive() is called when the VFS reference count goes to zero
> > and the VFS inode is being reclaimed, but the XFS_IRECLAIMABLE flag
> > is not yet set on it. This doesn't happen until after xfs_inactive()
> > completes and the VFS calls ->destroy_inode. Hence the inode is in a
> > limbo state where calls to igrab() will fail but the inode can be
> > found in the inode radix trees without being marked as "under
> > reclaim conditions".
> >
> > We handle this in xfs_iget_cache_hit() by the use of igrab(),
> > which will fail on such an inode, and we use the same logic in
> > xfs_inode_ag_walk_grab() to avoid this hole. That said,
> > xfs_reclaim_inode_grab() does no such thing - it only checks for
> > XFS_IRECLAIMABLE under an RCU traversal, and so may find inodes for
> > which the radix tree reclaimable tag is stale. Hence that check is
> > always done under a spinlock.
> >
> > IOWs, the only thing that protects us from outside interference in
> > xfs_inactive() is the logic in the XFS inode cache lookups
> > specifically avoiding inodes in the transient state that
> > xfs_inactive() is called under. It doesn't matter what the contents
> > of the inode are - it's the safe transition from one lifecycle state
> > to the next that is important at this point.
> >
> > So, like I said in the previous email, we have to be careful with
> > cache lookups to prevent races with the work xfs_inactive() is
> > doing, but that doesn't mean we shouldn't still lock the inodes
> > correctly when modifying them...
>
> So broadly speaking, the inode states are more granular than my
> di_nlink based assumption.

More that the inode lifecycle states are not related to the value of
di_nlink at all ;)

> We have to account for access via internal caches, even if the inode
> is in the process of being torn down in the vfs. I'll have to wade
> through the caching code much more to understand the intricacies. ;)
> Thanks for the breakdown.

First you need to understand the VFS inode lifecycle, as the XFS inode
lifecycle mostly wraps around the outside of the VFS inode
lifecycle. ;)
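To make that limbo window concrete, here's a minimal sketch of the
grab pattern the lookup paths rely on. This is illustrative pseudocode
written from memory rather than the actual fs/xfs/xfs_icache.c code -
the function name is invented and the error handling is abbreviated:

/*
 * Schematic grab check for a cache lookup. Skip inodes that XFS
 * reclaim already owns, then use igrab() to filter out inodes the
 * VFS is tearing down - igrab() fails once I_FREEING/I_WILL_FREE is
 * set, which is exactly the window that xfs_inactive() runs in.
 */
STATIC int
example_cache_hit_grab(
	struct xfs_inode	*ip)
{
	struct inode		*inode = VFS_I(ip);
	int			error = 0;

	spin_lock(&ip->i_flags_lock);
	if (ip->i_flags & XFS_IRECLAIMABLE) {
		/* already handed off to XFS inode reclaim */
		error = EAGAIN;
	}
	spin_unlock(&ip->i_flags_lock);
	if (error)
		return error;

	/*
	 * A failed igrab() here is what stops lookups from
	 * interfering with an inode in the limbo state between
	 * iput() and ->destroy_inode.
	 */
	if (!igrab(inode))
		return EAGAIN;

	return 0;
}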
> With regard to the locking here, any preference as to whether
> xfs_inactive_symlink() takes the lock and hands it to
> xfs_inactive_symlink_rmt() or the former locks/unlocks and the latter
> continues to work as implemented in this patch (save other comments
> to be addressed)?
>
> Actually now that I look at the code, xfs_inactive_symlink_rmt() does
> the transaction allocation and reservation now, so for that reason I
> think the lock/unlock pattern is required.

Right, they are effectively two different cases with different locking
constraints - see the sketch below my sig.

Cheers,

Dave.

>
> Brian
>
> >> If that's not accurate or not expected to
> >> remain so after O_TMPFILE related work, I suppose I could pull the
> >> locking back up into xfs_inactive_symlink().
> >
> > O_TMPFILE itself won't change anything - they will look just like
> > any other unlinked inode going through xfs_inactive() on their way
> > to the XFS_IRECLAIMABLE state.
> >
> > It's when we start separating the xfs_inactive() work into multiple
> > distinct stages to allow for optimisation of inode freeing that we
> > need to be careful, as these introduce new states into the lifecycle
> > state machine. These will most likely involve new state flags and
> > radix tree tags and walks, but AFAICT the overall concept that
> > xfs_inactive/xfs_inactive_symlink is run from the same special
> > isolated "limbo" context should not change....
> >
> > Cheers,
> >
> > Dave.
> >

-- 
Dave Chinner
david@xxxxxxxxxxxxx
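P.S. Since we're talking about the shape of the locking, a rough
sketch of the two cases. Again, this is hand-written illustrative
code, not Brian's patch - the function names are invented and the
reservation values are from memory, so treat it as pseudocode:

/*
 * Case 1: local format symlink. The target lives in the in-core data
 * fork and no transaction is needed, so we can simply take and drop
 * ILOCK_EXCL around the data fork manipulation.
 */
STATIC int
example_inactive_symlink_local(
	struct xfs_inode	*ip)
{
	xfs_ilock(ip, XFS_ILOCK_EXCL);
	if (ip->i_df.if_bytes > 0)
		xfs_idata_realloc(ip, -(ip->i_df.if_bytes),
				  XFS_DATA_FORK);
	ip->i_d.di_size = 0;
	xfs_iunlock(ip, XFS_ILOCK_EXCL);
	return 0;
}

/*
 * Case 2: remote symlink. Log space has to be reserved *before* the
 * ILOCK is taken - we never hold the ILOCK over a transaction
 * reservation - so the lock/unlock has to live inside
 * xfs_inactive_symlink_rmt() itself rather than being handed in from
 * the caller.
 */
STATIC int
example_inactive_symlink_rmt(
	struct xfs_inode	*ip)
{
	struct xfs_mount	*mp = ip->i_mount;
	struct xfs_trans	*tp;
	int			error;

	tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
	error = xfs_trans_reserve(tp, 0, XFS_ITRUNCATE_LOG_RES(mp), 0,
				  XFS_TRANS_PERM_LOG_RES,
				  XFS_ITRUNCATE_LOG_COUNT);
	if (error) {
		xfs_trans_cancel(tp, 0);
		return error;
	}

	/* now it is safe to lock the inode and join it */
	xfs_ilock(ip, XFS_ILOCK_EXCL);
	xfs_trans_ijoin(tp, ip, 0);

	/* ... unmap and free the remote extents here ... */

	error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
	xfs_iunlock(ip, XFS_ILOCK_EXCL);
	return error;
}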