On Thu, Nov 11, 2021 at 11:39:30AM +0800, Ian Kent wrote:
> When following a trailing symlink in rcu-walk mode it's possible to
> succeed in getting the ->get_link() method pointer but the link path
> string be deallocated while it's being used.
>
> Utilize the rcu mechanism to mitigate this risk.
>
> Suggested-by: Miklos Szeredi <miklos@xxxxxxxxxx>
> Signed-off-by: Ian Kent <raven@xxxxxxxxxx>
> ---
>  fs/xfs/kmem.h      |  4 ++++
>  fs/xfs/xfs_inode.c |  4 ++--
>  fs/xfs/xfs_iops.c  | 10 ++++++++--
>  3 files changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
> index 54da6d717a06..c1bd1103b340 100644
> --- a/fs/xfs/kmem.h
> +++ b/fs/xfs/kmem.h
> @@ -61,6 +61,10 @@ static inline void kmem_free(const void *ptr)
>  {
>  	kvfree(ptr);
>  }
> +static inline void kmem_free_rcu(const void *ptr)
> +{
> +	kvfree_rcu(ptr);
> +}
>
>
>  static inline void *
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index a4f6f034fb81..aaa1911e61ed 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -2650,8 +2650,8 @@ xfs_ifree(
>  	 * already been freed by xfs_attr_inactive.
>  	 */
>  	if (ip->i_df.if_format == XFS_DINODE_FMT_LOCAL) {
> -		kmem_free(ip->i_df.if_u1.if_data);
> -		ip->i_df.if_u1.if_data = NULL;
> +		kmem_free_rcu(ip->i_df.if_u1.if_data);
> +		RCU_INIT_POINTER(ip->i_df.if_u1.if_data, NULL);
>  		ip->i_df.if_bytes = 0;
>  	}

How do we get here in a way that the VFS will walk into this inode
during a lookup?

I mean, the dentry has to be validated and held during the RCU path
walk, so if we are running a transaction to mark the inode as free
here it has already been unlinked and the dentry turned negative. So
anything that is doing a lockless pathwalk onto that dentry *should*
see that it is a negative dentry at this point and hence nothing
should be walking any further or trying to access the link that was
shared from ->get_link().

AFAICT, that's what the sequence check bug you fixed in the previous
patch guarantees.

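The validation described above can be modelled in userspace roughly as
follows. This is only a sketch under stated assumptions: struct
model_dentry and the model_*() helpers are hypothetical names invented
for illustration, not the kernel's struct dentry or its seqcount API,
and the "race" is simulated on a single thread. It shows a reader
sampling a sequence count, an unlink bumping that count while turning
the dentry negative, and the reader refusing to use the link it sampled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a dentry; NOT the kernel struct. */
struct model_dentry {
	atomic_uint seq;             /* even: stable, odd: update in progress */
	_Atomic(const char *) link;  /* NULL models a negative dentry */
};

/* Sample the sequence count, waiting out any in-progress update. */
static unsigned int model_read_begin(struct model_dentry *d)
{
	unsigned int s;

	while ((s = atomic_load(&d->seq)) & 1)
		;
	return s;
}

/* True if the dentry changed since 'start' and the walk must bail out. */
static bool model_read_retry(struct model_dentry *d, unsigned int start)
{
	return atomic_load(&d->seq) != start;
}

/* Model of unlink: bump seq, turn the dentry negative, bump seq again. */
void model_unlink(struct model_dentry *d)
{
	atomic_fetch_add(&d->seq, 1);
	atomic_store(&d->link, (const char *)NULL);
	atomic_fetch_add(&d->seq, 1);
	assert((atomic_load(&d->seq) & 1) == 0);
}

/*
 * Model of a lockless walk onto the dentry. Returns the link target, or
 * NULL if the walker must stop (negative dentry) or fall back (seq changed).
 * 'race_unlink' injects an unlink between the sample and the validation.
 */
const char *model_follow(struct model_dentry *d, bool race_unlink)
{
	unsigned int s = model_read_begin(d);
	const char *link = atomic_load(&d->link);

	if (race_unlink)
		model_unlink(d);

	if (link == NULL || model_read_retry(d, s))
		return NULL;
	return link;
}

/* Exercises the three cases; returns 0 if all behave as described. */
int model_demo(void)
{
	struct model_dentry d;

	atomic_init(&d.seq, 0);
	atomic_init(&d.link, "target");

	/* No concurrent unlink: the sampled link may be used. */
	if (model_follow(&d, false) == NULL)
		return 1;
	/* Unlink lands between sample and validation: walker must bail. */
	if (model_follow(&d, true) != NULL)
		return 2;
	/* Dentry is now negative: later walks stop immediately. */
	if (model_follow(&d, false) != NULL)
		return 3;
	return 0;
}
```

Calling model_demo() from a main() should return 0 if the model behaves
as described. The point of the model is only that a walker which
validates the sequence count never acts on a link sampled before the
unlink; it says nothing about how long the link buffer itself must stay
allocated.
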
It makes no difference if the unlinked inode has been recycled or not,
the lookup race condition is the same in that the inode has gone
through ->destroy_inode and is now owned by the filesystem and not the
VFS.

Otherwise, it might just be best to memset the buffer to zero here
rather than free it, and leave it to be freed when the inode is freed
from the RCU callback in xfs_inode_free_callback() as per normal.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx