On Thu, 2021-11-11 at 11:08 -0500, Brian Foster wrote: Hi Brian, > On Thu, Nov 11, 2021 at 11:39:30AM +0800, Ian Kent wrote: > > When following a trailing symlink in rcu-walk mode it's possible to > > succeed in getting the ->get_link() method pointer but the link > > path > > string be deallocated while it's being used. > > > > Utilize the rcu mechanism to mitigate this risk. > > > > Suggested-by: Miklos Szeredi <miklos@xxxxxxxxxx> > > Signed-off-by: Ian Kent <raven@xxxxxxxxxx> > > --- > > fs/xfs/kmem.h | 4 ++++ > > fs/xfs/xfs_inode.c | 4 ++-- > > fs/xfs/xfs_iops.c | 10 ++++++++-- > > 3 files changed, 14 insertions(+), 4 deletions(-) > > > ... > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c > > index a607d6aca5c4..2977e19da7b7 100644 > > --- a/fs/xfs/xfs_iops.c > > +++ b/fs/xfs/xfs_iops.c > > @@ -524,11 +524,17 @@ xfs_vn_get_link_inline( > > > > /* > > * The VFS crashes on a NULL pointer, so return - > > EFSCORRUPTED if > > - * if_data is junk. > > + * if_data is junk. Also, if the path walk is in rcu-walk > > mode > > + * and the inode link path has gone away due inode re-use > > we have > > + * no choice but to tell the VFS to redo the lookup. > > */ > > - link = ip->i_df.if_u1.if_data; > > + link = rcu_dereference(ip->i_df.if_u1.if_data); > > + if (!dentry && !link) > > + return ERR_PTR(-ECHILD); > > + > > One thing that concerns me slightly about this approach is that inode > reuse does not necessarily guarantee that if_data is NULL. It seems > technically just as possible (even if exceedingly unlikely) for link > to > point at newly allocated memory since the previous sequence count > validation check. The inode could be reused as another inline symlink > for example, though it's not clear to me if that is really a problem > for > the vfs (assuming a restart would just land on the new link > anyways?). > But the inode could also be reallocated as something like a shortform > directory, which means passing directory header data or whatever that > it > stores in if_data back to pick_link(), which is then further > processed > as a string. This is the sort of feedback I was hoping for. This sounds related to the life-cycle of xfs inodes and re-use. Hopefully someone here on the list can enlighten me on this. The thing that comes to mind is that the inode re-use would need to occur between the VFS check that validates the inode is still ok and the use of link string. I think that can still go away even with the above check. Hopefully someone can clarify what happens here. > > With that, I wonder why we wouldn't just return -ECHILD here like we > do > for the non-inline case to address the immediate problem, and then > perhaps separately consider if we can rework bits of the > reuse/reclaim > code to allow rcu lookup of inline symlinks under certain conditions. Always switching to ref-walk mode would certainly resolve the problem too, yes, perhaps we have no choice ... Ian > > FWIW, I'm also a little curious why we don't set i_link for inline > symlinks. I don't think that addresses this validation problem, but > perhaps might allow rcu lookups in the inline symlink common case > where > things don't change during the lookup (and maybe even eliminate the > need > for this custom inline callback)..? > > Brian > > > if (XFS_IS_CORRUPT(ip->i_mount, !link)) > > return ERR_PTR(-EFSCORRUPTED); > > + > > return link; > > } > > > > > > >