Hello Al Viro, Any comments on this patch? This is the same approach as suggested by you [1] [1]: https://patchwork.kernel.org/patch/10909881/ -ritesh On 9/6/19 7:26 PM, Ritesh Harjani wrote:
d_is_negative can race with d_instantiate_new() -> __d_set_inode_and_type(). For e.g. in use cases where Thread-1 is creating symlink (doing d_instantiate_new()) & Thread-2 is doing cat of that symlink while doing lookup_fast (via REF-walk- one such case is, when ->permission returns -ECHILD). During this race if __d_set_and_inode_type() does out-of-order execution and set the dentry->d_flags before setting dentry->inode, then it can result into following kernel panic. This change fixes the issue by directly checking for inode. E.g. kernel panic, since inode was NULL. trailing_symlink() -> may_follow_link() -> inode->i_uid. Issue signature:- [NIP : trailing_symlink+80] [LR : trailing_symlink+1092] #4 [c00000198069bb70] trailing_symlink at c0000000004bae60 (unreliable) #5 [c00000198069bc00] path_openat at c0000000004bdd14 #6 [c00000198069bc90] do_filp_open at c0000000004c0274 #7 [c00000198069bdb0] do_sys_open at c00000000049b248 #8 [c00000198069be30] system_call at c00000000000b388 Sequence of events:- Thread-2(Comm: ln) Thread-1(Comm: cat) dentry = __d_lookup() //nonRCU __d_set_and_inode_type() (Out-of-order execution) flags = READ_ONCE(dentry->d_flags); flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU); flags |= type_flags; WRITE_ONCE(dentry->d_flags, flags); if (unlikely(d_is_negative()) // fails {} // since d_flags is already updated in // Thread-2 in parallel but inode // not yet set. // d_is_negative returns false *inode = d_backing_inode(path->dentry); // means inode is still NULL dentry->d_inode = inode; trailing_symlink() may_follow_link() inode = nd->link_inode; // nd->link_inode = NULL //Then it crashes while //doing inode->i_uid Reported-by: Guang Yuan Wu <wugyuan@xxxxxxxxxx> Tested-by: Guang Yuan Wu <wugyuan@xxxxxxxxxx> Signed-off-by: Ritesh Harjani <riteshh@xxxxxxxxxxxxx> --- fs/namei.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/namei.c b/fs/namei.c index 209c51a5226c..b5867fe988e0 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1623,7 +1623,21 @@ static int lookup_fast(struct nameidata *nd, dput(dentry); return status; } - if (unlikely(d_is_negative(dentry))) { + + /* + * Caution: d_is_negative() can race with + * __d_set_inode_and_type(). + * For e.g. in use cases where Thread-1 is creating + * symlink (doing d_instantiate_new()) & Thread-2 is doing + * cat of that symlink and falling here (via Ref-walk) while + * doing lookup_fast (one such case is when ->permission + * returns -ECHILD). + * Now if __d_set_inode_and_type() does out-of-order execution + * i.e. it first sets the dentry->d_flags & then dentry->inode + * then it can result into inode being NULL, causing panic later. + * Hence directly check if inode is NULL here. + */ + if (unlikely(d_really_is_negative(dentry))) { dput(dentry); return -ENOENT; }