2022-01-31 10:57 GMT+09:00, Al Viro <viro@xxxxxxxxxxxxxxxxxx>: > Folks, ->d_name and ->d_parent are *NOT* stable unless the > appropriate locks are held. In particular, locking a directory that > might not be our parent is obviously not going to prevent anything. > Even if it had been our parent at some earlier point. Hi Al, First, Thanks for pointing that out! > > ->d_lock would suffice, but it can't be held over blocking > operation and it can't be held over dcache lookups anyway (instant > deadlocks). IOW, the following is racy: > > int ksmbd_vfs_lock_parent(struct user_namespace *user_ns, struct dentry > *parent, > struct dentry *child) > { > struct dentry *dentry; > int ret = 0; > > inode_lock_nested(d_inode(parent), I_MUTEX_PARENT); > dentry = lookup_one(user_ns, child->d_name.name, parent, > child->d_name.len); > if (IS_ERR(dentry)) { > ret = PTR_ERR(dentry); > goto out_err; > } > > if (dentry != child) { > ret = -ESTALE; > dput(dentry); > goto out_err; > } > > dput(dentry); > return 0; > out_err: > inode_unlock(d_inode(parent)); > return ret; > } > > > Some of that might be fixable - verifying that ->d_parent points > to parent immediately after inode_lock would stabilize ->d_name in case > of match. However, a quick look through the callers shows e.g. this: > write_lock(&ci->m_lock); > if (ci->m_flags & (S_DEL_ON_CLS | S_DEL_PENDING)) { > dentry = filp->f_path.dentry; > dir = dentry->d_parent; > ci->m_flags &= ~(S_DEL_ON_CLS | S_DEL_PENDING); > write_unlock(&ci->m_lock); > ksmbd_vfs_unlink(file_mnt_user_ns(filp), dir, > dentry); > write_lock(&ci->m_lock); > } > write_unlock(&ci->m_lock); > > What's to keep dir from getting freed right under us, just as > ksmbd_vfs_lock_parent() (from ksmbd_vfs_unlink()) tries to grab ->i_rwsem > on its inode? Right. We need to get parent using dget_parent(). > > Have the file moved to other directory and apply memory pressure. > What's to prevent dir from being evicted, its memory recycled, etc.? Let me check it. > > For another fun example, consider e.g. smb2_rename(): > if (file_present && > strncmp(old_name, path.dentry->d_name.name, > strlen(old_name))) { > rc = -EEXIST; > ksmbd_debug(SMB, > "cannot rename already existing > file\n"); > goto out; > } > > Suppose path.dentry has a name longer than 32 bytes (i.e. too large to > fit into ->d_iname and thus allocated separately). At this point you > are not holding any locks (otherwise ksmbd_vfs_fp_rename() immediately > downstream would deadlock). So what's to prevent rename(2) on host > ending up with path.dentry getting renamed and old name getting freed? > > More of the same: ksmbd_vfs_fp_rename(). In this one > dget_parent() will at least avoid parent getting freed. It won't do > a damn thing to stabilize src_dent->d_name after lock_rename(), > though, since we are not guaranteed that the thing we locked is > still the parent... Okay, I will check it. > > Why is so much tied to "open, then figure out where it is" model? > Is it a legacy of userland implementation, or a network fs protocol that > manages to outsuck NFS, or...? It need to use absolute based path given from request. Thanks! >