On Wed, Feb 02, 2022 at 08:29:42PM -0800, Jeremy Allison wrote: > On Thu, Feb 03, 2022 at 04:02:45AM +0000, Al Viro wrote: > > On Thu, Feb 03, 2022 at 08:16:21AM +0900, Namjae Jeon wrote: > > > > > > Why is so much tied to "open, then figure out where it is" model? > > > > Is it a legacy of userland implementation, or a network fs protocol that > > > > manages to outsuck NFS, or...? > > > It need to use absolute based path given from request. > > > > Er... yes? Normal syscalls also have to be able to deal with pathnames; > > the sane way for e.g. unlink() is to traverse everything except the last > > component, then do inode_lock() on the directory you've arrived at, do > > lookup for the final component and do the operation. > > > > What we do not attempt is "walk the entire path, then try to lock the > > parent of whatever we'd arrived at, then do operation". Otherwise > > we'd have the same kind of headache in all directory-manipulating > > syscalls... > > > > What's the problem with doing the same thing here? Lack of convenient > > exported helpers for some of those? Then let's sort _that_ out... > > If there's something else, I'd like to know what exactly it is. > > Samba recently did a complete VFS rewrite to remove almost > *all* pathname-based calls for exactly this reason (remove > all possibility of symlink races). > > https://www.samba.org/samba/security/CVE-2021-20316.html > > Is this essentially the same bug affecting ksmbd here ? It's not about symlinks; it's about rename racing with pathname resolution and tearing the object you've looked up away from the directory you are preparing to modify. ksmbd does pathwalk + attempt to lock the parent + check if we had been moved away while we'd been grabbing the lock. It *can* be made to work, but it's bloody painful - you need to grab a reference to parent (take a look at the games dget_parent() has to do, and that one has a luxury of not needing to grab a blocking lock), *then* you need to lock it, check that our object is still its child (child->d_parent might change, but its comparison with dentry of locked directory is stable), then check that child is still hashed and either proceed with the operation, or unlock the directory you'd locked, drop the reference to it and repeat the entire dance starting at dget_parent(). It's doable, but really unpleasant. A much simpler approach is to find the parent as we are looking for child, lock it and then look for child in an already locked directory. Which guarantees the stability of name and parent of whatever you find there, for as long as directory remains locked. None of such retry loops are needed (and dget_parent() *is* that internally), no need to bother about unexpected errors, etc.