Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes: > On Tue, Feb 27, 2018 at 06:16:28AM +0100, John Ogness wrote: > >> If someone else has grabbed a reference, it shouldn't be added to the >> lru list. Only decremented. >> >> if (entry->d_lockref.count == 1) > > Nah, better handle that in retain_dentry() itself. See updated > #work.dcache. > > Note: another potentially fun thing in that branch is that I've > finally decided to bite the bullet and make __d_move() preserve > ->d_parent of target. > > Mainline: > al@sonny:/tmp$ touch d > al@sonny:/tmp$ sleep 100 >/tmp/a/b/c & > [1] 16487 > al@sonny:/tmp$ ls -l /proc/16487/fd > total 0 > lrwx------ 1 al al 64 Mar 12 11:33 0 -> /dev/pts/13 > l-wx------ 1 al al 64 Mar 12 11:33 1 -> /tmp/a/b/c > lrwx------ 1 al al 64 Mar 12 11:33 2 -> /dev/pts/13 > al@sonny:/tmp$ mv /tmp/d /tmp/a/b/c > al@sonny:/tmp$ ls -l /proc/16487/fd > total 0 > lrwx------ 1 al al 64 Mar 12 11:33 0 -> /dev/pts/13 > l-wx------ 1 al al 64 Mar 12 11:33 1 -> /tmp/c (deleted) > lrwx------ 1 al al 64 Mar 12 11:33 2 -> /dev/pts/13 > > With that branch: > root@kvm1:/tmp# touch d > root@kvm1:/tmp# sleep 100 >/tmp/a/b/c & > [1] 2263 > root@kvm1:/tmp# ls -l /proc/2263/fd > total 0 > lrwx------ 1 root root 64 Mar 12 11:33 0 -> /dev/pts/0 > l-wx------ 1 root root 64 Mar 12 11:33 1 -> /tmp/a/b/c > lrwx------ 1 root root 64 Mar 12 11:33 2 -> /dev/pts/0 > root@kvm1:/tmp# mv /tmp/d /tmp/a/b/c > root@kvm1:/tmp# ls -l /proc/2263/fd > total 0 > lrwx------ 1 root root 64 Mar 12 11:33 0 -> /dev/pts/0 > l-wx------ 1 root root 64 Mar 12 11:33 1 -> '/tmp/a/b/c (deleted)' > lrwx------ 1 root root 64 Mar 12 11:33 2 -> /dev/pts/0 > > It doesn't come quite for free; cross-directory d_move() > and d_exchange() callers are responsible for having both > parents pinned (all of them do that, mostly since the usual > sequence is "look parents up, lock_rename(), *then* look > children up, then do renaming"; those that are not part of > rename(2) are also OK) and d_splice_alias() has become potentially > blocking in one case. AFAICS, none of the callers is in > locking environment that would not allow that. Survives > the local beating and doesn't seem to cause any performance > regressions. > > What we get out of that is > a) much saner semantics for d_move() et.al. > b) saner behaviour of d_path() (see above) > c) dentry can be IS_ROOT only if it has been > such all along; that simplifies the hell out of analysis. > > FWIW, there's another trylock loop on dentries - one in > autofs get_next_positive_dentry(). Any plans re dealing > with that one? > > I'd spent the last couple of weeks (when not being too sick > for any work) going through dcache.c and related code; hopefully > this time I will get the documentation into postable shape ;-/ > > There's an unpleasant area around the ->s_root vs. NFS. There's > code that makes assumptions about ->s_root that are simply not true > for NFS. Is path_connected() correct wrt NFS multiple imports from > the same server? Ditto for mnt_already_visible() (that one might > be mitigated at the moment, but probably won't last). Eric, am > I missing something subtle in there? I don't have the entire context in my head. But I don't think we have problems today. NFS before it uses paths from an unconnected root in the rest of the vfs walks those paths backwards and makes the paths connect. I don't remember where all of that code that performs those connections but I do remember the code in fs/fhandle.c shares that code with nfs, to perform the same operation in open_by_handle_at. So I don't think the nfs peculiarities are actually relevant to anything on an ordinary code path. Of the two code paths you are concert about: For path path_connected looking at s_root is a heuristic to avoid calling is_subdir every time we need to do that check. If the heuristic fails we still have is_subdir which should remain accurate. If is_subdir fails the path is genuinely not connected at that moment and failing is the correct thing to do. For mnt_too_revealing the only filesystems under consideration are proc and sysfs. So nfs oddities are of no consequence. mnt_too_revealing probably won't be extended to other filesystems. Certainly nfs is not a candidate for having setting SB_I_USERNS_VISIBLE. Al is that sufficient to address your concerns? Eric