On Fri, Sep 26, 2014 at 05:44:42PM +0100, Al Viro wrote: > FWIW, the longer I'm looking at that, the more it seems that __d_move() and > __d_materialise_dentry() ought to be merged. The thing is, both callers of > the latter are followed by the same sequence: > write_sequnlock(&rename_lock); > _d_rehash(anon); > spin_unlock(&anon->d_lock); > (in one case - verbatim, in another - with goto thrown in before _d_rehash()). > Now, there's no good reason for doing that write_sequnlock() first; both > _d_rehash() and spin_unlock() are fast enough to make contention on > rename_lock a non-issue. And if we pull those before write_sequnlock() and > into __d_materialise_dentry(dentry, anon), we get rid not only of code > duplication, but of the "it returns with anon->d_lock held" caution as well, > *and* __d_materialise_dentry() becomes very similar to __d_move(). > > Note that __d_move(x, y) depends on !IS_ROOT(y), while > __d_materialise_dentry(x, y) assumes (and is obviously guaranteed by > callers) IS_ROOT(y). IOW, even leaving aside the "is it an exchange" argument > fed to __d_move(), we could distinguish between these guys by IS_ROOT(y) > alone. And there's a fuckload of duplication between them even now, let alone > after pulling the rehash in. It's actually even better. __d_materialise_dentry() has its arguments in the wrong order. Flip them around and it's very nearly an exact copy of what we do in __d_move() when the first argument is IS_ROOT(). Actually, that case in __d_move() had been added back in 2002 exactly for the needs of d_splice_alias(). And it's been unreachable since this Bruce has switched d_splice_alias() to use of __d_materialise_dentry(). In fact, both d_splice_alias() and d_materialise_unique() should've been switched to __d_move() instead. I've done that massage and removal of __d_materialise_dentry(). The things look a lot saner now that all dentry moving is done in one place, IMO. Not to mention that it trims ~50 lines off fs/dcache.c, driving it down to the 3rd place in fs/*.c ;-) Anyway, I've put the branch with fixes + that stuff in vfs.git#for-linus; I have *NOT* put the "keep names on overwriting rename" horror in there, but I believe that we will need something of that sort. Linus, it's a real userland regression. Yes, it affects only shitty scripts, and they had never been reliable for long names anyway, but we have broken real-world userland code by that. What happened there is that old behaviour of removal-by-rename wrt d_path() used to be similar to that on unlink. cd /tmp; touch foo; touch bar exec 42<bar mv foo bar ls -l /proc/self/fd/42 would give you lr-x------ 1 al al 64 Sep 26 23:44 /proc/self/fd/42 -> /tmp/bar (deleted) Now it gives "/tmp/foo (deleted)". The trouble being, the same goes for /proc/*/exe. If you upgraded a package and that had replaced a running binary, you used to be able to find process still running that binary. After the upgrade. And yes, it was an unreliable kludge and the value of that was in the best case dubious, but there really are people who used to do it. And it broke after we'd added RENAME_EXCHANGE support. The old behaviour of switch_names() in short names case used to be "copy the name/length of target to source dentry, swap hash keys and piss on hash key of target not matching target's name anymore - the sucker is unhashed anyway". For RENAME_EXCHANGE we certainly needed to swap the names in all cases - precisely because both suckers are hashed in the end. And since we had done that in exchange case, it was simpler to do it in all cases. Especially since when either name had been a long one we'd always swapped them, so anything relying on old behaviour had been unreliable anyway. Except that it turns out that old behaviour (keep the last component of victim on normal rename) was relied upon... That ucking fugly patch reverts the non-swapping renames to the old behaviour. I don't like it any more than you do - it *is* ugly as hell and I still can't swear that we don't have a bogus codepath leading to d_rehash() of rename() victim (which would break things very badly with the old behaviour). Still, it is a real-world userland regression. Up to you, but I'm afraid that it's on the "we get to keep supporting that" side of things. Anyway, what I've got in vfs.git#for-linus right now seems to survive the obvious tests so far, still running through full xfstests; please, take a look at that pile. The last one should go with you as author - it was just faster to do the change manually than deal with git-am failure when applying your patch. I'll take the headers from your mail and update that commit before sending the pull request. Shortlog: Al Viro (8): ufs: deal with nfsd/iget races pull rehashing and unlocking the target dentry into __d_materialise_dentry() don't open-code d_rehash() in d_materialise_unique() __d_move(): fold manipulations with ->d_child/->d_subdirs __d_materialise_dentry(): flip the order of arguments kill __d_materialise_dentry() fold unlocking the children into dentry_unlock_parents_for_move() fold swapping ->d_name.hash into switch_names() Miklos Szeredi (2): shmem: fix nlink for rename overwrite directory fuse: honour max_read and max_write in direct_io mode Diffstat: fs/dcache.c | 85 +++++++++++---------------------------------------- fs/direct-io.c | 2 +- fs/fuse/file.c | 1 + fs/ufs/ialloc.c | 6 +++- fs/ufs/namei.c | 4 +++ include/linux/uio.h | 2 +- mm/iov_iter.c | 14 ++++++--- mm/shmem.c | 4 ++- 8 files changed, 42 insertions(+), 76 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html