[NFS folks Cc'd] On Tue, Oct 31, 2023 at 12:18:48AM +0000, Al Viro wrote: > On Mon, Oct 30, 2023 at 12:18:28PM -1000, Linus Torvalds wrote: > > On Mon, 30 Oct 2023 at 11:53, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > > > > > After fixing a couple of brainos, it seems to work. > > > > This all makes me unnaturally nervous, probably because it;s overly > > subtle, and I have lost the context for some of the rules. > > A bit of context: I started to look at the possibility of refcount overflows. > Writing the current rules for dentry refcounting and lifetime down was the > obvious first step, and that immediately turned into an awful mess. > > It is overly subtle. Even more so when you throw the shrink lists into > the mix - shrink_lock_dentry() got too smart for its own good, and that > leads to really awful correctness proofs. ... and for another example of subtle shit, consider DCACHE_NORCU. Recall c0eb027e5aef "vfs: don't do RCU lookup of empty pathnames" and note that it relies upon never getting results of alloc_file_pseudo() with directory inode anywhere near descriptor tables. Back then I basically went "fine, nobody would ever use alloc_file_pseudo() for that anyway", but... there's a call in __nfs42_ssc_open() that doesn't have any obvious protection against ending up with directory inode. That does not end up anywhere near descriptor tables, as far as I can tell, fortunately. Unfortunately, it is quite capable of fucking the things up in different ways, even if it's promptly closed. d_instantiate() on directory inode is a really bad thing; a plenty of places expect to have only one alias for those, and would be very unhappy with that kind of crap without any RCU considerations. I'm pretty sure that this NFS code really does not want to use that for directories; the simplest solution would be to refuse alloc_file_pseudo() for directory inodes. NFS folks - do you have a problem with the following patch? ====== Make sure we never feed a directory to alloc_file_pseudo() That would be broken in a lot of ways, from UAF in pathwalk if that thing ever gets into descriptor tables, to royally screwing every place that relies upon the lack of aliases for directory inodes (i.e. quite a bit of VFS). Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx> --- diff --git a/fs/file_table.c b/fs/file_table.c index ee21b3da9d08..5331a696896e 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -326,6 +326,9 @@ struct file *alloc_file_pseudo(struct inode *inode, struct vfsmount *mnt, struct path path; struct file *file; + if (WARN_ON_ONCE(S_ISDIR(inode->i_mode))) + return ERR_PTR(-EISDIR); + path.dentry = d_alloc_pseudo(mnt->mnt_sb, &this); if (!path.dentry) return ERR_PTR(-ENOMEM);