Thanks a lot for all your reviews, Al! On 10/28/23 05:03, Al Viro wrote: > On Mon, Oct 23, 2023 at 08:30:29PM +0200, Bernd Schubert wrote: >> +{ >> + int err; >> + struct inode *inode; >> + FUSE_ARGS(args); >> + struct fuse_mount *fm = get_fuse_mount(dir); >> + struct fuse_conn *fc = fm->fc; >> + struct fuse_forget_link *forget; >> + struct fuse_create_in inarg; >> + struct fuse_open_out outopen; >> + struct fuse_entry_out outentry; >> + struct fuse_inode *fi; >> + struct fuse_file *ff; >> + struct dentry *switched_entry = NULL, *alias = NULL; >> + DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); >> + >> + /* Expect a negative dentry */ >> + if (unlikely(d_inode(entry))) >> + goto fallback; >> + >> + /* Userspace expects S_IFREG in create mode */ >> + if ((flags & O_CREAT) && (mode & S_IFMT) != S_IFREG) >> + goto fallback; > > How could it get there with such mode? We could check that > in fs/namei.c:atomic_open() (with > if (WARN_ON_ONCE((open_flags & O_CREAT) && !S_ISREG(mode))) > error = -EINVAL; // for the lack of EWTFAREYOUSMOKING > else > error = dir->i_op->atomic_open(....) > or something similar), but that doesn't belong in the method instances. > And it really should never happen - that thing comes from op->mode and > we have build_open_flags() doing this: > if (WILL_CREATE(flags)) { > if (how->mode & ~S_IALLUGO) > return -EINVAL; > op->mode = how->mode | S_IFREG; > } else { > if (how->mode != 0) > return -EINVAL; > op->mode = 0; > } > so... Are other instances doing the same kind of paranoia? That BUG_ON() > in current fuse_atomic_open() is bogus (and seriously misplaced)... Ok, sorry, we took over the check blindly. I added another patch in the v11 branch to remove the BUG_ON in current fuse_atomic_open. > >> + forget = fuse_alloc_forget(); >> + err = -ENOMEM; >> + if (!forget) >> + goto out_err; >> + >> + err = -ENOMEM; >> + ff = fuse_file_alloc(fm); >> + if (!ff) >> + goto out_put_forget_req; >> + >> + if (!fc->dont_mask) >> + mode &= ~current_umask(); >> + >> + flags &= ~O_NOCTTY; >> + memset(&inarg, 0, sizeof(inarg)); >> + memset(&outentry, 0, sizeof(outentry)); >> + inarg.flags = flags; >> + inarg.mode = mode; >> + inarg.umask = current_umask(); >> + >> + if (fc->handle_killpriv_v2 && (flags & O_TRUNC) && >> + !(flags & O_EXCL) && !capable(CAP_FSETID)) { >> + inarg.open_flags |= FUSE_OPEN_KILL_SUIDGID; >> + } >> + >> + args.opcode = FUSE_OPEN_ATOMIC; >> + args.nodeid = get_node_id(dir); >> + args.in_numargs = 2; >> + args.in_args[0].size = sizeof(inarg); >> + args.in_args[0].value = &inarg; >> + args.in_args[1].size = entry->d_name.len + 1; >> + args.in_args[1].value = entry->d_name.name; >> + args.out_numargs = 2; >> + args.out_args[0].size = sizeof(outentry); >> + args.out_args[0].value = &outentry; >> + args.out_args[1].size = sizeof(outopen); >> + args.out_args[1].value = &outopen; >> + >> + if (flags & O_CREAT) { >> + err = get_create_ext(&args, dir, entry, mode); >> + if (err) >> + goto out_free_ff; >> + } >> + >> + err = fuse_simple_request(fm, &args); >> + free_ext_value(&args); >> + if (err == -ENOSYS || err == -ELOOP) { >> + if (unlikely(err == -ENOSYS)) >> + fc->no_open_atomic = 1; >> + goto free_and_fallback; >> + } >> + >> + if (!err && !outentry.nodeid) >> + err = -ENOENT; >> + >> + if (err) >> + goto out_free_ff; >> + >> + err = -EIO; >> + if (invalid_nodeid(outentry.nodeid) || fuse_invalid_attr(&outentry.attr)) >> + goto out_free_ff; >> + >> + ff->fh = outopen.fh; >> + ff->nodeid = outentry.nodeid; >> + ff->open_flags = outopen.open_flags; >> + inode = fuse_iget(dir->i_sb, outentry.nodeid, outentry.generation, >> + &outentry.attr, ATTR_TIMEOUT(&outentry), 0); >> + if (!inode) { >> + flags &= ~(O_CREAT | O_EXCL | O_TRUNC); >> + fuse_sync_release(NULL, ff, flags); >> + fuse_queue_forget(fm->fc, forget, outentry.nodeid, 1); >> + err = -ENOMEM; >> + goto out_err; >> + } >> + >> + /* prevent racing/parallel lookup on a negative hashed */ >> + if (!(flags & O_CREAT) && !d_in_lookup(entry)) { > > ... in which case it has just passed ->d_revalidate()... With the follow up patches that revalidate is going to be moved to this function. Is there anything here that needs to be improved? I had added that check after looking through the other atomic_open methods and then noticed your commit c94c09535c4d: nfs_atomic_open(): prevent parallel nfs_lookup() on a negative hashed > >> + d_drop(entry); >> + switched_entry = d_alloc_parallel(entry->d_parent, >> + &entry->d_name, &wq); >> + if (IS_ERR(switched_entry)) { >> + err = PTR_ERR(switched_entry); >> + switched_entry = NULL; >> + goto out_free_ff; > > leaked inode? Yikes, silly me and with that also leaked fuse userspace inode. > >> + } >> + >> + if (unlikely(!d_in_lookup(switched_entry))) { >> + /* fall back */ >> + dput(switched_entry); >> + switched_entry = NULL; >> + goto free_and_fallback; > > ditto, and I don't really understand what the hell is going on with > dentry references here. What is the intended behaviour in that case? The idea was to give up 'switched_entry' and let the existing fuse_create_open do the fallback work. Hmm, yeah, already called d_drop(dentry). And it also already did the userspace part - fallback without fuse-forget is totally wrong. I guess I need severe patch update because of this - the other parts are not difficult, but here it gets complex. > >> + } >> + >> + entry = switched_entry; >> + } >> + >> + if (d_really_is_negative(entry)) { >> + d_drop(entry); >> + alias = d_exact_alias(entry, inode); > > What case is that about? "We have an unhashed positive dentry with that > exact name, parent and inode"? Where would it have come from? Sorry, taken from _nfs4_open_and_get_state and I wasn't sure if it is needed. A bit lame excuse, but NFS is the only other file system I found that handles !O_CREAT. I definitely should have marked it with something like /* XXX is this really needed, from _nfs4_open_and_get_state */ > > Another thing: this does not consume an inode reference, no matter what > gets returned, > >> + if (!alias) { >> + alias = d_splice_alias(inode, entry); > > but that one *does* consume the inode reference; note the igrab() in > nfs4 code where you've nicked that from... > >> + if (IS_ERR(alias)) { >> + /* >> + * Close the file in user space, but do not unlink it, >> + * if it was created - with network file systems other >> + * clients might have already accessed it. >> + */ >> + fi = get_fuse_inode(inode); > > ... so this is asking for UAF. Yeah, and staring at it again, the below fuse_queue_forget is not right either, as that is already handled through the inode reference / ->evict_inode. > >> + fuse_sync_release(fi, ff, flags); >> + fuse_queue_forget(fm->fc, forget, outentry.nodeid, 1); >> + err = PTR_ERR(alias); >> + goto out_err; >> + } >> + } >> + >> + if (alias) >> + entry = alias; >> + } > > ... and here we have no way to tell if inode needs to be dropped. I guess I could solve this with another variable, but maybe there is a more beautiful way. I first need to think about the !d_in_lookup(switched_entry). Thanks again so much for your reviews, Bernd