On Sun, Dec 13, 2020 at 2:50 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > > > > > Only O_CREAT | O_TRUNC should matter, since those are the ones that > > > cause writes as part of the *open*. > > And __O_TMPFILE, which is the same as O_CREAT. This made me go look at the code, but we seem to be ok here - __O_TMPFILE should never get to the do_open() logic at all, because it gets caught before that and does off to do_tmpfile() and then vfs_tmpfile() instead. And then it's up to the filesystem to do the inode locking if it needs to - it has a separate i_io->tempfile function for that. >From a LOOKUP_NONBLOCK standpoint, I think we should just disallow O_TMPFILE the same way Jens disallowed O_TRUNCATE. Otherwise we'd have to teach filesystems about it. Which might be an option long-term of course, but I don't think it makes sense for any initial patch-set: the real "we can do this with no locks and quickly" case is just opening an existing file entirely using the dcache. Creating new files isn't exactly _uncommon_, of course, but it's probably still two orders of magnitude less common than just opening an existing file. Wild handwaving. So I suspect O_CREAT simply isn't really interesting either. Sure, the "it already exists" case could potentially be done without any locking or anything like that, but again, I don't think that case is worth optimizing for: O_CREAT probably just isn't common enough. [ I don't have tons of hard data to back that handwaving argument based on my gut feel up, but I did do a trace of a kernel build just because that's my "default load" and out of 250 thousand openat() calls, only 560 had O_CREAT and/or O_TRUNC. But while that's _my_ default load, it's obviously not necessarily very representative of anything else. The "almost three orders of magnitude more regular opens" doesn't _surprise_ me though ] So I really think the right model is not to worry about trylock for the inode lock at all, but to simply just always fail with EAGAIN in that case - and not do LOOKUP_NONBLOCK for O_CREAT at all. The normal fast-path case in open_last_lookups() is the "goto finish_lookup" case in this sequence: if (!(open_flag & O_CREAT)) { if (nd->last.name[nd->last.len]) nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY; /* we _can_ be in RCU mode here */ dentry = lookup_fast(nd, &inode, &seq); if (IS_ERR(dentry)) return ERR_CAST(dentry); if (likely(dentry)) goto finish_lookup; and we never get to the inode locking sequence at all. So just adding a if (nd->flags & LOOKUP_NONBLOCK) return -EAGAIN; there is probably the right thing - rather than worry about trylock on the inode lock. Because if you've missed in the dcache, you've by definition lost the fast-path. That said, numbers talk, BS walks, and maybe somebody has better loads with better numbers. I assume Jens has some internal FB io_uring loads and could do stats on what kinds of pathname opens matter there... Linus