Re: [PATCH 4/5] fs: honor LOOKUP_NONBLOCK for the last part of file open

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Dec 13, 2020 at 2:50 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > >
> > > Only O_CREAT | O_TRUNC should matter, since those are the ones that
> > > cause writes as part of the *open*.
>
> And __O_TMPFILE, which is the same as O_CREAT.

This made me go look at the code, but we seem to be ok here -
__O_TMPFILE should never get to the do_open() logic at all, because it
gets caught before that and does off to do_tmpfile() and then
vfs_tmpfile() instead.

And then it's up to the filesystem to do the inode locking if it needs
to - it has a separate i_io->tempfile function for that.

>From a LOOKUP_NONBLOCK standpoint, I think we should just disallow
O_TMPFILE the same way Jens disallowed O_TRUNCATE.

Otherwise we'd have to teach filesystems about it.

Which might be an option long-term of course, but I don't think it
makes sense for any initial patch-set: the real "we can do this with
no locks and quickly" case is just opening an existing file entirely
using the dcache.

Creating new files isn't exactly _uncommon_, of course, but it's
probably still two orders of magnitude less common than just opening
an existing file. Wild handwaving.

So I suspect O_CREAT simply isn't really interesting either. Sure, the
"it already exists" case could potentially be done without any locking
or anything like that, but again, I don't think that case is worth
optimizing for: O_CREAT probably just isn't common enough.

[ I don't have tons of hard data to back that handwaving argument
based on my gut feel up, but I did do a trace of a kernel build just
because that's my "default load" and out of 250 thousand openat()
calls, only 560 had O_CREAT and/or O_TRUNC. But while that's _my_
default load, it's obviously not necessarily very representative of
anything else. The "almost three orders of magnitude more regular
opens" doesn't _surprise_ me though ]

So I really think the right model is not to worry about trylock for
the inode lock at all, but to simply just always fail with EAGAIN in
that case - and not do LOOKUP_NONBLOCK for O_CREAT at all.

The normal fast-path case in open_last_lookups() is the "goto
finish_lookup" case in this sequence:

        if (!(open_flag & O_CREAT)) {
                if (nd->last.name[nd->last.len])
                        nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
                /* we _can_ be in RCU mode here */
                dentry = lookup_fast(nd, &inode, &seq);
                if (IS_ERR(dentry))
                        return ERR_CAST(dentry);
                if (likely(dentry))
                        goto finish_lookup;

and we never get to the inode locking sequence at all.

So just adding a

        if (nd->flags & LOOKUP_NONBLOCK)
                return -EAGAIN;

there is probably the right thing - rather than worry about trylock on
the inode lock.

Because if you've missed in the dcache, you've by definition lost the fast-path.

That said, numbers talk, BS walks, and maybe somebody has better loads
with better numbers. I assume Jens has some internal FB io_uring loads
and could do stats on what kinds of pathname opens matter there...

          Linus



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux