Re: [PATCH 4/5] fs: honor LOOKUP_NONBLOCK for the last part of file open

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/13/20 5:45 PM, Linus Torvalds wrote:
> On Sun, Dec 13, 2020 at 2:50 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>>>>
>>>> Only O_CREAT | O_TRUNC should matter, since those are the ones that
>>>> cause writes as part of the *open*.
>>
>> And __O_TMPFILE, which is the same as O_CREAT.
> 
> This made me go look at the code, but we seem to be ok here -
> __O_TMPFILE should never get to the do_open() logic at all, because it
> gets caught before that and does off to do_tmpfile() and then
> vfs_tmpfile() instead.
> 
> And then it's up to the filesystem to do the inode locking if it needs
> to - it has a separate i_io->tempfile function for that.
> 
> From a LOOKUP_NONBLOCK standpoint, I think we should just disallow
> O_TMPFILE the same way Jens disallowed O_TRUNCATE.
> 
> Otherwise we'd have to teach filesystems about it.

Good point, let's avoid that for now... More below.

> Which might be an option long-term of course, but I don't think it
> makes sense for any initial patch-set: the real "we can do this with
> no locks and quickly" case is just opening an existing file entirely
> using the dcache.
> 
> Creating new files isn't exactly _uncommon_, of course, but it's
> probably still two orders of magnitude less common than just opening
> an existing file. Wild handwaving.

Obviously depends on the workload, but there are plenty of interesting
workloads that are not create intensive at all and just want a fast way
to open an existing file.

> So I suspect O_CREAT simply isn't really interesting either. Sure, the
> "it already exists" case could potentially be done without any locking
> or anything like that, but again, I don't think that case is worth
> optimizing for: O_CREAT probably just isn't common enough.
> 
> [ I don't have tons of hard data to back that handwaving argument
> based on my gut feel up, but I did do a trace of a kernel build just
> because that's my "default load" and out of 250 thousand openat()
> calls, only 560 had O_CREAT and/or O_TRUNC. But while that's _my_
> default load, it's obviously not necessarily very representative of
> anything else. The "almost three orders of magnitude more regular
> opens" doesn't _surprise_ me though ]
> 
> So I really think the right model is not to worry about trylock for
> the inode lock at all, but to simply just always fail with EAGAIN in
> that case - and not do LOOKUP_NONBLOCK for O_CREAT at all.
> 
> The normal fast-path case in open_last_lookups() is the "goto
> finish_lookup" case in this sequence:
> 
>         if (!(open_flag & O_CREAT)) {
>                 if (nd->last.name[nd->last.len])
>                         nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
>                 /* we _can_ be in RCU mode here */
>                 dentry = lookup_fast(nd, &inode, &seq);
>                 if (IS_ERR(dentry))
>                         return ERR_CAST(dentry);
>                 if (likely(dentry))
>                         goto finish_lookup;
> 
> and we never get to the inode locking sequence at all.
> 
> So just adding a
> 
>         if (nd->flags & LOOKUP_NONBLOCK)
>                 return -EAGAIN;
> 
> there is probably the right thing - rather than worry about trylock on
> the inode lock.

I really like that, as it also means we can safely drop the
mnt_want_write() path and just safe that for a separate cleanup that's
not related to this at all. With that, I'd suggest folding the two
patches into one as we no longer need followup work. It also means we
can drop the concept of having two flags for this, LOOKUP_NONBLOCK
covers the entire case of the existing fast path.

Might still make sense to name it LOOKUP_CACHE instead, but I'll leave
that for you/Al to decide...

> Because if you've missed in the dcache, you've by definition lost the fast-path.
> 
> That said, numbers talk, BS walks, and maybe somebody has better loads
> with better numbers. I assume Jens has some internal FB io_uring loads
> and could do stats on what kinds of pathname opens matter there...

Definitely, was already planning on checking that. And just as
importantly, once we have this, it's pretty trivial to do the same thing
on the stat side as well, which makes me excited for that change too.
io_uring currently punts that one too, and that's arguably even more
interesting than open if we can avoid that for the fast case.

-- 
Jens Axboe




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux