Re: [PATCH 4/5] fs: honor LOOKUP_NONBLOCK for the last part of file open

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/12/20 11:57 AM, Linus Torvalds wrote:
> On Sat, Dec 12, 2020 at 8:51 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> We handle it for the path resolution itself, but we should also factor
>> it in for open_last_lookups() and tmpfile open.
> 
> So I think this one is fundamentally wrong, for two reasons.
> 
> One is that "nonblock" shouldn't necessarily mean "take no locks at
> all". That directory inode lock is very very different from "go down
> to the filesystem to do IO". No other NONBLOCK thing has ever been "no
> locks at all", they have all been about possibly long-term blocking.

Do we ever do long term IO _while_ holding the direcoty inode lock? If
we don't, then we can probably just ignore that side alltogether.

> The other this is that the mnt_want_write() check really smells no
> different at all from any of the "some ->open functions might block".
> Which you don't handle, and which again is entirely different from the
> pathname resolution itself blocking.
> 
> So I don't think either of these cases are about LOOKUP_NONBLOCK, the
> same way they aren't about LOOKUP_RCU.
> 
> The  mnt_want_write() in particular is much more about the kinds of
> things we already check for in do_open().
> 
> In fact, looking at this patch, I think mnt_want_write() itself is
> actually conceptually buggy, although it doesn't really matter: think
> about device nodes etc. Opening a device node for writing doesn't
> write to the filesystem that the device node is on.
> 
> Why does that code care about O_WRONLY | O_RDWR? That has *nothing* to
> do with the open() wanting to write to the filesystem. We don't even
> hold that lock after the open - we'll always drop it even for a
> successful open.
> 
> Only O_CREAT | O_TRUNC should matter, since those are the ones that
> cause writes as part of the *open*.
> 
> Again - I don't think that this is really a problem. I mention it more
> as a "this shows how the code is _conceptually_ wrong", and how it's
> not really about the pathname resolution any more.
> 
> In fact, I think that how we pass on that "got_write" to lookup_open()
> is just another sign of how we do that mnt_want_write() in the wrong
> place and at trhe wrong level. We shouldn't have been doing that
> mnt_want_write() for writable oipens (that's a different thing), and
> looking at lookup_open(), I think we should have waited with doing it
> at all until we've checked if we even need it (ie O_CREAT on a file
> that already exists does *not* need the mnt_want_write() at all, and
> we'll see that when we get to that
> 
>         /* Negative dentry, just create the file */
>         if (!dentry->d_inode && (open_flag & O_CREAT)) {
> 
> thing.
> 
> So I think this patch actually shows that we do things wrong in this
> area, and I think the kinds of things it does are questionable as a
> result. In particular, I'm not convinced that the directory semaphore
> thing should be tied to LOOKUP_NONBLOCK, and I don't think the
> mnt_want_write() logic is right at all.
> 
> The first one would be fixed by a separate flag.
> 
> The second one I think is more about "we are doing mnt_want_write() at
> the wrong point, and if we move mnt_want_write() to the right place
> we'd just fail it explicitly for the LOOKUP_NONBLOCK case" - the same
> way a truncate should just be an explicit fail, not a "trylock"
> failure.

I'm going to let Al comment on the mnt_want_write() logic. It did feel
iffy to me, and the error handling (and passing of it) around makes it
hard to reason about. Would likely make this change more straight
forward if we sort out that first.

I do agree that we should keep the two things separate - and potentially
just have the RESOLVE_NONBLOCK be RESOLVE_CACHE (or something like that)
and not deal with the locking / want write side of things at all. For
io_uring, we really do want to ensure that we don't stall the submission
pipeline by potentially being stuck on the directory lock or waiting for
a frozen file system.

And I don't think we can get around having two flags at that point -
probably by renaming the initial LOOKUP_NONBLOCK to LOOKUP_CACHE, and by
having a LOOKUP_NONBLOCK which has a slightly wider scope.

> I also think we need to deal with the O_NONBLOCK kind of situation at
> the actual ->open() time (ie the whole "oh, NFS wants to revalidate
> caches due to open/close consistency, named pipes want to wait for
> pairing, device nodes want to wait for device". Again, I think that's
> separate from LOOKUP_NONBLOCK, though.

Agree, that's still not done in this patch. I did think about it, and
the only potential idea I had around that was to wrap the actual open in
setting/clearing O_NDELAY/O_NONBLOCK around the open. Which _should_
work as long as it happens before fd_install() is done, but doesn't feel
that architecturally clean.

-- 
Jens Axboe




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux