RE: [fuse-devel] Cross-host entry caching and file open/create

On Thu, Oct 1, 2020 Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> On Fri, Aug 28, 2020 at 11:01 PM Ken Schalk <kschalk@xxxxxxxxxx> wrote:
> > > Thanks very much for your help.  The patch you provided does solve 
> > > the problem in the O_CREAT|O_EXCL case (by making a lookup call to 
> > > re-validate the entry of the since deleted file), but not in the 
> > > O_CREAT case.  (In that case the kernel still winds up making a FUSE 
> > > open request rather than a FUSE create request.)  I'd like to 
> > > suggest the slightly different attached patch instead, which 
> > > triggers re-validation in both cases.

> Which is a problem, because that makes O_CREAT on existing files (a
> fairly common case) add a new synchronous request, possibly
> resulting in a performance regression.

> I don't see an easy way this can be fixed, and I'm not sure this
> needs to be fixed.

> Are you seeing a real issue with just O_CREAT?

Yes, we definitely do see issues with just O_CREAT.  The specific
sequence that involves O_CREAT without O_EXCL is:

1. A file exists and is accessed through our FUSE distributed
   filesystem on host X.  The kernel on host X caches the directory
   entry for the file.

2. The file is unlinked through our FUSE distributed filesystem on
   host Y.

3. An open(2) call with O_CREAT for the file occurs on host X.
   Because the kernel has a cached dentry for the now deleted file, it
   makes a FUSE open request to our filesystem (rather than a FUSE
   create request).

4. Our filesystem's open handler finds that the file does not exist
   (because it was unlinked in step 2) and replies to the open request
   with ENOENT.  (The FUSE open handler cannot tell that O_CREAT was
   specified in the flags of the syscall as that bit is not passed
   through in the flags in the FUSE open request, so it can't
   automatically handle this case by creating the file.)

5. The kernel passes the ENOENT error code on as the result of the
   open(2) system call, so the file is not created and not opened.

To me this seems clearly incorrect in terms of observable behavior.
The file does not exist at the point of the open(2) syscall with
O_CREAT in step 3 (although the kernel on host X has not become aware
of its deletion).  The file should be created and opened.  An open(2)
syscall with O_CREAT shouldn't fail with ENOENT because the file does
not exist (which is what happens in this situation currently).
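
For reference, a minimal sketch of what step 3 looks like from an
application on host X.  The helper name open_creat_errno and any path
passed to it are hypothetical, chosen only for illustration.  On a
local filesystem it returns 0; in the stale-dentry situation described
above, the same call is what comes back with ENOENT.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: perform open(2) with O_CREAT but without O_EXCL.
 * Returns 0 if the file was opened (created if needed), otherwise the
 * errno value reported by open(2).
 */
int open_creat_errno(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return errno;   /* ENOENT here is the behavior reported above */

    close(fd);
    return 0;
}
```

The expectation is that this returns 0 whether or not the file existed
beforehand, since O_CREAT without O_EXCL should open an existing file
and create a missing one.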

I don't see how to avoid this with acceptable performance unless
there is some kernel-level change.  (We could make the unlink on host Y
synchronously perform an entry invalidation across all other hosts
where our FUSE daemon is running, but that would be a huge performance
problem and a significant addition of complexity in our FUSE daemon.)

I believe that there are at least two other ways to resolve this
without adding the synchronous lookup request on every open syscall
with O_CREAT:

- Preserve the O_CREAT bit in the flags passed through in the FUSE
  open request.  (I believe the place where this bit is masked out is
  in fuse_send_open in fs/fuse/file.c.)  That would allow the FUSE
  open handler to know that file creation was requested and to perform
  the creation in this case.  (I'll mention that this would be similar
  to a behavior we've implemented in our FUSE create handler to open
  an existing file when O_EXCL is not in the flags, which allows
  handling cases where a file was recently created on a remote host.)

- If a FUSE open call fails with ENOENT when O_CREAT is used, have the
  kernel drop the cached dentry and then make a FUSE create request.
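
To make the two options concrete, here is a rough in-memory model of
both.  It is only a sketch under stated assumptions, not kernel or
libfuse code: struct remote_file, daemon_open, daemon_create and
kernel_open_cached are all hypothetical names, and the dentry drop
itself is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the daemon's authoritative state of one name. */
struct remote_file {
    bool exists;
};

/*
 * Models the daemon's open handler.  Under the first option the O_CREAT
 * bit survives in `flags`, so the handler can create the file instead of
 * replying with ENOENT.
 */
int daemon_open(struct remote_file *f, int flags)
{
    assert(f != NULL);

    if (f->exists)
        return 0;
    if (flags & O_CREAT) {      /* only visible if the bit is not masked */
        f->exists = true;
        return 0;
    }
    return -ENOENT;
}

/*
 * Models the daemon's create handler, which opens an existing file when
 * O_EXCL is absent (the behavior mentioned in the first option).
 */
int daemon_create(struct remote_file *f, int flags)
{
    assert(f != NULL);

    if (f->exists && (flags & O_EXCL))
        return -EEXIST;
    f->exists = true;
    return 0;
}

/*
 * Models the second option from the kernel's side: with a cached dentry,
 * send open with O_CREAT masked out (as happens today); if that fails
 * with ENOENT and the caller asked for O_CREAT, treat the dentry as
 * stale and fall back to a create request.
 */
int kernel_open_cached(struct remote_file *f, int flags)
{
    int err = daemon_open(f, flags & ~O_CREAT);

    if (err == -ENOENT && (flags & O_CREAT))
        err = daemon_create(f, flags);  /* after dropping the dentry */
    return err;
}
```

In both variants the extra work happens only on the ENOENT path, so an
open with O_CREAT on a file that still exists costs the same single
request as it does today.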

Thanks.

--Ken Schalk



