On Thu, Oct 1, 2020 Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> On Fri, Aug 28, 2020 at 11:01 PM Ken Schalk <kschalk@xxxxxxxxxx> wrote:
> >
> > Thanks very much for your help. The patch you provided does solve
> > the problem in the O_CREAT|O_EXCL case (by making a lookup call to
> > re-validate the entry of the since deleted file), but not in the
> > O_CREAT case. (In that case the kernel still winds up making a FUSE
> > open request rather than a FUSE create request.) I'd like to
> > suggest the slightly different attached patch instead, which
> > triggers re-validation in both cases.
>
> Which is a problem, because that makes O_CREAT on existing files (a
> fairly common case) add a new synchronous request, possibly
> resulting in a performance regression.
>
> I don't see an easy way this can be fixed, and I'm not sure this
> needs to be fixed.
>
> Are you seeing a real issue with just O_CREAT?

Yes, we definitely do see issues with just O_CREAT. The specific
sequence that involves O_CREAT without O_EXCL is:

1. A file exists and is accessed through our FUSE distributed
   filesystem on host X. The kernel on host X caches the directory
   entry for the file.

2. The file is unlinked through our FUSE distributed filesystem on
   host Y.

3. An open(2) call with O_CREAT for the file occurs on host X.
   Because the kernel has a cached dentry for the now-deleted file,
   it makes a FUSE open request to our filesystem (rather than a FUSE
   create request).

4. Our filesystem's open handler finds that the file does not exist
   (because it was unlinked in step 2) and replies to the open request
   with ENOENT. (The FUSE open handler cannot tell that O_CREAT was
   specified in the flags of the syscall, as that bit is not passed
   through in the flags of the FUSE open request, so it can't handle
   this case automatically by creating the file.)

5. The kernel passes the ENOENT error code on as the result of the
   open(2) system call, so the file is neither created nor opened.
To me this seems clearly incorrect in terms of observable behavior.
The file does not exist at the point of the open(2) syscall with
O_CREAT in step 3 (although the kernel on host X has not yet become
aware of its deletion). The file should be created and opened. An
open(2) syscall with O_CREAT shouldn't fail with ENOENT because the
file does not exist, yet that is what happens in this situation
currently.

I don't see how to avoid this without some kernel-level change that
has acceptable performance. (We could make the unlink on host Y
synchronously perform an entry invalidation across all other hosts
where our FUSE daemon is running, but that would be a huge
performance problem and would add significant complexity to our FUSE
daemon.)

I believe that there are at least two other ways to resolve this
without adding a synchronous lookup request on every open syscall
with O_CREAT:

- Preserve the O_CREAT bit in the flags passed through in the FUSE
  open request. (I believe the place where this bit is masked out is
  fuse_send_open in fs/fuse/file.c.) That would allow the FUSE open
  handler to know that file creation was requested and to perform the
  creation in this case. (This would be similar to a behavior we've
  implemented in our FUSE create handler, which opens an existing file
  when O_EXCL is not in the flags; that lets us handle cases where a
  file was recently created on a remote host.)

- If a FUSE open request fails with ENOENT when O_CREAT is used, have
  the kernel drop the cached dentry and then make a FUSE create
  request.

Thanks.

--
Ken Schalk