On Thu, Feb 1, 2024 at 7:53 PM Bernd Schubert <bschubert@xxxxxxx> wrote: > > On 2/1/24 17:33, Amir Goldstein wrote: > > On Thu, Feb 1, 2024 at 4:47 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > >> > >> On Thu, 1 Feb 2024 at 00:09, Bernd Schubert <bschubert@xxxxxxx> wrote: > >>> > >>> From: Amir Goldstein <amir73il@xxxxxxxxx> > >>> > >>> The fuse inode io mode is determined by the mode of its open files/mmaps > >>> and parallel dio. > >>> > >>> - caching io mode - files open in caching mode or mmap on direct_io file > >>> - direct io mode - no files open in caching mode and no files mmaped > >>> - parallel dio mode - direct io mode with parallel dio in progress > >> > >> Specifically if iocachectr is: > >> > >>> 0 -> caching io > >> == 0 -> direct io > >> < 0 -> parallel io > >> > >>> > >>> We use a new FOPEN_CACHE_IO flag to explicitly mark a file that was open > >>> in caching mode. > >> > >> This is really confusing. FOPEN_CACHE_IO is apparently an internally > >> used flag, but it's defined on the userspace API. > >> > >> a) what is the meaning of this flag on the external API? > >> b) what is the purpose of this flag internally? > > > > The purpose is to annotate the state of direct io file that was mmaped > > as FOPEN_DIRECT_IO | FOPEN_CACHE_IO. > > An fd like this puts inode in caching mode and its release may get inode > > out of caching mode. > > > > I did not manage to do refcoutning with fuse_vma_close(), because those > > calls are not balances with fuse_file_mmap() calls. > > > > The first mmap() of an FOPEN_DIRECT_IO file may incur wait for completion > > of parallel dio. > > > > The only use of exporting FOPEN_CACHE_IO to the server is that it could > > force incurring this wait at open() time instead of mmap() time. > > > > I also considered for servers that advertise FUSE_PASSTHROUGH > > capability to not allow open without specifying an explicit io mode, > > that is one of FOPEN_DIRECT_IO | FOPEN_CACHE_IO | > > FOPEN_PASSTHROUGH, but I did not actually implement those > > semantics ATM. > > > >> > >>> > >>> direct_io mmap uses page cache, so first mmap will mark the file as > >>> FOPEN_DIRECT_IO|FOPEN_CACHE_IO (i.e. mixed mode) and inode will enter > >>> the caching io mode. > >>> > >>> If the server opens the file with flags FOPEN_DIRECT_IO|FOPEN_CACHE_IO, > >>> the inode enters caching io mode already on open. > >>> > >>> This allows executing parallel dio when inode is not in caching mode > >>> even if shared mmap is allowed, but no mmaps have been performed on > >>> the inode in question. > >>> > >>> An open in caching mode and mmap on direct_io file now waits for all > >>> in-progress parallel dio writes to complete, so paralle dio writes > >>> together with FUSE_DIRECT_IO_ALLOW_MMAP is enabled by this commit. > >> > >> I think the per file state is wrong, the above can be done with just > >> the per-inode state. > >> > > > > The per-file state is to taint the file is "has been mmaped". > > I did not find another way of doing this. > > Please suggest another way. > > > >> > >>> > >>> Signed-off-by: Bernd Schubert <bschubert@xxxxxxx> > >>> Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> > >>> --- > >>> fs/fuse/file.c | 215 ++++++++++++++++++++++++++++++++++++-- > >>> fs/fuse/fuse_i.h | 79 +++++++++++++- > >>> include/uapi/linux/fuse.h | 2 + > >>> 3 files changed, 286 insertions(+), 10 deletions(-) > >>> > >>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c > >>> index 7d2f4b0eb36a..eb9929ff9f60 100644 > >>> --- a/fs/fuse/file.c > >>> +++ b/fs/fuse/file.c > >>> @@ -105,10 +105,177 @@ static void fuse_release_end(struct fuse_mount *fm, struct fuse_args *args, > >>> kfree(ra); > >>> } > >>> > >>> +static bool fuse_file_is_direct_io(struct file *file) > >>> +{ > >>> + struct fuse_file *ff = file->private_data; > >>> + > >>> + return ff->open_flags & FOPEN_DIRECT_IO || file->f_flags & O_DIRECT; > >>> +} > >> > >> This is one of the issues with the per-file state. O_DIRECT can be > >> changed with fcntl(fd, F_SETFL, ...), so state calculated at open can > >> go stale. > >> > > > > Ouch! > > Actually, it may not matter much if we just ignore O_DIRECT > > completely for the purpose of io modes. > > I had also missed that fcntl option :/ > > > > > The io mode could be determined solely from FMODE_* flags. > > > > Worst case is that if the server opens an O_DIRECT file without > > FOPEN_DIRECT_IO, then this inode cannot do parallel dio > > and this inode cannot be opened in passthrough mode. > > I don't think that is such a big problem. > > > > Bernd, what do you think? > > Yeah, currently FOPEN_PARALLEL_DIRECT_WRITES does not have an effect > without FOPEN_DIRECT_IO. In my fuse-dio consolidation branch that > changes. For me it is ok that parallel dio requires > FOPEN_DIRECT_IO, easy enough to set on the server side in open methods > when O_DIRECT is set. We should just document it in fuse.h. > Ok. I removed the O_DIRECT code from the last version I pushed. Please verify that it does not break anything in your dio tests. > > > > BTW, server can and probably should open O_DIRECT files > > with FOPEN_PASSTHROUGH and that means something > > completely different than O_DIRECT && FOPEN_DIRECT_IO - > > it means that io is passed through to the backing file without > > doing buffered io on the backing file. > > > >>> + > >>> +/* > >>> + * Wait for cached io to be allowed - > >>> + * Blocks new parallel dio writes and waits for the in-progress parallel dio > >>> + * writes to complete. > >>> + */ > >>> +static int fuse_inode_wait_for_cached_io(struct fuse_inode *fi) > >>> +{ > >>> + int err = 0; > >>> + > >>> + assert_spin_locked(&fi->lock); > >>> + > >>> + while (!err && !fuse_inode_get_io_cache(fi)) { > >>> + /* > >>> + * Setting the bit advises new direct-io writes > >>> + * to use an exclusive lock - without it the wait below > >>> + * might be forever. > >>> + */ > >>> + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state); > >>> + spin_unlock(&fi->lock); > >>> + err = wait_event_killable(fi->direct_io_waitq, > >>> + fuse_is_io_cache_allowed(fi)); > >>> + spin_lock(&fi->lock); > >>> + } > >>> + /* Clear FUSE_I_CACHE_IO_MODE flag if failed to enter caching mode */ > >>> + if (err && fi->iocachectr <= 0) > >>> + clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state); > >>> + > >>> + return err; > >>> +} > >> > >> I suggest moving all infrastructure, including the inline helpers in > >> fuse_i.h into a separate source file. > >> > > > > OK. I can make those changes. > > moved to iomode.c > >>> + > >>> +/* Start cached io mode where parallel dio writes are not allowed */ > >>> +static int fuse_file_cached_io_start(struct inode *inode) > >>> +{ > >>> + struct fuse_inode *fi = get_fuse_inode(inode); > >>> + int err; > >>> + > >>> + spin_lock(&fi->lock); > >>> + err = fuse_inode_wait_for_cached_io(fi); > >>> + spin_unlock(&fi->lock); > >>> + return err; > >>> +} > >>> + > >>> +static void fuse_file_cached_io_end(struct inode *inode) > >>> +{ > >>> + struct fuse_inode *fi = get_fuse_inode(inode); > >>> + > >>> + spin_lock(&fi->lock); > >>> + fuse_inode_put_io_cache(get_fuse_inode(inode)); > >>> + spin_unlock(&fi->lock); > >>> +} > >>> + > >>> +/* Start strictly uncached io mode where cache access is not allowed */ > >>> +static int fuse_file_uncached_io_start(struct inode *inode) > >>> +{ > >>> + struct fuse_inode *fi = get_fuse_inode(inode); > >>> + bool ok; > >>> + > >>> + spin_lock(&fi->lock); > >>> + ok = fuse_inode_deny_io_cache(fi); > >>> + spin_unlock(&fi->lock); > >>> + return ok ? 0 : -ETXTBSY; > >>> +} > >>> + > >>> +static void fuse_file_uncached_io_end(struct inode *inode) > >>> +{ > >>> + struct fuse_inode *fi = get_fuse_inode(inode); > >>> + bool allow_cached_io; > >>> + > >>> + spin_lock(&fi->lock); > >>> + allow_cached_io = fuse_inode_allow_io_cache(fi); > >>> + spin_unlock(&fi->lock); > >>> + if (allow_cached_io) > >>> + wake_up(&fi->direct_io_waitq); > >>> +} > >>> + > >>> +/* Open flags to determine regular file io mode */ > >>> +#define FOPEN_IO_MODE_MASK \ > >>> + (FOPEN_DIRECT_IO | FOPEN_CACHE_IO) > >>> + > >>> +/* Request access to submit new io to inode via open file */ > >>> +static int fuse_file_io_open(struct file *file, struct inode *inode) > >>> +{ > >>> + struct fuse_file *ff = file->private_data; > >>> + int iomode_flags = ff->open_flags & FOPEN_IO_MODE_MASK; > >>> + int err; > >>> + > >>> + err = -EBUSY; > >>> + if (WARN_ON(ff->io_opened)) > >>> + goto fail; > >>> + > >>> + if (!S_ISREG(inode->i_mode) || FUSE_IS_DAX(inode)) { > >> > >> This S_ISREG check can also go away with separating the directory opens. > > > > OK. I will see if I can make that cleanup as well. > done. Bernd, I pushed branch with latest cleanup patches and addressed Miklos' simpler review comments to: https://github.com/amir73il/linux/commits/fuse_io_mode-020224/ The one thing I did not change is the FOPEN_CACHE_IO flag and per-file state, because I have no idea how to fix this differently. I can change the external flag to an internal state if that is the main concern, but I had a feeling there was more to it. If you get the chance, please review and re-run your tests. Thanks, Amir.