Re: [PATCH 0/2] fuse: Rename DIRECT_IO_{RELAX -> ALLOW_MMAP}

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 19, 2023 at 10:47 PM Bernd Schubert
<bernd.schubert@xxxxxxxxxxx> wrote:
>
>
>
> On 12/19/23 14:01, Amir Goldstein wrote:
> >>> Here is what I was thinking about:
> >>>
> >>> https://github.com/amir73il/linux/commits/fuse_io_mode
> >>>
> >>> The concept that I wanted to introduce was the
> >>> fuse_inode_deny_io_cache()/fuse_inode_allow_io_cache()
> >>> helpers (akin to deny_write_access()/allow_write_access()).
> >>>
> >>> In this patch, parallel dio in progress deny open in caching mode
> >>> and mmap, and I don't know if that is acceptable.
> >>> Technically, instead of deny open/mmap you can use additional
> >>> techniques to wait for in progress dio and allow caching open/mmap.
> >>>
> >>> Anyway, I plan to use the iocachectr and fuse_inode_deny_io_cache()
> >>> pattern when file is open in FOPEN_PASSTHROUGH mode, but
> >>> in this case, as agreed with Miklos, a server trying to mix open
> >>> in caching mode on the same inode is going to fail the open.
> >>>
> >>> mmap is less of a problem for inode in passthrough mode, because
> >>> mmap in of direct_io file and inode in passthrough mode is passthrough
> >>> mmap to backing file.
> >>>
> >>> Anyway, if you can use this patch or parts of it, be my guest and if you
> >>> want to use a different approach that is fine by me as well - in that case
> >>> I will just remove the fuse_file_shared_dio_{start,end}() part from my patch.
> >>
> >> Hi Amir,
> >>
> >> here is my fuse-dio-v5 branch:
> >> https://github.com/bsbernd/linux/commits/fuse-dio-v5/
> >>
> >> (v5 is just compilation tested, tests are running now over night)
> >
> > This looks very nice!
> > I left comments about some minor nits on github.
> >
> >>
> >> This branch is basically about consolidating fuse write direct IO code
> >> paths and to allow a shared lock for O_DIRECT. I actually could have
> >> noticed the page cache issue with shared locks before with previous
> >> versions of these patches, just my VM kernel is optimized for
> >> compilation time and some SHM options had been missing - with that fio
> >> refused to run.
> >>
> >> The branch includes a modified version of your patch:
> >> https://github.com/bsbernd/linux/commit/6b05e52f7e253d9347d97de675b21b1707d6456e
> >>
> >> Main changes are
> >> - fuse_file_io_open() does not set the FOPEN_CACHE_IO flag for
> >> file->f_flags & O_DIRECT
> >> - fuse_file_io_mmap() waits on a dio waitq
> >> - fuse_file_shared_dio_start / fuse_file_shared_dio_end are moved up in
> >> the file, as I would like to entirely remove the fuse_direct_write iter
> >> function (all goes through cache_write_iter)
> >>
> >
> > Looks mostly good, but I think that fuse_file_shared_dio_start() =>
> > fuse_inode_deny_io_cache() should actually be done after taking
> > the inode lock (shared or exclusive) and not like in my patch.
> >
> > First of all, this comment in fuse_dio_wr_exclusive_lock():
> >
> >          /*
> >           * fuse_file_shared_dio_start() must not be called on retest,
> >           * as it decreases a counter value - must not be done twice
> >           */
> >          if (!fuse_file_shared_dio_start(inode))
> >                  return true;
> >
> > ...is suggesting that semantics are not clean and this check
> > must remain last, because if fuse_dio_wr_exclusive_lock()
> > returns false, iocachectr must not be elevated.
> > This is easy to get wrong in the future with current semantics.
> >
> > The more important thing is that while fuse_file_io_mmap()
> > is waiting for iocachectr to drop to zero, new parallel dio can
> > come in and starve the mmap() caller forever.
> >
> > I think that we are going to need to use some inode state flag
> > (e.g. FUSE_I_DIO_WR_EXCL) to protect against this starvation,
> > unless we do not care about this possibility?
> > We'd only need to set this in fuse_file_io_mmap() until we get
> > the iocachectr refcount.
> >
> > I *think* that fuse_inode_deny_io_cache() should be called with
> > shared inode lock held, because of the existing lock chain
> > i_rwsem -> page lock -> mmap_lock for page faults, but I am
> > not sure. My brain is too cooked now to figure this out.
> > OTOH, I don't see any problem with calling
> > fuse_inode_deny_io_cache() with shared lock held?
> >
> > I pushed this version to my fuse_io_mode branch [1].
> > Only tested generic/095 with FOPEN_DIRECT_IO and
> > DIRECT_IO_ALLOW_MMAP.
> >
> > Thanks,
> > Amir.
> >
> > [1] https://github.com/amir73il/linux/commits/fuse_io_mode
>
> Thanks, will look into your changes next. I was looking into the initial
> issue with generic/095 with my branch. Fixed by the attached patch. I
> think it is generic and also applies to FOPEN_DIRECT_IO + mmap.
> Interesting is that filemap_range_has_writeback() is exported, but there
> was no user. Hopefully nobody submits an unexport patch in the mean time.
>

Ok. Now I am pretty sure that filemap_range_has_writeback() should be
check after taking the shared lock in fuse_dio_lock() as in my branch and
not in fuse_dio_wr_exclusive_lock() outside the lock.

But at the same time, it is a little concerning that you are able to observe
dirty pages on a fuse inode after success of fuse_inode_deny_io_cache().
The whole point of fuse_inode_deny_io_cache() is that it should be
granted after all users of the inode page cache are gone.

Is it expected that fuse inode pages remain dirty after no more open files
and no more mmaps?

Did we miss some case of access to page cache? unaligned dio perhaps?

Thanks,
Amir.





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux