Re: [PATCH 0/2] fuse: Rename DIRECT_IO_{RELAX -> ALLOW_MMAP}

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 12/20/23 05:18, Amir Goldstein wrote:
On Tue, Dec 19, 2023 at 10:47 PM Bernd Schubert
<bernd.schubert@xxxxxxxxxxx> wrote:



On 12/19/23 14:01, Amir Goldstein wrote:
Here is what I was thinking about:

https://github.com/amir73il/linux/commits/fuse_io_mode

The concept that I wanted to introduce was the
fuse_inode_deny_io_cache()/fuse_inode_allow_io_cache()
helpers (akin to deny_write_access()/allow_write_access()).

In this patch, parallel dio in progress deny open in caching mode
and mmap, and I don't know if that is acceptable.
Technically, instead of deny open/mmap you can use additional
techniques to wait for in progress dio and allow caching open/mmap.

Anyway, I plan to use the iocachectr and fuse_inode_deny_io_cache()
pattern when file is open in FOPEN_PASSTHROUGH mode, but
in this case, as agreed with Miklos, a server trying to mix open
in caching mode on the same inode is going to fail the open.

mmap is less of a problem for inode in passthrough mode, because
mmap in of direct_io file and inode in passthrough mode is passthrough
mmap to backing file.

Anyway, if you can use this patch or parts of it, be my guest and if you
want to use a different approach that is fine by me as well - in that case
I will just remove the fuse_file_shared_dio_{start,end}() part from my patch.

Hi Amir,

here is my fuse-dio-v5 branch:
https://github.com/bsbernd/linux/commits/fuse-dio-v5/

(v5 is just compilation tested, tests are running now over night)

This looks very nice!
I left comments about some minor nits on github.


This branch is basically about consolidating fuse write direct IO code
paths and to allow a shared lock for O_DIRECT. I actually could have
noticed the page cache issue with shared locks before with previous
versions of these patches, just my VM kernel is optimized for
compilation time and some SHM options had been missing - with that fio
refused to run.

The branch includes a modified version of your patch:
https://github.com/bsbernd/linux/commit/6b05e52f7e253d9347d97de675b21b1707d6456e

Main changes are
- fuse_file_io_open() does not set the FOPEN_CACHE_IO flag for
file->f_flags & O_DIRECT
- fuse_file_io_mmap() waits on a dio waitq
- fuse_file_shared_dio_start / fuse_file_shared_dio_end are moved up in
the file, as I would like to entirely remove the fuse_direct_write iter
function (all goes through cache_write_iter)


Looks mostly good, but I think that fuse_file_shared_dio_start() =>
fuse_inode_deny_io_cache() should actually be done after taking
the inode lock (shared or exclusive) and not like in my patch.

First of all, this comment in fuse_dio_wr_exclusive_lock():

          /*
           * fuse_file_shared_dio_start() must not be called on retest,
           * as it decreases a counter value - must not be done twice
           */
          if (!fuse_file_shared_dio_start(inode))
                  return true;

...is suggesting that semantics are not clean and this check
must remain last, because if fuse_dio_wr_exclusive_lock()
returns false, iocachectr must not be elevated.
This is easy to get wrong in the future with current semantics.

The more important thing is that while fuse_file_io_mmap()
is waiting for iocachectr to drop to zero, new parallel dio can
come in and starve the mmap() caller forever.

I think that we are going to need to use some inode state flag
(e.g. FUSE_I_DIO_WR_EXCL) to protect against this starvation,
unless we do not care about this possibility?
We'd only need to set this in fuse_file_io_mmap() until we get
the iocachectr refcount.

I *think* that fuse_inode_deny_io_cache() should be called with
shared inode lock held, because of the existing lock chain
i_rwsem -> page lock -> mmap_lock for page faults, but I am
not sure. My brain is too cooked now to figure this out.
OTOH, I don't see any problem with calling
fuse_inode_deny_io_cache() with shared lock held?

I pushed this version to my fuse_io_mode branch [1].
Only tested generic/095 with FOPEN_DIRECT_IO and
DIRECT_IO_ALLOW_MMAP.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/fuse_io_mode

Thanks, will look into your changes next. I was looking into the initial
issue with generic/095 with my branch. Fixed by the attached patch. I
think it is generic and also applies to FOPEN_DIRECT_IO + mmap.
Interesting is that filemap_range_has_writeback() is exported, but there
was no user. Hopefully nobody submits an unexport patch in the mean time.


Ok. Now I am pretty sure that filemap_range_has_writeback() should be
check after taking the shared lock in fuse_dio_lock() as in my branch and
not in fuse_dio_wr_exclusive_lock() outside the lock.




But at the same time, it is a little concerning that you are able to observe
dirty pages on a fuse inode after success of fuse_inode_deny_io_cache().
The whole point of fuse_inode_deny_io_cache() is that it should be
granted after all users of the inode page cache are gone.

Is it expected that fuse inode pages remain dirty after no more open files
and no more mmaps?


I'm actually not sure anymore if filemap_range_has_writeback() is actually needed. In fuse_flush() it calls write_inode_now(inode, 1), but I don't think that will flush queued fi->writectr (fi->writepages). Will report back in the afternoon.



Did we miss some case of access to page cache? unaligned dio perhaps?

Yeah, there is indeed another problem,
fuse_cache_write_iter() can fall back to page writes, but I had especially added a warning message in temporary patch - I didn't manage to trigger that path yet. Will add a fix for that.

Thanks,
Bernd






[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux