On 12/19/23 14:01, Amir Goldstein wrote:
Here is what I was thinking about:
https://github.com/amir73il/linux/commits/fuse_io_mode
The concept that I wanted to introduce was the
fuse_inode_deny_io_cache()/fuse_inode_allow_io_cache()
helpers (akin to deny_write_access()/allow_write_access()).
In this patch, parallel dio in progress deny open in caching mode
and mmap, and I don't know if that is acceptable.
Technically, instead of deny open/mmap you can use additional
techniques to wait for in progress dio and allow caching open/mmap.
Anyway, I plan to use the iocachectr and fuse_inode_deny_io_cache()
pattern when file is open in FOPEN_PASSTHROUGH mode, but
in this case, as agreed with Miklos, a server trying to mix open
in caching mode on the same inode is going to fail the open.
mmap is less of a problem for inode in passthrough mode, because
mmap in of direct_io file and inode in passthrough mode is passthrough
mmap to backing file.
Anyway, if you can use this patch or parts of it, be my guest and if you
want to use a different approach that is fine by me as well - in that case
I will just remove the fuse_file_shared_dio_{start,end}() part from my patch.
Hi Amir,
here is my fuse-dio-v5 branch:
https://github.com/bsbernd/linux/commits/fuse-dio-v5/
(v5 is just compilation tested, tests are running now over night)
This looks very nice!
I left comments about some minor nits on github.
This branch is basically about consolidating fuse write direct IO code
paths and to allow a shared lock for O_DIRECT. I actually could have
noticed the page cache issue with shared locks before with previous
versions of these patches, just my VM kernel is optimized for
compilation time and some SHM options had been missing - with that fio
refused to run.
The branch includes a modified version of your patch:
https://github.com/bsbernd/linux/commit/6b05e52f7e253d9347d97de675b21b1707d6456e
Main changes are
- fuse_file_io_open() does not set the FOPEN_CACHE_IO flag for
file->f_flags & O_DIRECT
- fuse_file_io_mmap() waits on a dio waitq
- fuse_file_shared_dio_start / fuse_file_shared_dio_end are moved up in
the file, as I would like to entirely remove the fuse_direct_write iter
function (all goes through cache_write_iter)
Looks mostly good, but I think that fuse_file_shared_dio_start() =>
fuse_inode_deny_io_cache() should actually be done after taking
the inode lock (shared or exclusive) and not like in my patch.
First of all, this comment in fuse_dio_wr_exclusive_lock():
/*
* fuse_file_shared_dio_start() must not be called on retest,
* as it decreases a counter value - must not be done twice
*/
if (!fuse_file_shared_dio_start(inode))
return true;
...is suggesting that semantics are not clean and this check
must remain last, because if fuse_dio_wr_exclusive_lock()
returns false, iocachectr must not be elevated.
This is easy to get wrong in the future with current semantics.
The more important thing is that while fuse_file_io_mmap()
is waiting for iocachectr to drop to zero, new parallel dio can
come in and starve the mmap() caller forever.
I think that we are going to need to use some inode state flag
(e.g. FUSE_I_DIO_WR_EXCL) to protect against this starvation,
unless we do not care about this possibility?
We'd only need to set this in fuse_file_io_mmap() until we get
the iocachectr refcount.
I *think* that fuse_inode_deny_io_cache() should be called with
shared inode lock held, because of the existing lock chain
i_rwsem -> page lock -> mmap_lock for page faults, but I am
not sure. My brain is too cooked now to figure this out.
OTOH, I don't see any problem with calling
fuse_inode_deny_io_cache() with shared lock held?
I pushed this version to my fuse_io_mode branch [1].
Only tested generic/095 with FOPEN_DIRECT_IO and
DIRECT_IO_ALLOW_MMAP.
Thanks,
Amir.
[1] https://github.com/amir73il/linux/commits/fuse_io_mode
Thanks, will look into your changes next. I was looking into the initial
issue with generic/095 with my branch. Fixed by the attached patch. I
think it is generic and also applies to FOPEN_DIRECT_IO + mmap.
Interesting is that filemap_range_has_writeback() is exported, but there
was no user. Hopefully nobody submits an unexport patch in the mean time.
Thanks,
Bernd
commit bce66bf4b0b5d8cbeeb06ef3550ab4e02477f3e4
Author: Bernd Schubert <bschubert@xxxxxxx>
Date: Tue Dec 19 20:36:10 2023 +0100
dirty pages
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1fd3ba57accc8..26b13128b1e29 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -503,6 +503,19 @@ static struct fuse_writepage_args *fuse_find_writeback(struct fuse_inode *fi,
return NULL;
}
+static bool fuse_inode_has_writeback(struct inode *inode)
+{
+ struct fuse_inode *fi = get_fuse_inode(inode);
+ struct fuse_writepage_args *wpa;
+
+ spin_lock(&fi->lock);
+ wpa = rb_entry(fi->writepages.rb_node,
+ struct fuse_writepage_args, writepages_entry);
+ spin_unlock(&fi->lock);
+
+ return wpa != NULL;
+}
+
/*
* Check if any page in a range is under writeback
*
@@ -1449,8 +1462,18 @@ static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from
/* fuse_file_shared_dio_start() must not be called on retest,
* as it decreases a counter value - must not be done twice
*/
- if (!fuse_file_shared_dio_start(inode))
+ if (!fuse_file_shared_dio_start(inode)) {
return true;
+ } else {
+ /* we succeeded to enable shared dio, but there still might be
+ * dirty pages
+ */
+ if (filemap_range_has_writeback(file->f_mapping, 0, LLONG_MAX) ||
+ fuse_inode_has_writeback(inode)) {
+ fuse_file_shared_dio_end(inode);
+ return true;
+ }
+ }
return false;
}
@@ -1472,6 +1495,7 @@ static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from,
inode_lock(inode);
} else {
inode_lock_shared(inode);
+
/*
* Previous check was without inode lock and might have raced,
* check again.