On Fri, 22 Apr 2022 at 16:30, Dharmendra Hans <dharamhans87@xxxxxxxxx> wrote:
>
> On Thu, Apr 21, 2022 at 8:52 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> >
> > On Fri, 8 Apr 2022 at 08:18, Dharmendra Singh <dharamhans87@xxxxxxxxx> wrote:
> > >
> > > As of now, in Fuse, direct writes on the same file are serialized
> > > over inode lock i.e we hold inode lock for the whole duration of
> > > the write request. This serialization works pretty well for the FUSE
> > > user space implementations which rely on this inode lock for their
> > > cache/data integrity etc. But it hurts badly such FUSE implementations
> > > which has their own ways of maintaining data/cache integrity and does not
> > > use this serialization at all.
> > >
> > > This patch allows parallel direct writes on the same file with the
> > > help of a flag called FOPEN_PARALLEL_WRITES. If this flag is set on
> > > the file (flag is passed from libfuse to fuse kernel as part of file
> > > open/create), we do not hold inode lock for the whole duration of the
> > > request, instead acquire it only to protect updates on certain fields
> > > of the inode. FUSE implementations which rely on this inode lock can
> > > continue to do so and this is default behaviour.
> > >
> > > Signed-off-by: Dharmendra Singh <dsingh@xxxxxxx>
> > > ---
> > >  fs/fuse/file.c            | 38 ++++++++++++++++++++++++++++++++++----
> > >  include/uapi/linux/fuse.h |  2 ++
> > >  2 files changed, 36 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > > index 37eebfb90500..d3e8f44c1228 100644
> > > --- a/fs/fuse/file.c
> > > +++ b/fs/fuse/file.c
> > > @@ -1465,6 +1465,8 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
> > >         int err = 0;
> > >         struct fuse_io_args *ia;
> > >         unsigned int max_pages;
> > > +       bool p_write = write &&
> > > +               (ff->open_flags & FOPEN_PARALLEL_WRITES) ? true : false;
> > >
> > >         max_pages = iov_iter_npages(iter, fc->max_pages);
> > >         ia = fuse_io_alloc(io, max_pages);
> > > @@ -1472,10 +1474,11 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
> > >                 return -ENOMEM;
> > >
> > >         if (!cuse && fuse_range_is_writeback(inode, idx_from, idx_to)) {
> > > -               if (!write)
> > > +               /* Parallel write does not come with inode lock held */
> > > +               if (!write || p_write)
> >
> > Probably would be good to add an inode_is_locked() assert in
> > fuse_sync_writes() to make sure we don't miss cases silently.
>
> I think fuse_set_nowrite() called from fuse_sync_writes() already has
> this assertion.

Ah, okay.

> >
> > >                         inode_lock(inode);
> > >                 fuse_sync_writes(inode);
> > > -               if (!write)
> > > +               if (!write || p_write)
> > >                         inode_unlock(inode);
> > >         }
> > >
> > > @@ -1568,22 +1571,36 @@ static ssize_t fuse_direct_read_iter(struct kiocb *iocb, struct iov_iter *to)
> > >  static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
> > >  {
> > >         struct inode *inode = file_inode(iocb->ki_filp);
> > > +       struct file *file = iocb->ki_filp;
> > > +       struct fuse_file *ff = file->private_data;
> > >         struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb);
> > >         ssize_t res;
> > > +       bool p_write = ff->open_flags & FOPEN_PARALLEL_WRITES ? true : false;
> > > +       bool unlock_inode = true;
> > >
> > >         /* Don't allow parallel writes to the same file */
> > >         inode_lock(inode);
> > >         res = generic_write_checks(iocb, from);
> >
> > I don't think this needs inode lock.  At least nfs_file_direct_write()
> > doesn't have it.
> >
> > What it does have, however is taking the inode lock for shared for the
> > actual write operation, which is probably something that fuse needs as
> > well.
> >
> > Also I worry about size extending writes not holding the inode lock
> > exclusive.  Would that be a problem in your use case?
>
> Thanks for pointing out this issue. Actually there is an issue in
> appending writes.
> Until unless current appending write is finished and does not update
> i_size, next appending write can't be allowed as it would be otherwise
> one request overwriting data written by another request.
> For other kind of writes, I do not see the issue as i_size update can
> be handled as it is done currently as these writes are based upon fixed
> offset instead of generating offset from i_size.

That's true, but I still worry...  Does your workload include
non-append extending writes?  Seems to me making those run in parallel
is asking for trouble.

> If we agreed, I would be sending the updated patch shortly.
> (Also please take a look on other patches raised by me for atomic-open, these
> patches are pending since couple of weeks)

I'm looking at that currently.

Thanks,
Miklos