Re: [PATCH v2] fuse: Allow page aligned writes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 8/5/24 22:32, Joanne Koong wrote:
> On Fri, Aug 2, 2024 at 2:52 PM Bernd Schubert <bschubert@xxxxxxx> wrote:
>>
>> Read/writes IOs should be page aligned as fuse server
> 
> I think you meant just "write IOs"  here and not reads?

Yeah sorry.

> 
>> might need to copy data to another buffer otherwise in
>> order to fulfill network or device storage requirements.
>>
>> Simple reproducer is with libfuse, example/passthrough*
>> and opening a file with O_DIRECT - without this change
>> writing to that file failed with -EINVAL if the underlying
>> file system was requiring alignment.
>>
>> Required server side changes:
>> Server needs to seek to the next page, without splice that is
>> just page size buffer alignment, with splice another splice
>> syscall is needed to seek over the unaligned area.
>>
>> Signed-off-by: Bernd Schubert <bschubert@xxxxxxx>
>>
>> ---
>>
>> Changes since v1:
>> - Fuse client does not send the alignment offset anymore in the write
>>   header.
>> - Use FOPEN_ALIGNED_WRITES to be filled in by FUSE_OPEN and FUSE_CREATE
>>   instead of a FUSE_INIT flag to allow control per file and to safe
>>   init flags.
>> - Instead of seeking a fixed offset, fuse_copy_align() just seeks to the
>>   next page.
>> - Added sanity checks in fuse_copy_align().
>>
>> libfuse patch:
>> https://github.com/libfuse/libfuse/pull/983
>>
>> From implmentation point of view it is debatable if request type
>> parsing should be done in fuse_copy_args() or if alignment
>> should be stored in struct fuse_arg / fuse_in_arg.
>> ---
>>  fs/fuse/dev.c             | 25 +++++++++++++++++++++++--
>>  fs/fuse/file.c            |  6 ++++++
>>  fs/fuse/fuse_i.h          |  6 ++++--
>>  include/uapi/linux/fuse.h |  9 ++++++++-
>>  4 files changed, 41 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>> index 9eb191b5c4de..e0db408db90f 100644
>> --- a/fs/fuse/dev.c
>> +++ b/fs/fuse/dev.c
>> @@ -1009,6 +1009,24 @@ static int fuse_copy_one(struct fuse_copy_state *cs, void *val, unsigned size)
>>         return 0;
>>  }
>>
>> +/* Align to the next page */
>> +static int fuse_copy_align(struct fuse_copy_state *cs)
>> +{
>> +       if (WARN_ON(!cs->write))
>> +               return -EIO;
> 
> I understand why you have the check here but in my opinion,
> fuse_copy_align should just be a generic api since the rest of the arg
> copying APIs are generic and with having "align" as a bit in the args
> field, align capability seems generic as well.

I can remove it no issue. I had in there as wrong alignment action will
cause data corruption.

> 
>> +
>> +       if (WARN_ON(cs->move_pages))
>> +               return -EIO;
>> +
>> +       if (WARN_ON(cs->len == PAGE_SIZE || cs->offset == 0))
>> +               return -EIO;
>> +
>> +       /* Seek to the next page */
>> +       cs->offset += cs->len;
>> +       cs->len = 0;
>> +       return 0;
>> +}
>> +
>>  /* Copy request arguments to/from userspace buffer */
>>  static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
>>                           unsigned argpages, struct fuse_arg *args,
>> @@ -1019,10 +1037,13 @@ static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
>>
>>         for (i = 0; !err && i < numargs; i++)  {
>>                 struct fuse_arg *arg = &args[i];
>> -               if (i == numargs - 1 && argpages)
>> +               if (i == numargs - 1 && argpages) {
>>                         err = fuse_copy_pages(cs, arg->size, zeroing);
>> -               else
>> +               } else {
>>                         err = fuse_copy_one(cs, arg->value, arg->size);
>> +                       if (!err && arg->align)
>> +                               err = fuse_copy_align(cs);
>> +               }
>>         }
>>         return err;
>>  }
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index f39456c65ed7..931e7324137f 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -1062,6 +1062,12 @@ static void fuse_write_args_fill(struct fuse_io_args *ia, struct fuse_file *ff,
>>                 args->in_args[0].size = FUSE_COMPAT_WRITE_IN_SIZE;
>>         else
>>                 args->in_args[0].size = sizeof(ia->write.in);
>> +
>> +       if (ff->open_flags & FOPEN_ALIGNED_WRITES) {
>> +               args->in_args[0].align = 1;
> 
> This is more of a nit so feel free to disregard, but I think the code
> is easier to understand if the align bit gets set on the arg that
> needs alignment instead of on the preceding arg. I think this would
> also make fuse_copy_args() easier to grok, since the alignment would
> be done before doing fuse_copy_pages(), which makes it more obvious
> that the alignment is for the copied out pages

Makes sense, going to change it to args->in_args[1].

> 
>> +               ia->write.in.write_flags |= FUSE_WRITE_ALIGNED;
>> +       }
>> +
>>         args->in_args[0].value = &ia->write.in;
>>         args->in_args[1].size = count;
>>         args->out_numargs = 1;
>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
>> index f23919610313..1600bd7b1d0d 100644
>> --- a/fs/fuse/fuse_i.h
>> +++ b/fs/fuse/fuse_i.h
>> @@ -275,13 +275,15 @@ struct fuse_file {
>>
>>  /** One input argument of a request */
>>  struct fuse_in_arg {
>> -       unsigned size;
>> +       unsigned int size;
>> +       unsigned int align:1;
>>         const void *value;
>>  };
>>
>>  /** One output argument of a request */
>>  struct fuse_arg {
>> -       unsigned size;
>> +       unsigned int size;
>> +       unsigned int align:1;
>>         void *value;
>>  };
>>
>> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
>> index d08b99d60f6f..742262c7c1eb 100644
>> --- a/include/uapi/linux/fuse.h
>> +++ b/include/uapi/linux/fuse.h
>> @@ -217,6 +217,9 @@
>>   *  - add backing_id to fuse_open_out, add FOPEN_PASSTHROUGH open flag
>>   *  - add FUSE_NO_EXPORT_SUPPORT init flag
>>   *  - add FUSE_NOTIFY_RESEND, add FUSE_HAS_RESEND init flag
>> + *
>> + * 7.41
>> + *  - add FOPEN_ALIGNED_WRITES open flag and FUSE_WRITE_ALIGNED write flag
>>   */
>>
>>  #ifndef _LINUX_FUSE_H
>> @@ -252,7 +255,7 @@
>>  #define FUSE_KERNEL_VERSION 7
>>
>>  /** Minor version number of this interface */
>> -#define FUSE_KERNEL_MINOR_VERSION 40
>> +#define FUSE_KERNEL_MINOR_VERSION 41
>>
>>  /** The node ID of the root inode */
>>  #define FUSE_ROOT_ID 1
>> @@ -360,6 +363,7 @@ struct fuse_file_lock {
>>   * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
>>   * FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on the same inode
>>   * FOPEN_PASSTHROUGH: passthrough read/write io for this open file
>> + * FOPEN_ALIGNED_WRITES: Page align write io data
>>   */
>>  #define FOPEN_DIRECT_IO                (1 << 0)
>>  #define FOPEN_KEEP_CACHE       (1 << 1)
>> @@ -369,6 +373,7 @@ struct fuse_file_lock {
>>  #define FOPEN_NOFLUSH          (1 << 5)
>>  #define FOPEN_PARALLEL_DIRECT_WRITES   (1 << 6)
>>  #define FOPEN_PASSTHROUGH      (1 << 7)
>> +#define FOPEN_ALIGNED_WRITES   (1 << 8)
> 
> Nice, I like how you made the flag on open instead of init to allow
> this option to be file-specific.
> 
>>
>>  /**
>>   * INIT request/reply flags
>> @@ -496,10 +501,12 @@ struct fuse_file_lock {
>>   * FUSE_WRITE_CACHE: delayed write from page cache, file handle is guessed
>>   * FUSE_WRITE_LOCKOWNER: lock_owner field is valid
>>   * FUSE_WRITE_KILL_SUIDGID: kill suid and sgid bits
>> + * FUSE_WRITE_ALIGNED: Write io data are page aligned
>>   */
>>  #define FUSE_WRITE_CACHE       (1 << 0)
>>  #define FUSE_WRITE_LOCKOWNER   (1 << 1)
>>  #define FUSE_WRITE_KILL_SUIDGID (1 << 2)
>> +#define FUSE_WRITE_ALIGNED      (1 << 3)
>>
>>  /* Obsolete alias; this flag implies killing suid/sgid only. */
>>  #define FUSE_WRITE_KILL_PRIV   FUSE_WRITE_KILL_SUIDGID
>> --
>> 2.43.0
>>
> Regarding where/how alignment should be stored, eg request type
> parsing vs in fuse_arg/fuse_in_arg structs -
> 
> I don't feel strongly about this but it feels cleaner to me to do
> request type parsing given that alignment is only needed for fuse
> write requests. In my mind, it would look something like this:
> 
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -1012,17 +1012,21 @@ static int fuse_copy_one(struct
> fuse_copy_state *cs, void *val, unsigned size)
>  /* Copy request arguments to/from userspace buffer */
>  static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
>                           unsigned argpages, struct fuse_arg *args,
> -                         int zeroing)
> +                         int zeroing, bool align)
>  {
>         int err = 0;
>         unsigned i;
> 
>         for (i = 0; !err && i < numargs; i++)  {
>                 struct fuse_arg *arg = &args[i];
> -               if (i == numargs - 1 && argpages)
> -                       err = fuse_copy_pages(cs, arg->size, zeroing);
> -               else
> +               if (i == numargs - 1 && argpages) {
> +                       if (align)
> +                               err = fuse_copy_align(cs);
> +                       if (!err)
> +                               err = fuse_copy_pages(cs, arg->size, zeroing);
> +               } else {
>                         err = fuse_copy_one(cs, arg->value, arg->size);
> +               }
>         }
>         return err;
>  }
> 
> +static bool should_align_copy_pages(struct file *file, struct fuse_args *args)
> +{
> +       struct fuse_file *ff = file->private_data;
> +
> +       return (ff->open_flags & FOPEN_ALIGNED_WRITES) && args->opcode
> == FUSE_WRITE;
> +}
> +
> @@ -1212,6 +1223,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev
> *fud, struct file *file,
>         struct fuse_args *args;
>         unsigned reqsize;
>         unsigned int hash;
> +       bool align;
> 
>         /*
>          * Require sane minimum read buffer - that has capacity for fixed part
> @@ -1296,9 +1308,10 @@ static ssize_t fuse_dev_do_read(struct fuse_dev
> *fud, struct file *file,
>         spin_unlock(&fpq->lock);
>         cs->req = req;
>         err = fuse_copy_one(cs, &req->in.h, sizeof(req->in.h));
> +       align = should_align_copy_pages(file, args);
>         if (!err)
>                 err = fuse_copy_args(cs, args->in_numargs, args->in_pages,
> -                                    (struct fuse_arg *) args->in_args, 0);
> +                                    (struct fuse_arg *)
> args->in_args, 0, align);
>         fuse_copy_finish(cs);
>         spin_lock(&fpq->lock);
>         clear_bit(FR_LOCKED, &req->flags);
> @@ -1896,7 +1909,7 @@ static int copy_out_args(struct fuse_copy_state
> *cs, struct fuse_args *args,
>                 lastarg->size -= diffsize;
>         }
>         return fuse_copy_args(cs, args->out_numargs, args->out_pages,
> -                             args->out_args, args->page_zeroing);
> +                             args->out_args, args->page_zeroing, false);
>  }
> 
> which also seems to me easier to follow in logic than having the align
> bit in the args. But if align will be something that other operations
> will also need/request, then I think it makes sense to have it in the
> args.

Personally I prefer to have it as args parameter. I had a version with a
FUSE_WRITE check in fuse_copy_args() - from my point of view, if at all
it should go there. Maybe it is only me, but I would prefer to avoid
adding conditions in fuse_dev_do_read() (if it can be avoided) as that
function is already a bit complex.

My guess is also that we will need to align FUSE_SETXATTR, although I
don't have an immediate use case for it.


Thanks for your review!


Bernd






[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux