Re: [PATCH] fuse: enable larger read buffers for readdir [v2].

On Thu, 27 Jul 2023 at 10:13, Jaco Kroon <jaco@xxxxxxxxx> wrote:
>
> This patch does not mess with the caching infrastructure like the
> previous one did, which we believe caused excessive CPU usage and
> broke directory listings in some cases.
>
> This version only affects the uncached read path, which during
> parsing copies one entry at a time into the cached structures, and as
> such we believe this should be sufficient.
>
> We're still seeing cases where getdents64 takes ~10s (this was
> already the case without this patch; the difference is that we now
> get ~500 entries in that time rather than the 14-18 we got
> previously).  We believe that latency is introduced on the glusterfs
> side and it is under separate discussion with the glusterfs
> developers.
>
> This is still a compile-time option, but a working one compared to
> the previous patch.  For now this works, but it's not recommended for
> merge (as per the email discussion).
>
> This still uses alloc_pages rather than kvmalloc/kvfree.
>
> Signed-off-by: Jaco Kroon <jaco@xxxxxxxxx>
> ---
>  fs/fuse/Kconfig   | 16 ++++++++++++++++
>  fs/fuse/readdir.c | 18 ++++++++++++------
>  2 files changed, 28 insertions(+), 6 deletions(-)
>
> diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
> index 038ed0b9aaa5..0783f9ee5cd3 100644
> --- a/fs/fuse/Kconfig
> +++ b/fs/fuse/Kconfig
> @@ -18,6 +18,22 @@ config FUSE_FS
>           If you want to develop a userspace FS, or if you want to use
>           a filesystem based on FUSE, answer Y or M.
>
> +config FUSE_READDIR_ORDER
> +       int
> +       range 0 5
> +       default 5
> +       help
> +               readdir performance varies greatly depending on the size of the read.
> +               Larger buffers result in larger reads, thus fewer reads and higher
> +               performance in return.
> +
> +               You may want to reduce this value on seriously memory-constrained
> +               systems where a 128KiB buffer (assuming 4KiB pages) is not ideal.
> +
> +               This value represents the order of the number of pages to allocate
> +               (i.e. the shift value).  A value of 0 is thus 1 page (4KiB), whereas
> +               5 is 32 pages (128KiB).
> +
>  config CUSE
>         tristate "Character device in Userspace support"
>         depends on FUSE_FS
> diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
> index dc603479b30e..47cea4d91228 100644
> --- a/fs/fuse/readdir.c
> +++ b/fs/fuse/readdir.c
> @@ -13,6 +13,12 @@
>  #include <linux/pagemap.h>
>  #include <linux/highmem.h>
>
> +#define READDIR_PAGES_ORDER            CONFIG_FUSE_READDIR_ORDER
> +#define READDIR_PAGES                  (1 << READDIR_PAGES_ORDER)
> +#define READDIR_PAGES_SIZE             (PAGE_SIZE << READDIR_PAGES_ORDER)
> +#define READDIR_PAGES_MASK             (READDIR_PAGES_SIZE - 1)
> +#define READDIR_PAGES_SHIFT            (PAGE_SHIFT + READDIR_PAGES_ORDER)
> +
>  static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx)
>  {
>         struct fuse_conn *fc = get_fuse_conn(dir);
> @@ -328,25 +334,25 @@ static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx)
>         struct fuse_mount *fm = get_fuse_mount(inode);
>         struct fuse_io_args ia = {};
>         struct fuse_args_pages *ap = &ia.ap;
> -       struct fuse_page_desc desc = { .length = PAGE_SIZE };
> +       struct fuse_page_desc desc = { .length = READDIR_PAGES_SIZE };

Does this really work?  I would've thought we rely on single-page
lengths somewhere.

>         u64 attr_version = 0;
>         bool locked;
>
> -       page = alloc_page(GFP_KERNEL);
> +       page = alloc_pages(GFP_KERNEL, READDIR_PAGES_ORDER);
>         if (!page)
>                 return -ENOMEM;
>
>         plus = fuse_use_readdirplus(inode, ctx);
>         ap->args.out_pages = true;
> -       ap->num_pages = 1;
> +       ap->num_pages = READDIR_PAGES;

No.  This is the array length, which is 1.  This is the hack, I guess,
which makes the above trick work.
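
For illustration, a sketch (untested, and assuming the rest of the
request path coped with a multi-page descriptor, which is exactly what
is questioned above) of what a single high-order page would look like:
one array entry, with the full length in the descriptor.

	/* one array entry pointing at an order-N page; num_pages is
	 * the length of the pages array, not a count of 4KiB pages */
	ap->pages = &page;
	ap->descs = &desc;	/* desc.length = READDIR_PAGES_SIZE */
	ap->num_pages = 1;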

Better to use kvmalloc, which might have slightly worse performance
than a large page, but definitely not worse than the current single
page.
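
A minimal sketch of the kvmalloc variant (untested; it assumes the
reply could be copied into a plain kernel buffer rather than the page
array the readdir path uses today):

	void *buf;

	buf = kvmalloc(READDIR_PAGES_SIZE, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;

	/* ... issue FUSE_READDIR(PLUS) into buf, then hand buf to
	 * parse_dirfile()/parse_dirplusfile() as is done today ... */

	kvfree(buf);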

If we want to optimize away the overhead of kvmalloc (and it's a big
if) then the parse_dir*file() functions would need to be converted to
use a page array instead of a plain kernel pointer, which would add
some complexity for sure.
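
Roughly what that conversion might look like (names and signature are
illustrative, not the actual fs/fuse/readdir.c interfaces; records
straddling a page boundary are the complexity mentioned above):

	static int parse_dirfile_pages(struct page **pages,
				       unsigned int npages, size_t nbytes,
				       struct dir_context *ctx)
	{
		unsigned int i;

		for (i = 0; i < npages && nbytes; i++) {
			size_t len = min_t(size_t, nbytes, PAGE_SIZE);
			void *kaddr = kmap_local_page(pages[i]);

			/* ... walk the struct fuse_dirent records in
			 * [kaddr, kaddr + len), stitching together any
			 * record that crosses into pages[i + 1] ... */

			kunmap_local(kaddr);
			nbytes -= len;
		}
		return 0;
	}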

Thanks,
Miklos


