On Thu, 27 Jul 2023 at 10:13, Jaco Kroon <jaco@xxxxxxxxx> wrote: > > This patch does not mess with the caching infrastructure like the > previous one, which we believe caused excessive CPU and broke directory > listings in some cases. > > This version only affects the uncached read, which then during parse adds an > entry at a time to the cached structures by way of copying, and as such, > we believe this should be sufficient. > > We're still seeing cases where getdents64 takes ~10s (this was the case > in any case without this patch, the difference now that we get ~500 > entries for that time rather than the 14-18 previously). We believe > that that latency is introduced on glusterfs side and is under separate > discussion with the glusterfs developers. > > This is still a compile-time option, but a working one compared to > previous patch. For now this works, but it's not recommended for merge > (as per email discussion). > > This still uses alloc_pages rather than kvmalloc/kvfree. > > Signed-off-by: Jaco Kroon <jaco@xxxxxxxxx> > --- > fs/fuse/Kconfig | 16 ++++++++++++++++ > fs/fuse/readdir.c | 18 ++++++++++++------ > 2 files changed, 28 insertions(+), 6 deletions(-) > > diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig > index 038ed0b9aaa5..0783f9ee5cd3 100644 > --- a/fs/fuse/Kconfig > +++ b/fs/fuse/Kconfig > @@ -18,6 +18,22 @@ config FUSE_FS > If you want to develop a userspace FS, or if you want to use > a filesystem based on FUSE, answer Y or M. > > +config FUSE_READDIR_ORDER > + int > + range 0 5 > + default 5 > + help > + readdir performance varies greatly depending on the size of the read. > + Larger buffers results in larger reads, thus fewer reads and higher > + performance in return. > + > + You may want to reduce this value on seriously constrained memory > + systems where 128KiB (assuming 4KiB pages) cache pages is not ideal. > + > + This value reprents the order of the number of pages to allocate (ie, > + the shift value). A value of 0 is thus 1 page (4KiB) where 5 is 32 > + pages (128KiB). > + > config CUSE > tristate "Character device in Userspace support" > depends on FUSE_FS > diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c > index dc603479b30e..47cea4d91228 100644 > --- a/fs/fuse/readdir.c > +++ b/fs/fuse/readdir.c > @@ -13,6 +13,12 @@ > #include <linux/pagemap.h> > #include <linux/highmem.h> > > +#define READDIR_PAGES_ORDER CONFIG_FUSE_READDIR_ORDER > +#define READDIR_PAGES (1 << READDIR_PAGES_ORDER) > +#define READDIR_PAGES_SIZE (PAGE_SIZE << READDIR_PAGES_ORDER) > +#define READDIR_PAGES_MASK (READDIR_PAGES_SIZE - 1) > +#define READDIR_PAGES_SHIFT (PAGE_SHIFT + READDIR_PAGES_ORDER) > + > static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx) > { > struct fuse_conn *fc = get_fuse_conn(dir); > @@ -328,25 +334,25 @@ static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx) > struct fuse_mount *fm = get_fuse_mount(inode); > struct fuse_io_args ia = {}; > struct fuse_args_pages *ap = &ia.ap; > - struct fuse_page_desc desc = { .length = PAGE_SIZE }; > + struct fuse_page_desc desc = { .length = READDIR_PAGES_SIZE }; Does this really work? I would've thought we are relying on single page lengths somewhere. > u64 attr_version = 0; > bool locked; > > - page = alloc_page(GFP_KERNEL); > + page = alloc_pages(GFP_KERNEL, READDIR_PAGES_ORDER); > if (!page) > return -ENOMEM; > > plus = fuse_use_readdirplus(inode, ctx); > ap->args.out_pages = true; > - ap->num_pages = 1; > + ap->num_pages = READDIR_PAGES; No. This is the array lenght, which is 1. This is the hack I guess, which makes the above trick work. Better use kvmalloc, which might have a slightly worse performance than a large page, but definitely not worse than the current single page. If we want to optimize the overhead of kvmalloc (and it's a big if) then the parse_dir*file() functions would need to be converted to using a page array instead of a plain kernel pointer, which would add some complexity for sure. Thanks, Miklos