On Thu, Sep 5, 2024 at 2:16 PM Bernd Schubert <bernd.schubert@xxxxxxxxxxx> wrote: > > Hi Joanne, > > On 9/5/24 19:45, Joanne Koong wrote: > > Introduce the capability to dynamically configure the fuse max pages > > limit (formerly #defined as FUSE_MAX_MAX_PAGES) through a sysctl. > > This enhancement allows system administrators to adjust the value > > based on system-specific requirements. > > > > This removes the previous static limit of 256 max pages, which limits > > the max write size of a request to 1 MiB (on 4096 pagesize systems). > > Having the ability to up the max write size beyond 1 MiB allows for the > > perf improvements detailed in this thread [1]. > > the change itself looks good to me, but have you seen this discussion here? > > https://lore.kernel.org/lkml/CAJfpegs10SdtzNXJfj3=vxoAZMhksT5A1u5W5L6nKL-P2UOuLQ@xxxxxxxxxxxxxx/T/ > > > Miklos is basically worried about page pinning and accounting for that > for unprivileged user processes. > Thanks for the link to the previous discussion, Bernd. I'll cc Xu and Jingbo, who started that thread, to this email. I'm curious whether this sysctl approach might mitigate those worries here since modifying the sysctl value will require admin privileges to explicitly opt into this. I liked Sweet Tea's comment "Perhaps, in analogy to soft and hard limits on pipe size, FUSE_MAX_MAX_PAGES could be increased and treated as the maximum possible hard limit for max_write; and the default hard limit could stay at 1M, thereby allowing folks to opt into the new behavior if they care about the performance more than the memory?" where something like this could let admins choose what aspects they'd like to optimize for. My understanding of how bigger write buffers affects pinning is that each chunk of the write will pin a higher number of contiguous pages at one time (but overall for the duration of the write request, the number for total pages that get pinned are the same). My understanding is that the pages only get pinned when we do the copying to the fuse server's buffer (and is not pinned while the server is servicing the request). Thanks, Joanne > > Thanks, > Bernd > > > > > > $ sysctl -a | grep max_pages_limit > > fs.fuse.max_pages_limit = 256 > > > > $ sysctl -n fs.fuse.max_pages_limit > > 256 > > > > $ echo 1024 | sudo tee /proc/sys/fs/fuse/max_pages_limit > > 1024 > > > > $ sysctl -n fs.fuse.max_pages_limit > > 1024 > > > > $ echo 65536 | sudo tee /proc/sys/fs/fuse/max_pages_limit > > tee: /proc/sys/fs/fuse/max_pages_limit: Invalid argument > > > > $ echo 0 | sudo tee /proc/sys/fs/fuse/max_pages_limit > > tee: /proc/sys/fs/fuse/max_pages_limit: Invalid argument > > > > $ echo 65535 | sudo tee /proc/sys/fs/fuse/max_pages_limit > > 65535 > > > > $ sysctl -n fs.fuse.max_pages_limit > > 65535 > > > > v2 (original): > > https://lore.kernel.org/linux-fsdevel/20240702014627.4068146-1-joannelkoong@xxxxxxxxx/ > > > > v1: > > https://lore.kernel.org/linux-fsdevel/20240628001355.243805-1-joannelkoong@xxxxxxxxx/ > > > > Changes from v1: > > - Rename fuse_max_max_pages to fuse_max_pages_limit internally > > - Rename /proc/sys/fs/fuse/fuse_max_max_pages to > > /proc/sys/fs/fuse/max_pages_limit > > - Restrict fuse max_pages_limit sysctl values to between 1 and 65535 > > (inclusive) > > > > [1] https://lore.kernel.org/linux-fsdevel/20240124070512.52207-1-jefflexu@xxxxxxxxxxxxxxxxx/T/#u > > > > Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx> > > Reviewed-by: Josef Bacik <josef@xxxxxxxxxxxxxx> > > --- > > Documentation/admin-guide/sysctl/fs.rst | 10 +++++++ > > fs/fuse/Makefile | 2 +- > > fs/fuse/fuse_i.h | 14 +++++++-- > > fs/fuse/inode.c | 11 ++++++- > > fs/fuse/ioctl.c | 4 ++- > > fs/fuse/sysctl.c | 40 +++++++++++++++++++++++++ > > 6 files changed, 75 insertions(+), 6 deletions(-) > > create mode 100644 fs/fuse/sysctl.c > > > > diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst > > index 47499a1742bd..fa25d7e718b3 100644 > > --- a/Documentation/admin-guide/sysctl/fs.rst > > +++ b/Documentation/admin-guide/sysctl/fs.rst > > @@ -332,3 +332,13 @@ Each "watch" costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes > > on a 64-bit one. > > The current default value for ``max_user_watches`` is 4% of the > > available low memory, divided by the "watch" cost in bytes. > > + > > +5. /proc/sys/fs/fuse - Configuration options for FUSE filesystems > > +===================================================================== > > + > > +This directory contains the following configuration options for FUSE > > +filesystems: > > + > > +``/proc/sys/fs/fuse/max_pages_limit`` is a read/write file for > > +setting/getting the maximum number of pages that can be used for servicing > > +requests in FUSE. > > diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile > > index 6e0228c6d0cb..cd4ef3e08ebf 100644 > > --- a/fs/fuse/Makefile > > +++ b/fs/fuse/Makefile > > @@ -7,7 +7,7 @@ obj-$(CONFIG_FUSE_FS) += fuse.o > > obj-$(CONFIG_CUSE) += cuse.o > > obj-$(CONFIG_VIRTIO_FS) += virtiofs.o > > > > -fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o > > +fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o sysctl.o > > fuse-y += iomode.o > > fuse-$(CONFIG_FUSE_DAX) += dax.o > > fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o > > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h > > index f23919610313..bb252a3ea37b 100644 > > --- a/fs/fuse/fuse_i.h > > +++ b/fs/fuse/fuse_i.h > > @@ -35,9 +35,6 @@ > > /** Default max number of pages that can be used in a single read request */ > > #define FUSE_DEFAULT_MAX_PAGES_PER_REQ 32 > > > > -/** Maximum of max_pages received in init_out */ > > -#define FUSE_MAX_MAX_PAGES 256 > > - > > /** Bias for fi->writectr, meaning new writepages must not be sent */ > > #define FUSE_NOWRITE INT_MIN > > > > @@ -47,6 +44,9 @@ > > /** Number of dentries for each connection in the control filesystem */ > > #define FUSE_CTL_NUM_DENTRIES 5 > > > > +/** Maximum of max_pages received in init_out */ > > +extern unsigned int fuse_max_pages_limit; > > + > > /** List of active connections */ > > extern struct list_head fuse_conn_list; > > > > @@ -1472,4 +1472,12 @@ ssize_t fuse_passthrough_splice_write(struct pipe_inode_info *pipe, > > size_t len, unsigned int flags); > > ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma); > > > > +#ifdef CONFIG_SYSCTL > > +extern int fuse_sysctl_register(void); > > +extern void fuse_sysctl_unregister(void); > > +#else > > +#define fuse_sysctl_register() (0) > > +#define fuse_sysctl_unregister() do { } while (0) > > +#endif /* CONFIG_SYSCTL */ > > + > > #endif /* _FS_FUSE_I_H */ > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > > index 99e44ea7d875..973e58df816a 100644 > > --- a/fs/fuse/inode.c > > +++ b/fs/fuse/inode.c > > @@ -35,6 +35,8 @@ DEFINE_MUTEX(fuse_mutex); > > > > static int set_global_limit(const char *val, const struct kernel_param *kp); > > > > +unsigned int fuse_max_pages_limit = 256; > > + > > unsigned max_user_bgreq; > > module_param_call(max_user_bgreq, set_global_limit, param_get_uint, > > &max_user_bgreq, 0644); > > @@ -932,7 +934,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm, > > fc->pid_ns = get_pid_ns(task_active_pid_ns(current)); > > fc->user_ns = get_user_ns(user_ns); > > fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ; > > - fc->max_pages_limit = FUSE_MAX_MAX_PAGES; > > + fc->max_pages_limit = fuse_max_pages_limit; > > > > if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) > > fuse_backing_files_init(fc); > > @@ -2039,8 +2041,14 @@ static int __init fuse_fs_init(void) > > if (err) > > goto out3; > > > > + err = fuse_sysctl_register(); > > + if (err) > > + goto out4; > > + > > return 0; > > > > + out4: > > + unregister_filesystem(&fuse_fs_type); > > out3: > > unregister_fuseblk(); > > out2: > > @@ -2053,6 +2061,7 @@ static void fuse_fs_cleanup(void) > > { > > unregister_filesystem(&fuse_fs_type); > > unregister_fuseblk(); > > + fuse_sysctl_unregister(); > > > > /* > > * Make sure all delayed rcu free inodes are flushed before we > > diff --git a/fs/fuse/ioctl.c b/fs/fuse/ioctl.c > > index 572ce8a82ceb..a6c8ee551635 100644 > > --- a/fs/fuse/ioctl.c > > +++ b/fs/fuse/ioctl.c > > @@ -10,6 +10,8 @@ > > #include <linux/fileattr.h> > > #include <linux/fsverity.h> > > > > +#define FUSE_VERITY_ENABLE_ARG_MAX_PAGES 256 > > + > > static ssize_t fuse_send_ioctl(struct fuse_mount *fm, struct fuse_args *args, > > struct fuse_ioctl_out *outarg) > > { > > @@ -140,7 +142,7 @@ static int fuse_setup_enable_verity(unsigned long arg, struct iovec *iov, > > { > > struct fsverity_enable_arg enable; > > struct fsverity_enable_arg __user *uarg = (void __user *)arg; > > - const __u32 max_buffer_len = FUSE_MAX_MAX_PAGES * PAGE_SIZE; > > + const __u32 max_buffer_len = FUSE_VERITY_ENABLE_ARG_MAX_PAGES * PAGE_SIZE; > > > > if (copy_from_user(&enable, uarg, sizeof(enable))) > > return -EFAULT; > > diff --git a/fs/fuse/sysctl.c b/fs/fuse/sysctl.c > > new file mode 100644 > > index 000000000000..b272bb333005 > > --- /dev/null > > +++ b/fs/fuse/sysctl.c > > @@ -0,0 +1,40 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > +/* > > + * linux/fs/fuse/fuse_sysctl.c > > + * > > + * Sysctl interface to fuse parameters > > + */ > > +#include <linux/sysctl.h> > > + > > +#include "fuse_i.h" > > + > > +static struct ctl_table_header *fuse_table_header; > > + > > +/* Bound by fuse_init_out max_pages, which is a u16 */ > > +static unsigned int sysctl_fuse_max_pages_limit = 65535; > > + > > +static struct ctl_table fuse_sysctl_table[] = { > > + { > > + .procname = "max_pages_limit", > > + .data = &fuse_max_pages_limit, > > + .maxlen = sizeof(fuse_max_pages_limit), > > + .mode = 0644, > > + .proc_handler = proc_douintvec_minmax, > > + .extra1 = SYSCTL_ONE, > > + .extra2 = &sysctl_fuse_max_pages_limit, > > + }, > > +}; > > + > > +int fuse_sysctl_register(void) > > +{ > > + fuse_table_header = register_sysctl("fs/fuse", fuse_sysctl_table); > > + if (!fuse_table_header) > > + return -ENOMEM; > > + return 0; > > +} > > + > > +void fuse_sysctl_unregister(void) > > +{ > > + unregister_sysctl_table(fuse_table_header); > > + fuse_table_header = NULL; > > +}