On Sat, Dec 19, 2020 at 1:48 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> On Sat, Dec 19, 2020 at 12:11 AM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
> >
> > Currently the fs sysctl inotify/max_user_instances is used to limit the
> > number of inotify instances on the system. For systems running multiple
> > workloads, the per-user namespace sysctl max_inotify_instances can be
> > used to further partition inotify instances. However there is no easy
> > way to set a sensible system level max limit on inotify instances and
> > further partition it between the workloads. It is much easier to charge
> > the underlying resource (i.e. memory) behind the inotify instances to
> > the memcg of the workload and let their memory limits limit the number
> > of inotify instances they can create.
>
> Not that I have a problem with this patch, but what problem does it try to
> solve?

I am aiming for simplicity: I would rather not tune yet another limit
when the underlying resource is already bounded indirectly by the memcg
limits. Our production environment runs multiple workloads on a system,
and I just want to set the memcg limits there without also having to
pick a sensible value for max_user_instances. I would prefer to set
max_user_instances to INT_MAX and let the memcg limits of the workloads
bound their inotify usage.

> Are you concerned about users depleting system memory by creating
> userns's and allocating 128 * (struct fsnotify_group) at a time?
>
> IMO, that is not what max_user_instances was meant to protect against.
> There are two reasons I can think of to limit user instances:
> 1. Pre-memcg, user allocation of events is limited to
>    <max_user_instances>*<max_queued_events>
> 2. Performance penalty. User can place <max_user_instances>
>    watches on the same "hot" directory, that will cause any access to
>    that directory by any task on the system to pay the penalty of traversing
>    <max_user_instances> marks and attempt to queue <max_user_instances>
>    events. That cost, including <max_user_instances> inotify_merge() loops,
>    could be significant.
>
> #1 is not a problem anymore, since you already took care of accounting events
> to the user's memcg.
> #2 is not addressed by your patch.

Yes, I am not addressing #2. Our workloads in production have their own
private filesystems, so this is not an issue we have observed.

>
> >
> > Signed-off-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
> > ---
> >  fs/notify/group.c                | 14 ++++++++++++--
> >  fs/notify/inotify/inotify_user.c |  5 +++--
> >  include/linux/fsnotify_backend.h |  2 ++
> >  3 files changed, 17 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/notify/group.c b/fs/notify/group.c
> > index a4a4b1c64d32..fab3cfdb4d9e 100644
> > --- a/fs/notify/group.c
> > +++ b/fs/notify/group.c
> > @@ -114,11 +114,12 @@ EXPORT_SYMBOL_GPL(fsnotify_put_group);
> >  /*
> >   * Create a new fsnotify_group and hold a reference for the group returned.
> >   */
> > -struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops)
> > +struct fsnotify_group *fsnotify_alloc_group_gfp(const struct fsnotify_ops *ops,
> > +                                                gfp_t gfp)
> >  {
> >         struct fsnotify_group *group;
> >
> > -       group = kzalloc(sizeof(struct fsnotify_group), GFP_KERNEL);
> > +       group = kzalloc(sizeof(struct fsnotify_group), gfp);
> >         if (!group)
> >                 return ERR_PTR(-ENOMEM);
> >
> > @@ -139,6 +140,15 @@ struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops)
> >
> >         return group;
> >  }
> > +EXPORT_SYMBOL_GPL(fsnotify_alloc_group_gfp);
> > +
> > +/*
> > + * Create a new fsnotify_group and hold a reference for the group returned.
> > + */
> > +struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops)
> > +{
> > +       return fsnotify_alloc_group_gfp(ops, GFP_KERNEL);
> > +}
> >  EXPORT_SYMBOL_GPL(fsnotify_alloc_group);
> >
> >  int fsnotify_fasync(int fd, struct file *file, int on)
> > diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
> > index 59c177011a0f..7cb528c6154c 100644
> > --- a/fs/notify/inotify/inotify_user.c
> > +++ b/fs/notify/inotify/inotify_user.c
> > @@ -632,11 +632,12 @@ static struct fsnotify_group *inotify_new_group(unsigned int max_events)
> >         struct fsnotify_group *group;
> >         struct inotify_event_info *oevent;
> >
> > -       group = fsnotify_alloc_group(&inotify_fsnotify_ops);
> > +       group = fsnotify_alloc_group_gfp(&inotify_fsnotify_ops,
> > +                                        GFP_KERNEL_ACCOUNT);
> >         if (IS_ERR(group))
> >                 return group;
> >
> > -       oevent = kmalloc(sizeof(struct inotify_event_info), GFP_KERNEL);
> > +       oevent = kmalloc(sizeof(struct inotify_event_info), GFP_KERNEL_ACCOUNT);
> >         if (unlikely(!oevent)) {
> >                 fsnotify_destroy_group(group);
> >                 return ERR_PTR(-ENOMEM);
>
> Any reason why you did not include fanotify in this patch?

The motivation was inotify's max_user_instances, but we can charge the
fsnotify_group allocation for fanotify as well. I would prefer to do
that in a separate patch, though. Let me know which you prefer;
something like the untested sketch at the end of this mail is what I
had in mind.

>
> Thanks,
> Amir.

Thanks for the review. I really appreciate your time.

Shakeel
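
P.S. Only as an illustration of what I mean by a separate fanotify
patch, here is a rough, untested sketch. It assumes fanotify_init() in
fs/notify/fanotify/fanotify_user.c still allocates its group with
fsnotify_alloc_group(&fanotify_fsnotify_ops); the GFP_KERNEL_ACCOUNT
flag simply mirrors the inotify hunk above, so please treat it as a
sketch rather than a tested change:

--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ fanotify_init() @@ (hunk header elided; untested sketch)
-        group = fsnotify_alloc_group(&fanotify_fsnotify_ops);
+        /* Charge the fanotify group allocation to the caller's memcg. */
+        group = fsnotify_alloc_group_gfp(&fanotify_fsnotify_ops,
+                                         GFP_KERNEL_ACCOUNT);

The fanotify event allocations are left alone in this sketch; the
question above was specifically about the fsnotify_group allocation.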