On Wed 15-09-21 15:39:13, Amir Goldstein wrote:
> On Mon, Mar 1, 2021 at 3:08 PM Jan Kara <jack@xxxxxxx> wrote:
> >
> > On Sat 27-02-21 10:31:52, Amir Goldstein wrote:
> > > On Tue, Feb 2, 2021 at 6:20 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > >
> > > > Event merges are expensive when the event queue size is large.
> > > > Limit the linear search to 128 merge tests.
> > > > In combination with 128 hash lists, there is a potential to
> > > > merge with up to 16K events in the hashed queue.
> > > >
> > > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> > > > ---
> > > >  fs/notify/fanotify/fanotify.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> > > > index 12df6957e4d8..6d3807012851 100644
> > > > --- a/fs/notify/fanotify/fanotify.c
> > > > +++ b/fs/notify/fanotify/fanotify.c
> > > > @@ -129,11 +129,15 @@ static bool fanotify_should_merge(struct fsnotify_event *old_fsn,
> > > >  	return false;
> > > >  }
> > > >
> > > > +/* Limit event merges to limit CPU overhead per event */
> > > > +#define FANOTIFY_MAX_MERGE_EVENTS 128
> > > > +
> > > >  /* and the list better be locked by something too! */
> > > >  static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
> > > >  {
> > > >  	struct fsnotify_event *test_event;
> > > >  	struct fanotify_event *new;
> > > > +	int i = 0;
> > > >
> > > >  	pr_debug("%s: list=%p event=%p\n", __func__, list, event);
> > > >  	new = FANOTIFY_E(event);
> > > > @@ -147,6 +151,8 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
> > > >  		return 0;
> > > >
> > > >  	list_for_each_entry_reverse(test_event, list, list) {
> > > > +		if (++i > FANOTIFY_MAX_MERGE_EVENTS)
> > > > +			break;
> > > >  		if (fanotify_should_merge(test_event, event)) {
> > > >  			FANOTIFY_E(test_event)->mask |= new->mask;
> > > >  			return 1;
> > > > --
> > > > 2.25.1
> > > >
> > >
> > > Jan,
> > >
> > > I was thinking that this patch, or a variant thereof, should be applied
> > > to stable kernels, but not the entire series.
> > >
> > > OTOH, I am concerned about regressing existing workloads that depend on
> > > merging events on more than 128 inodes.
> >
> > Honestly, I don't think pushing anything to stable for this is really
> > worth it.
> >
> > 1) fanotify() is limited to CAP_SYS_ADMIN (in the init namespace), so
> > this is hardly a security issue.
> >
> > 2) We have cond_resched() in the merge code now, so the kernel doesn't
> > lock up anymore. So this is only about fanotify becoming slow if you have
> > lots of events.
> >
> > 3) I haven't heard any complaints since we added the cond_resched()
> > patch, so the performance issue seems to be really rare.
> >
> > If I get complaints from real users about this, we can easily reconsider;
> > it is not a big deal. But I just don't think preemptive action is
> > warranted...
>
> Hi Jan,
>
> I know you have some catching up to do, but applying this patch to stable
> has become a priority for me.
> It was a mistake on my part not to push harder 6 months ago, so I am trying
> to rectify this mistake now as soon as possible.
>
> To answer your arguments against preemptive action:
>
> 1) My application has CAP_SYS_ADMIN and it is not malicious, but it cannot
> do its job without taking up more CPU than it needs to, because bursts of
> events will cause the event queue to grow to thousands of events and
> fanotify_merge() will become a high CPU consumer.
>
> 2) It's not only about fanotify becoming slow; it's about fanotify making
> the entire system slow, and as a result it takes a long time for the
> system to recover from this condition.
>
> 3) You haven't heard any complaints because nobody was using an sb mark.
> We have been using an sb mark in production for a few years and carry this
> patch in our kernel, so I can say for certain that an sb mark on a fs with
> a heavy workload disturbs the entire system without this patch.
>
> I don't think that "regressing" the number of merged events is a big
> issue, as we never guaranteed any specific merge behavior and the behavior
> was hard enough to predict, so I don't think applications could have
> relied on it.
>
> So, are you OK with me sending this patch to stable as is?

Sure, go ahead. I was not strongly against pushing this to stable, I just
didn't see a good reason to do it, but your arguments make sense - you count
as the user report I was waiting for ;).

								Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
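
For readers outside the thread: the full series queues events on a hash of
merge lists, and the patch above caps how far each list is scanned. Below is
a minimal user-space C sketch of that idea - not the fs/notify code; struct
event, bucket_of() and should_merge() are hypothetical stand-ins. Hashing
events into 128 buckets and testing at most 128 entries per bucket bounds the
per-event merge cost while still permitting merges across up to
128 * 128 = 16K queued events.

/*
 * Simplified model of a hashed merge queue with a capped scan.
 * All names are illustrative stand-ins, not kernel identifiers.
 */
#include <stdbool.h>
#include <stddef.h>

#define NUM_BUCKETS	128	/* analogue of the series' 128 hash lists */
#define MAX_MERGE	128	/* analogue of FANOTIFY_MAX_MERGE_EVENTS */

struct event {
	struct event *next;	/* per-bucket merge list linkage */
	unsigned long key;	/* identity used for hashing and merging */
	unsigned int mask;	/* event type bits, OR-ed together on merge */
};

static struct event *bucket[NUM_BUCKETS];

static unsigned int bucket_of(const struct event *e)
{
	return e->key % NUM_BUCKETS;
}

static bool should_merge(const struct event *old, const struct event *ev)
{
	return old->key == ev->key; /* stand-in for fanotify_should_merge() */
}

/* Try to merge @ev into its bucket; returns true if merged. */
static bool queue_merge(struct event *ev)
{
	struct event *e;
	int tested = 0;

	/*
	 * Only one bucket is scanned, and at most MAX_MERGE entries of it,
	 * so the per-event cost stays bounded even when the whole queue
	 * holds NUM_BUCKETS * MAX_MERGE events.
	 */
	for (e = bucket[bucket_of(ev)]; e; e = e->next) {
		if (++tested > MAX_MERGE)
			break;
		if (should_merge(e, ev)) {
			e->mask |= ev->mask;
			return true;
		}
	}
	return false;	/* caller links @ev into bucket[bucket_of(ev)] */
}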
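
The "sb mark" Amir refers to is a filesystem-wide fanotify mark
(FAN_MARK_FILESYSTEM, available since kernel 4.20), which subscribes one
group to events from every object on a filesystem - the setup that makes
large, bursty event queues likely. A minimal example of such a listener
follows; the watched path "/mnt" and the event mask are illustrative, and
error handling is trimmed for brevity.

/* Hypothetical minimal sb-mark listener; requires CAP_SYS_ADMIN. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/fanotify.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	ssize_t len;

	int fd = fanotify_init(FAN_CLASS_NOTIF | FAN_CLOEXEC, O_RDONLY);
	if (fd < 0) {
		perror("fanotify_init");
		return EXIT_FAILURE;
	}

	/*
	 * FAN_MARK_FILESYSTEM is the "sb mark": one mark covers every
	 * object on the filesystem containing /mnt.
	 */
	if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_FILESYSTEM,
			  FAN_MODIFY | FAN_CLOSE_WRITE, AT_FDCWD, "/mnt")) {
		perror("fanotify_mark");
		return EXIT_FAILURE;
	}

	/*
	 * If this loop falls behind a burst of events, the kernel-side
	 * queue grows and fanotify_merge() scans it for every new event.
	 */
	while ((len = read(fd, buf, sizeof(buf))) > 0) {
		struct fanotify_event_metadata *md = (void *)buf;

		for (; FAN_EVENT_OK(md, len); md = FAN_EVENT_NEXT(md, len)) {
			printf("mask=0x%llx\n", (unsigned long long)md->mask);
			if (md->fd >= 0)
				close(md->fd);
		}
	}
	return 0;
}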