On Mon, Mar 1, 2021 at 3:08 PM Jan Kara <jack@xxxxxxx> wrote: > > On Sat 27-02-21 10:31:52, Amir Goldstein wrote: > > On Tue, Feb 2, 2021 at 6:20 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > Event merges are expensive when event queue size is large. > > > Limit the linear search to 128 merge tests. > > > In combination with 128 hash lists, there is a potential to > > > merge with up to 16K events in the hashed queue. > > > > > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> > > > --- > > > fs/notify/fanotify/fanotify.c | 6 ++++++ > > > 1 file changed, 6 insertions(+) > > > > > > diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c > > > index 12df6957e4d8..6d3807012851 100644 > > > --- a/fs/notify/fanotify/fanotify.c > > > +++ b/fs/notify/fanotify/fanotify.c > > > @@ -129,11 +129,15 @@ static bool fanotify_should_merge(struct fsnotify_event *old_fsn, > > > return false; > > > } > > > > > > +/* Limit event merges to limit CPU overhead per event */ > > > +#define FANOTIFY_MAX_MERGE_EVENTS 128 > > > + > > > /* and the list better be locked by something too! */ > > > static int fanotify_merge(struct list_head *list, struct fsnotify_event *event) > > > { > > > struct fsnotify_event *test_event; > > > struct fanotify_event *new; > > > + int i = 0; > > > > > > pr_debug("%s: list=%p event=%p\n", __func__, list, event); > > > new = FANOTIFY_E(event); > > > @@ -147,6 +151,8 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event) > > > return 0; > > > > > > list_for_each_entry_reverse(test_event, list, list) { > > > + if (++i > FANOTIFY_MAX_MERGE_EVENTS) > > > + break; > > > if (fanotify_should_merge(test_event, event)) { > > > FANOTIFY_E(test_event)->mask |= new->mask; > > > return 1; > > > -- > > > 2.25.1 > > > > > > > Jan, > > > > I was thinking that this patch or a variant thereof should be applied to stable > > kernels, but not the entire series. > > > > OTOH, I am concerned about regressing existing workloads that depend on > > merging events on more than 128 inodes. > > Honestly, I don't think pushing anything to stable for this is really worth > it. > > 1) fanotify() is limited to CAP_SYS_ADMIN (in init namespace) so this is > hardly a security issue. > > 2) We have cond_resched() in the merge code now so the kernel doesn't > lockup anymore. So this is only about fanotify becoming slow if you have > lots of events. > > 3) I haven't heard any complaints since we've added the cond_resched() > patch so the performance issue seems to be really rare. > > If I get complaits from real users about this, we can easily reconsider, it > is not a big deal. But I just don't think preemptive action is warranted... > Hi Jan, I know you have some catching up to do, but applying this patch to stable has become a priority for me. It was a mistake on my part not to push harder 6 months ago, so I am trying to rectify this mistake now as soon as possible. To answer your arguments against preemptive action: 1) My application has CAP_SYS_ADMIN, it is not malicious, but it cannot do its job without taking up more CPU that it needs to, because bursts of events will cause the event queue to grow to thousands of events and fanotify_merge() will become a high CPU consumer 2) It's not only about fanotify becoming slow, it's about fanotify making the entire system slow and as a result it takes a long time for the system to recover from this condition 3) You haven't heard any complains because nobody was using sb mark We have been using sb mark in production for a few years and carry this patch in our kernel, so I can say for certain that sb mark on a fs with heavy workload is disturbing the entire system without this patch. I don't think that "regressing" the number of merged event is a big issue, as we never guaranteed any specific merge behavior and the behavior was hard enough to predict, so I don't think applications could have relied on it. So, are you ok with me sending this patch to stable as is? Thanks, Amir.