Re: fanotify_merge improvements

Amir Goldstein <amir73il@xxxxxxxxx> · Thu, 28 Jan 2021 20:50:27 +0200

On Thu, Jan 28, 2021 at 12:27 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Wed 27-01-21 20:03:23, Amir Goldstein wrote:
> > On Wed, Jan 27, 2021 at 5:15 PM Jan Kara <jack@xxxxxxx> wrote:
> > >
> > > On Wed 27-01-21 14:57:56, Amir Goldstein wrote:
> > > > On Wed, Jan 27, 2021 at 1:24 PM Jan Kara <jack@xxxxxxx> wrote:
> > > > > > - With multi queue, high bit of obejctid will be masked for merge compare.
> > > > > > - Instead, they will be used to store the next_qid to read from
> > > > > >
> > > > > > For example:
> > > > > > - event #1 is added to queue 6
> > > > > > - set group->last_qid = 6
> > > > > > - set group->next_qid = 6 (because group->num_events == 1)
> > > > > > - event #2 is added to queue 13
> > > > > > - the next_qid bits of the last event in last_qid (6) queue are set to 13
> > > > > > - set group->last_qid = 13
> > > > > >
> > > > > > - read() checks value of group->next_qid and reads the first event
> > > > > > from queue 6 (event #1)
> > > > > > - event #1 has 13 stored in next_qid bits so set group->next_qid = 13
> > > > > > - read() reads first event from queue 13 (event #2)
> > > > >
> > > > > That's an interesting idea. I like it and I think it would work. Just
> > > > > instead of masking, I'd use bitfields. Or we could just restrict objectid
> > > > > to 32-bits and use remaining 32-bits for the next_qid pointer. I know it
> > > > > will waste some bits but 32-bits of objectid should provide us with enough
> > > > > space to avoid doing full event comparison in most cases
> > > >
> > > > Certainly.
> > > > The entire set of objects to compare is going to be limited to 128*128,
> > > > so 32bit should be plenty of hash bits.
> > > > Simplicity is preferred.
> > > >
> > > > >  - BTW WRT naming I
> > > > > find 'qid' somewhat confusing. Can we call it say 'next_bucket' or
> > > > > something like that?
> > > > >
> > > >
> > > > Sure. If its going to be 32bit, I can just call it next_key for simplicity
> > > > and store the next event key instead of the next event bucket.
> > > >
> > > > > > Permission events require special care, but that is the idea of a simple
> > > > > > singly linked list using qid's for reading events by insert order and
> > > > > > merging by hashed queue.
> > > > >
> > > > > Why are permission events special in this regard?
> > > > >
> > > >
> > > > They are not removed from the head of the queue, so
> > > > middle event next_key may need to be updated when they
> > > > are removed.
> > >
> > > Oh, you mean the special case when we receive a signal and thus remove
> > > permission event from a notification queue? I forgot about that one and
> > > yes, it needs a special handling...
> > >
> > > > I guess since permission events are not merged, they could
> > > > use their own queue. If we do not care about ordering of
> > > > permission events and non-permission events, we can treat this
> > > > as a priority queue and it will simplify things considerably.
> > > > Boosting priority of blocking hooks seems like the right thing to do.
> > > > I wonder if we could make that change?
> > >
> > > Yes, permission events are not merged and I'm not aware of any users
> > > actually mixing permission and other events in a notification group. OTOH
> > > I'm somewhat reluctant to reorder events that much. It could break
> > > someone, it could starve notification events, etc. AFAIU the pain with
> > > permission events is updating the ->next_key field in case we want to remove
> > > unreported permission event. Finding previous entry with this scheme is
> > > indeed somewhat painful (we'd have to walk the queue which requires
> > > maintaining 'cur' pointer for every queue). So maybe growing fsnotify_event
> > > by one pointer to contain single linked list for a hash chain would be
> > > simplest in the end? Then removing from the hash chain in the corner case of
> > > tearing permission event out is simple enough...
> > >
> >
> > Better to disable the multi queue for the very uninteresting corner case (mixing
> > permissions and non permissions) . The simplest thing to do is to enable multi
> > queue only for FAN_CLASS_NOTIF. I suppose users do not use high priority
> > classes for non-permission event use case and if they do, they will get less
> > merged events - no big deal.
> >
> > The important things we get are:
> > 1. Code remains simple
> > 2. Deterministic CPU usage (linear search is capped to 128 events)
> > 3. In the most common use case of async change listener we can merge
> >     events on up to 16K unique objects which should be sufficient
> >
> > I'll try to write this up.
>
> OK, sounds fine to me. We can always reconsider if some users come back
> complaining...
>

Pushed that to fanotify_merge branch.
It's even working ;-)

Please let me know if this looks fine so far and I will complete the rest,
test performance and post the patches.

Remaining:
- Let event->key be a hash of all merge compared fields
- Limit linear search per chain

Thanks,
Amir.