On Thu, Jan 28, 2021 at 12:27 PM Jan Kara <jack@xxxxxxx> wrote: > > On Wed 27-01-21 20:03:23, Amir Goldstein wrote: > > On Wed, Jan 27, 2021 at 5:15 PM Jan Kara <jack@xxxxxxx> wrote: > > > > > > On Wed 27-01-21 14:57:56, Amir Goldstein wrote: > > > > On Wed, Jan 27, 2021 at 1:24 PM Jan Kara <jack@xxxxxxx> wrote: > > > > > > - With multi queue, high bit of obejctid will be masked for merge compare. > > > > > > - Instead, they will be used to store the next_qid to read from > > > > > > > > > > > > For example: > > > > > > - event #1 is added to queue 6 > > > > > > - set group->last_qid = 6 > > > > > > - set group->next_qid = 6 (because group->num_events == 1) > > > > > > - event #2 is added to queue 13 > > > > > > - the next_qid bits of the last event in last_qid (6) queue are set to 13 > > > > > > - set group->last_qid = 13 > > > > > > > > > > > > - read() checks value of group->next_qid and reads the first event > > > > > > from queue 6 (event #1) > > > > > > - event #1 has 13 stored in next_qid bits so set group->next_qid = 13 > > > > > > - read() reads first event from queue 13 (event #2) > > > > > > > > > > That's an interesting idea. I like it and I think it would work. Just > > > > > instead of masking, I'd use bitfields. Or we could just restrict objectid > > > > > to 32-bits and use remaining 32-bits for the next_qid pointer. I know it > > > > > will waste some bits but 32-bits of objectid should provide us with enough > > > > > space to avoid doing full event comparison in most cases > > > > > > > > Certainly. > > > > The entire set of objects to compare is going to be limited to 128*128, > > > > so 32bit should be plenty of hash bits. > > > > Simplicity is preferred. > > > > > > > > > - BTW WRT naming I > > > > > find 'qid' somewhat confusing. Can we call it say 'next_bucket' or > > > > > something like that? > > > > > > > > > > > > > Sure. If its going to be 32bit, I can just call it next_key for simplicity > > > > and store the next event key instead of the next event bucket. > > > > > > > > > > Permission events require special care, but that is the idea of a simple > > > > > > singly linked list using qid's for reading events by insert order and > > > > > > merging by hashed queue. > > > > > > > > > > Why are permission events special in this regard? > > > > > > > > > > > > > They are not removed from the head of the queue, so > > > > middle event next_key may need to be updated when they > > > > are removed. > > > > > > Oh, you mean the special case when we receive a signal and thus remove > > > permission event from a notification queue? I forgot about that one and > > > yes, it needs a special handling... > > > > > > > I guess since permission events are not merged, they could > > > > use their own queue. If we do not care about ordering of > > > > permission events and non-permission events, we can treat this > > > > as a priority queue and it will simplify things considerably. > > > > Boosting priority of blocking hooks seems like the right thing to do. > > > > I wonder if we could make that change? > > > > > > Yes, permission events are not merged and I'm not aware of any users > > > actually mixing permission and other events in a notification group. OTOH > > > I'm somewhat reluctant to reorder events that much. It could break > > > someone, it could starve notification events, etc. AFAIU the pain with > > > permission events is updating the ->next_key field in case we want to remove > > > unreported permission event. Finding previous entry with this scheme is > > > indeed somewhat painful (we'd have to walk the queue which requires > > > maintaining 'cur' pointer for every queue). So maybe growing fsnotify_event > > > by one pointer to contain single linked list for a hash chain would be > > > simplest in the end? Then removing from the hash chain in the corner case of > > > tearing permission event out is simple enough... > > > > > > > Better to disable the multi queue for the very uninteresting corner case (mixing > > permissions and non permissions) . The simplest thing to do is to enable multi > > queue only for FAN_CLASS_NOTIF. I suppose users do not use high priority > > classes for non-permission event use case and if they do, they will get less > > merged events - no big deal. > > > > The important things we get are: > > 1. Code remains simple > > 2. Deterministic CPU usage (linear search is capped to 128 events) > > 3. In the most common use case of async change listener we can merge > > events on up to 16K unique objects which should be sufficient > > > > I'll try to write this up. > > OK, sounds fine to me. We can always reconsider if some users come back > complaining... > Pushed that to fanotify_merge branch. It's even working ;-) Please let me know if this looks fine so far and I will complete the rest, test performance and post the patches. Remaining: - Let event->key be a hash of all merge compared fields - Limit linear search per chain Thanks, Amir.