On Wed, Jan 27, 2021 at 5:15 PM Jan Kara <jack@xxxxxxx> wrote: > > On Wed 27-01-21 14:57:56, Amir Goldstein wrote: > > On Wed, Jan 27, 2021 at 1:24 PM Jan Kara <jack@xxxxxxx> wrote: > > > > - With multi queue, high bit of obejctid will be masked for merge compare. > > > > - Instead, they will be used to store the next_qid to read from > > > > > > > > For example: > > > > - event #1 is added to queue 6 > > > > - set group->last_qid = 6 > > > > - set group->next_qid = 6 (because group->num_events == 1) > > > > - event #2 is added to queue 13 > > > > - the next_qid bits of the last event in last_qid (6) queue are set to 13 > > > > - set group->last_qid = 13 > > > > > > > > - read() checks value of group->next_qid and reads the first event > > > > from queue 6 (event #1) > > > > - event #1 has 13 stored in next_qid bits so set group->next_qid = 13 > > > > - read() reads first event from queue 13 (event #2) > > > > > > That's an interesting idea. I like it and I think it would work. Just > > > instead of masking, I'd use bitfields. Or we could just restrict objectid > > > to 32-bits and use remaining 32-bits for the next_qid pointer. I know it > > > will waste some bits but 32-bits of objectid should provide us with enough > > > space to avoid doing full event comparison in most cases > > > > Certainly. > > The entire set of objects to compare is going to be limited to 128*128, > > so 32bit should be plenty of hash bits. > > Simplicity is preferred. > > > > > - BTW WRT naming I > > > find 'qid' somewhat confusing. Can we call it say 'next_bucket' or > > > something like that? > > > > > > > Sure. If its going to be 32bit, I can just call it next_key for simplicity > > and store the next event key instead of the next event bucket. > > > > > > Permission events require special care, but that is the idea of a simple > > > > singly linked list using qid's for reading events by insert order and > > > > merging by hashed queue. > > > > > > Why are permission events special in this regard? > > > > > > > They are not removed from the head of the queue, so > > middle event next_key may need to be updated when they > > are removed. > > Oh, you mean the special case when we receive a signal and thus remove > permission event from a notification queue? I forgot about that one and > yes, it needs a special handling... > > > I guess since permission events are not merged, they could > > use their own queue. If we do not care about ordering of > > permission events and non-permission events, we can treat this > > as a priority queue and it will simplify things considerably. > > Boosting priority of blocking hooks seems like the right thing to do. > > I wonder if we could make that change? > > Yes, permission events are not merged and I'm not aware of any users > actually mixing permission and other events in a notification group. OTOH > I'm somewhat reluctant to reorder events that much. It could break > someone, it could starve notification events, etc. AFAIU the pain with > permission events is updating the ->next_key field in case we want to remove > unreported permission event. Finding previous entry with this scheme is > indeed somewhat painful (we'd have to walk the queue which requires > maintaining 'cur' pointer for every queue). So maybe growing fsnotify_event > by one pointer to contain single linked list for a hash chain would be > simplest in the end? Then removing from the hash chain in the corner case of > tearing permission event out is simple enough... > Better to disable the multi queue for the very uninteresting corner case (mixing permissions and non permissions) . The simplest thing to do is to enable multi queue only for FAN_CLASS_NOTIF. I suppose users do not use high priority classes for non-permission event use case and if they do, they will get less merged events - no big deal. The important things we get are: 1. Code remains simple 2. Deterministic CPU usage (linear search is capped to 128 events) 3. In the most common use case of async change listener we can merge events on up to 16K unique objects which should be sufficient I'll try to write this up. Thanks, Amir.