Hi! A complete newbie here, so I hope this is not outright nonsense.
For every successful epoll_ctl(EPOLL_CTL_ADD/EPOLL_CTL_DEL) the kernel
allocates/de-allocates the corresponding epitem.
In a presumably widespread case (libev even has a trick to optimize this
exact case) of level-triggered behavior with an epoll_ctl(DEL) +
epoll_ctl(ADD) dance per read cycle, this leads to tossing the epitem
back and forth to the slab cache allocator.
In my completely synthetic user-space benchmark of just doing
epoll_ctl(ADD) + epoll_ctl(DEL) in a loop, these [de]allocations are
responsible for 25% of CPU time.
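
For reference, a loop of this kind can be sketched roughly as below.
This is only a minimal illustration, not the exact benchmark code: the
use of an eventfd as the watched descriptor and the helper name
add_del_loop() are assumptions made for the example.

```c
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: performs `iterations` ADD+DEL pairs on a single
 * fd and returns how many pairs completed (0 if setup failed).
 */
static long add_del_loop(long iterations)
{
    struct epoll_event ev = { .events = EPOLLIN };
    long done = 0;
    int epfd = epoll_create1(0);
    int fd = eventfd(0, 0);   /* any pollable fd would do */

    if (epfd < 0 || fd < 0)
        goto out;
    ev.data.fd = fd;

    for (; done < iterations; done++) {
        /* each successful ADD allocates an epitem in the kernel ... */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
            break;
        /* ... and each successful DEL releases it again */
        if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL) < 0)
            break;
    }
out:
    if (fd >= 0)
        close(fd);
    if (epfd >= 0)
        close(epfd);
    return done;
}
```

Calling add_del_loop() with a large count from main() and profiling the
process (e.g. with perf) is one way to see where the time goes.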
What if instead of unconditionally [de]allocating an epitem, we try to
reuse its memory by preserving a single "up for grabs" epitem in the
eventpoll struct, which epoll_ctl(ADD) would try to acquire and
epoll_ctl(DEL) would update? A single-item cache before the allocator,
if you will.
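
To make the idea concrete, here is a user-space model of such a
single-slot cache. It is a sketch under stated assumptions, not kernel
code: struct item and struct pool are hypothetical stand-ins for epitem
and eventpoll, malloc()/free() stand in for the slab allocator, and
locking and any kernel-specific lifetime rules are ignored.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's epitem. */
struct item {
    int fd;
};

/* Hypothetical stand-in for the eventpoll struct. */
struct pool {
    struct item *spare;   /* the single "up for grabs" item, or NULL */
    long allocs;          /* how many times the real allocator was hit */
};

/* ADD path: take the spare item if present, else really allocate. */
static struct item *item_get(struct pool *p)
{
    struct item *it = p->spare;

    if (it) {
        p->spare = NULL;
        return it;
    }
    p->allocs++;
    return malloc(sizeof(*it));
}

/* DEL path: park the item in the empty slot, else really free it. */
static void item_put(struct pool *p, struct item *it)
{
    if (!p->spare)
        p->spare = it;
    else
        free(it);
}
```

With a strict get/put alternation on one item, only the very first
item_get() reaches the allocator; with many items in flight, the slot
helps correspondingly less.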
I see (and I might be missing A LOT) the following pros/cons:
Pros:
* allocations reduced by a varying percentage, from 0% to 100%,
depending on the usage.
Cons:
* sizeof(epitem) + sizeof(epitem*) memory overhead per epoll instance
* a slight code complication
* we lose whatever locality the allocator gives
Does this make any sense, and if so, do the pros outweigh the cons?