On 2021-05-11 5:29 p.m., Cong Wang wrote:
On Mon, May 10, 2021 at 1:55 PM Jamal Hadi Salim <jhs@xxxxxxxxxxxx> wrote:
That cilium PR was a good read of the general issues.
Our use case involves anywhere between 4-16M cached entries.
As I mentioned earlier:
periodically, if some condition is met in the kernel on a map
entry, we want to clean up, update, or send unsolicited
housekeeping events to user space.
Polling in order to achieve this for that many entries is expensive.
Thanks for sharing your use case. As we discussed privately, please
also share the performance numbers you have.
The earlier tests I mentioned to you were about LRU.
I can share those as well, but for what we are discussing here,
testing the cost of batch vs. no-batch seems more important.
Our LRU tests indicate that it is better to use a global LRU as opposed
to a per-CPU LRU. We didn't dig deeper, but it seemed that gc/alloc, which
was happening under some lock, gets very expensive regardless, if you
are sending a sufficient number of flows/sec (1M flows/sec in our
case).
We cannot use LRU (for reasons stated earlier). It has to be a hash
table with aging under our jurisdiction. I will post numbers for
sending the entries to user space for gc.
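
To make "hash table with aging under our jurisdiction" concrete, here is a minimal user-space sketch in Python. It is only a toy model, not BPF or kernel code: the class name `AgingTable`, the `max_idle` timeout, and the `touch`/`sweep` methods are all hypothetical names chosen for illustration. The idea it shows is that the owner of the table decides when an entry is stale, and a periodic sweep reports only the expired keys instead of every entry being polled.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AgingTable:
    """Toy model of a hash table whose aging policy is owned by the caller."""

    max_idle: float  # hypothetical idle timeout, in seconds
    entries: dict = field(default_factory=dict)  # key -> last-seen timestamp

    def touch(self, key, now=None):
        """Insert the entry or refresh its last-seen timestamp."""
        self.entries[key] = time.monotonic() if now is None else now

    def sweep(self, now=None):
        """Delete idle entries and return their keys as housekeeping events."""
        now = time.monotonic() if now is None else now
        expired = [k for k, seen in self.entries.items()
                   if now - seen > self.max_idle]
        for k in expired:
            del self.entries[k]
        return expired


table = AgingTable(max_idle=30.0)
table.touch("flow-a", now=0.0)
table.touch("flow-b", now=20.0)
events = table.sweep(now=40.0)  # only "flow-a" has been idle for more than 30 s
```

The sweep still walks every entry, but only the expired keys cross over to the consumer, which is the property that matters when the table holds millions of entries.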
cheers,
jamal