Re: [RFC PATCH bpf-next] bpf: Introduce bpf_timer

Jamal Hadi Salim <jhs@xxxxxxxxxxxx> · Wed, 26 May 2021 14:25:28 -0400

On 2021-05-26 12:58 p.m., Alexei Starovoitov wrote:
On Wed, May 26, 2021 at 11:34:04AM -0400, Jamal Hadi Salim wrote:
On 2021-05-25 6:08 p.m., Alexei Starovoitov wrote:
On Tue, May 25, 2021 at 2:09 PM Jamal Hadi Salim <jhs@xxxxxxxxxxxx> wrote:

Didn't follow why this wouldn't work the same way for arrays?

array doesn't have delete.

OK. But even for arrays, if user space (for example) updates
an existing entry, we should be able to invoke a callback, no?

One interesting concept I see coming out of this is emulating
netlink-like event generation towards user space, i.e. a
user-space app listening for changes to a map.

Folks do it already via ringbuf events. No need for update/delete
callback to implement such notifications.

Please bear with me:
I know it is trivial to do if you are in control of the kernel
side, i.e. if your prog creates/updates/deletes the map entries.
I've done it many times with perf event arrays (before ringbuf
existed).
But:
What I was referring to is the case where another entity altogether
(possibly not under your control) makes that change from the
kernel side: then you don't get to know about it. The same goes
for a user-space program writing to the map entry.

If you say this can be done, then please do me a kindness and point
me to someone already doing this or to some sample code.

Would like to hear what the proposed ideas are.
I see this as a tricky problem to solve: you can make LRU
programmable to allow the variety of LRU replacement algos out
there, but that is not all-encompassing for custom or other types
of algos. The problem remains that LRU is very specific to evicting
entries that are least used. I can imagine that if I wanted to
do LIFO aging, for example, it could be done with some acrobatics
as an overlay on top of LRU with all sorts of tweaking.
It is sort of fitting a square peg into a round hole: you can do
it, but why the torture when you have a flexible architecture?

Using GC to solve 'hash table is running out of memory' problem is
exactly the square peg.
Timers are absolutely the wrong way to address memory pressure.

We need to provide the mechanisms (I don't see any disagreement on
the need for timers, at least).

It's an explicit non-goal for timer api to be used as GC for conntrack.

Agreed.

You'll be able to use it as such, but when it fails to scale
(as it's going to happen with any timer implementation) don't blame
infrastructure for that.

Agreed again. Timers are a necessary part of the toolset.
I hope I wasn't read as claiming that just firing random
timers equates to GC, or that timers on their own will scale.

A reasonable approach is to let the policy be defined
from user space. I may want the timer to keep polling
a map that is not being updated until the next program
restarts and starts updating it.
I thought Cong's approach with timerids/maps was a good
way to achieve control.

No, it's not a policy, and no, it doesn't belong to user space,
and no, Cong's approach has nothing to do with this design choice.

You listed 3 possibilities of what could happen in the use case
I described. One person's meat is another person's poison,
i.e. it is a design choice. What I meant by policy is that,
intentionally or not, Cong's approach gave the user the ability
to control what happens to the timer.

cheers,
jamal