Re: [RFC PATCH bpf-next] bpf: Introduce bpf_timer

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Wed, 26 May 2021 09:58:47 -0700

On Wed, May 26, 2021 at 11:34:04AM -0400, Jamal Hadi Salim wrote:
> On 2021-05-25 6:08 p.m., Alexei Starovoitov wrote:
> > On Tue, May 25, 2021 at 2:09 PM Jamal Hadi Salim <jhs@xxxxxxxxxxxx> wrote:
> > > 
> 
> > > This is certainly a useful feature (for other reasons as well).
> > > Does this include create/update/delete issued from user space?
> > 
> > Right. Any kind of update/delete, and create is a subset of update.
> > The lookup is not included (yet, or maybe ever) since it doesn't
> > have deterministic start/end points.
> > The prog can do a lookup and update values in place while
> > holding on to the element until prog execution ends.
> > 
> > While update/delete have precise points in hash/lru/lpm maps.
> > Array is a different story.
> > 
> 
> Didn't follow why this wouldn't work in the same way for Array?

array doesn't have delete.

> One interesting concept I see coming out of this is emulating
> netlink-like event generation towards user space, i.e. a user
> space app listening to changes to a map.

Folks do it already via ringbuf events. No need for update/delete
callback to implement such notifications.
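
To illustrate the pattern, here is a plain userspace C model, not kernel
code: the writer pushes an event record into a ring buffer at the point
where it updates or deletes a map element, and a listener drains it. In a
real BPF program the producer side would use bpf_ringbuf_reserve() and
bpf_ringbuf_submit() (or bpf_ringbuf_output()) and user space would poll
through libbpf's ring_buffer API. The event layout and the rb_* names
below are hypothetical.

```c
#include <stdint.h>

/* Userspace model only. The names and the event layout are made up
 * for illustration; they are not a kernel or libbpf API. */
enum map_op { MAP_OP_UPDATE = 1, MAP_OP_DELETE = 2 };

struct map_event {
	uint32_t op;   /* enum map_op */
	uint32_t key;  /* key of the element that changed */
};

#define RB_SIZE 8  /* power of two, so index masking works */

struct ringbuf {
	struct map_event slot[RB_SIZE];
	unsigned int head;  /* count of events written so far */
	unsigned int tail;  /* count of events read so far */
};

/* Producer: called right next to the map update/delete.
 * Returns 0 on success, -1 if the buffer is full (event dropped). */
static int rb_emit(struct ringbuf *rb, uint32_t op, uint32_t key)
{
	if (rb->head - rb->tail == RB_SIZE)
		return -1;
	rb->slot[rb->head & (RB_SIZE - 1)] = (struct map_event){ op, key };
	rb->head++;
	return 0;
}

/* Consumer: returns 1 and fills *ev if an event was pending, else 0. */
static int rb_poll(struct ringbuf *rb, struct map_event *ev)
{
	if (rb->head == rb->tail)
		return 0;
	*ev = rb->slot[rb->tail & (RB_SIZE - 1)];
	rb->tail++;
	return 1;
}
```

The prog already knows when it updates or deletes an element, so it can
emit the event itself at that point; a full buffer shows up as a dropped
event on the producer side.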

> > > 
> > > The challenge we have in this case is LRU makes the decision
> > > which entry to victimize. We do have some entries we want to
> > > keep longer - even if they are not seeing a lot of activity.
> > 
> > Right. That's certainly an argument to make LRU eviction
> > logic programmable.
> > John/Joe/Daniel proposed it as a concept long ago.
> > Design ideas are in demand to make further progress here :)
> > 
> 
> would like to hear what the proposed ideas are.
> I see this as a tricky problem to solve - you can make LRU
> programmable to allow the variety of LRU replacement algos out
> there but not all-encompassing for custom or other types of algos.
> The problem remains that LRU is very specific to evicting
> entries that are least used. I can imagine that if I wanted to
> do a LIFO aging for example then it can be done with some acrobatics
> as an overlay on top of LRU with all sorts of tweaking.
> It is sort of fitting a square peg into a round hole - you can do
> it, but why the torture when you have a flexible architecture.

Using GC to solve the 'hash table is running out of memory' problem is
exactly the square peg.
Timers are absolutely the wrong way to address memory pressure.

> We need to provide the mechanisms (I don't see a disagreement on
> need for timers at least).

It's an explicit non-goal for the timer API to be used as GC for conntrack.
You'll be able to use it as such, but when it fails to scale
(as will happen with any timer implementation), don't blame
the infrastructure for that.

> > > 
> > > What happens when both ingress and egress are ejected?
> > 
> > What is 'ejected'? Like a CD? ;)
> 
> I was going to use other verbs to describe this; but
> may have sounded obscene ;->

Please use standard terminology. The topic is difficult enough
to understand without inventing new words.

> > The kernel can choose to do different things with the timer here.
> > One option is to cancel the outstanding timers and unload
> > .text where the timer callback lives.
> >
> > Another option is to let the timer stay armed and auto unload
> > .text of bpf function when it finishes executing.
> >
> > If timer callback decides to re-arm itself it can continue
> > executing indefinitely.
> > This patch is doing the latter.
> > There could be a combination of both options.
> > All options have their pros/cons.
> 
> A reasonable approach is to let the policy be defined
> from user space. I may want the timer to keep polling
> a map that is not being updated until the next program
> restarts and starts updating it.
> I thought Cong's approach with timerids/maps was a good
> way to achieve control.

No, it's not a policy, and no, it doesn't belong to user space,
and no, Cong's approach has nothing to do with this design choice.
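
For concreteness, the behavior quoted above as "the latter" option (the
timer stays armed and the callback's .text goes away once the callback
finishes without re-arming) can be modeled in plain userspace C. This is
a sketch of the lifetime rule only, not the code in the patch: the
reference count and every name below are assumptions made for
illustration.

```c
#include <stdbool.h>

/* Userspace model only; all names here are hypothetical. */
struct prog_model {
	int refcnt;        /* holders keeping the callback code loaded */
	bool text_loaded;  /* stands in for the prog's .text */
};

struct timer_model {
	struct prog_model *prog;
	bool armed;
	bool (*cb)(void *ctx);  /* returns true to re-arm itself */
	void *ctx;
};

/* Drop one reference; the last one "unloads .text". */
static void prog_put(struct prog_model *p)
{
	if (--p->refcnt == 0)
		p->text_loaded = false;
}

/* Arming takes a reference so the callback code outlives detach. */
static void timer_arm(struct timer_model *t, struct prog_model *p,
		      bool (*cb)(void *ctx), void *ctx)
{
	if (!t->armed) {
		p->refcnt++;
		t->armed = true;
	}
	t->prog = p;
	t->cb = cb;
	t->ctx = ctx;
}

/* One expiry: run the callback; if it does not re-arm, release the prog. */
static void timer_fire(struct timer_model *t)
{
	if (!t->armed)
		return;
	if (t->cb(t->ctx))
		return;  /* re-armed: the reference stays held */
	t->armed = false;
	prog_put(t->prog);
}

/* Example callback: re-arm while the budget in *ctx is positive. */
static bool rearm_while_budget(void *ctx)
{
	int *left = ctx;

	if (*left > 0) {
		(*left)--;
		return true;
	}
	return false;
}
```

In this model, detaching the prog is a prog_put() by whoever attached it.
An armed timer keeps text_loaded true across that, and a callback that
always returns true keeps it loaded indefinitely, which is the pro/con
being discussed.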