On Fri, Nov 29, 2024 at 4:18 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Hou Tao <houtao@xxxxxxxxxxxxxxx> writes:
>
> > From: Hou Tao <houtao1@xxxxxxxxxx>
> >
> > After switching from kmalloc() to the bpf memory allocator, there will be
> > no blocking operation during the update of LPM trie. Therefore, change
> > trie->lock from spinlock_t to raw_spinlock_t to make LPM trie usable in
> > atomic context, even on RT kernels.
> >
> > The max value of prefixlen is 2048. Therefore, update or deletion
> > operations will find the target after at most 2048 comparisons.
> > In a test case that updates an element after 2048 comparisons on an
> > 8-CPU VM, the average and maximal times for such an update operation
> > are about 210us and 900us.
>
> That is... quite a long time? I'm not sure we have any guidance on what
> the maximum acceptable time is (perhaps the RT folks can weigh in
> here?), but stalling for almost a millisecond seems long.
>
> Especially doing this unconditionally seems a bit risky; this means that
> even a networking program using the lpm map in the data path can stall
> the system for that long, even if it would have been perfectly happy to
> be preempted.

I don't share this concern. 2048 comparisons is an extreme case.
I'm sure there are a million other ways to stall a bpf prog for that long.

> So one option here could be to make it conditional? As in, have a map
> flag (on creation) that switches to raw_spinlock usage, and reject using
> the map from atomic context if that flag is not set?

No. Let's not complicate the LPM map unnecessarily.
I'd rather see it being rewritten into a faster and more efficient
algorithm.
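
For readers following along, the change being discussed is essentially the
following. This is only a minimal sketch against what kernel/bpf/lpm_trie.c
looks like upstream; the exact field layout and call sites in the posted
patch may differ:

 struct lpm_trie {
 	struct bpf_map			map;
 	struct lpm_trie_node __rcu	*root;
 	size_t				n_entries;
 	size_t				max_prefixlen;
 	size_t				data_size;
-	spinlock_t			lock;
+	raw_spinlock_t			lock;
 };

 /* trie_alloc() */
-	spin_lock_init(&trie->lock);
+	raw_spin_lock_init(&trie->lock);

 /* trie_update_elem() / trie_delete_elem() */
-	spin_lock_irqsave(&trie->lock, irq_flags);
+	raw_spin_lock_irqsave(&trie->lock, irq_flags);
 	...
-	spin_unlock_irqrestore(&trie->lock, irq_flags);
+	raw_spin_unlock_irqrestore(&trie->lock, irq_flags);

A raw_spinlock_t remains a non-sleeping spinning lock even on PREEMPT_RT,
where a plain spinlock_t is converted into a sleeping lock. That is what
makes the map usable from atomic context on RT kernels, at the cost that
the critical section is no longer preemptible, which is where the latency
numbers above come in.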