Re: [PATCH bpf v2 7/9] bpf: Use raw_spinlock_t for LPM trie

Hou Tao <houtao@xxxxxxxxxxxxxxx> · Thu, 5 Dec 2024 16:52:46 +0800



Hi,

On 12/3/2024 9:42 AM, Alexei Starovoitov wrote:
> On Fri, Nov 29, 2024 at 4:18 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>> Hou Tao <houtao@xxxxxxxxxxxxxxx> writes:
>>
>>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>>
>>> After switching from kmalloc() to the bpf memory allocator, there will be
>>> no blocking operation during the update of LPM trie. Therefore, change
>>> trie->lock from spinlock_t to raw_spinlock_t to make LPM trie usable in
>>> atomic context, even on RT kernels.
>>>
>>> The max value of prefixlen is 2048. Therefore, update or deletion
>>> operations will find the target after at most 2048 comparisons.
>>> Constructing a test case which updates an element after 2048 comparisons
>>> under a 8 CPU VM, and the average time and the maximal time for such
>>> update operation is about 210us and 900us.
>> That is... quite a long time? I'm not sure we have any guidance on what
>> the maximum acceptable time is (perhaps the RT folks can weigh in
>> here?), but stalling for almost a millisecond seems long.
>>
>> Especially doing this unconditionally seems a bit risky; this means that
>> even a networking program using the lpm map in the data path can stall
>> the system for that long, even if it would have been perfectly happy to
>> be preempted.
> I don't share this concern.
> 2048 comparisons is an extreme case.
> I'm sure there are a million other ways to stall bpf prog for that long.

2048 is indeed an extreme case. I would do some test to check how much
time is used for the normal cases with prefixlen=32 or prefixlen=128.
>
>> So one option here could be to make it conditional? As in, have a map
>> flag (on creation) that switches to raw_spinlock usage, and reject using
>> the map from atomic context if that flag is not set?
> No. Let's not complicate the LPM map unnecessarily.
> I'd rather see it's being rewritten into faster and more efficient
> algorithm.