Hi,

On 8/23/2022 8:56 AM, Hao Luo wrote:
> On Mon, Aug 22, 2022 at 11:01 AM Hao Luo <haoluo@xxxxxxxxxx> wrote:
>> On Mon, Aug 22, 2022 at 5:08 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>>> Hi,
>>>
>>> On 8/22/2022 11:21 AM, Hao Luo wrote:
>>>> Hi, Hou Tao
>>>>
>>>> On Sun, Aug 21, 2022 at 6:28 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> On 8/22/2022 12:42 AM, Hao Luo wrote:
>>>>>> Hi Hou Tao,
>>>>>>
>>>>>> On Sat, Aug 20, 2022 at 8:14 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>>>>>>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>>>>>>
>>>>>>> Per-cpu htab->map_locked is used to prohibit concurrent accesses
>>>>>>> from both NMI and non-NMI contexts. But since 74d862b682f51 ("sched:
>>>>>>> Make migrate_disable/enable() independent of RT"), migrate_disable()
>>>>>>> is also preemptible in the !PREEMPT_RT case, so now map_locked also
>>>>>>> unexpectedly rejects concurrent updates from normal contexts (e.g.
>>>>>>> userspace processes), as shown below:
>>>>>>>
>>>>>>> process A                      process B
>>>>>>>
SNIP
>>>>>>> First please enable CONFIG_PREEMPT for the running kernel and then run
>>>>>>> the following test program as shown below.
>>>>>>>
>> Ah, fully preemptive kernel. It's worth mentioning that in the commit
>> message. Then it seems promoting migrate_disable to preempt_disable
>> may be the best way to solve the problem you described.

Sorry, I missed that. Will add it in v2. A rough sketch of the reproducer
is also appended at the end of this mail.

>>
>>> # sudo taskset -c 2 ./update.bin
>>> thread nr 2
>>> wait for error
>>> update error -16
>>> all threads exit
>>>
>>> If there is no "update error -16", you can try to create more map update
>>> threads. For example, running 16 update threads:
>>>
>>> # sudo taskset -c 2 ./update.bin 16
>>> thread nr 16
>>> wait for error
>>> update error -16
>>> update error -16
>>> update error -16
>>> update error -16
>>> update error -16
>>> update error -16
>>> update error -16
>>> update error -16
>>> all threads exit
SNIP
> Tao, thanks very much for the test. I played with it a bit and I can
> confirm that map_update failures are seen under CONFIG_PREEMPT. The
> failures are not present under CONFIG_PREEMPT_NONE or
> CONFIG_PREEMPT_VOLUNTARY. I experimented with a few alternatives I was
> thinking of and they didn't work. It looks like Hou Tao's idea,
> promoting migrate_disable to preempt_disable, is probably the best we
> can do for the non-RT case. So
>
> Reviewed-by: Hao Luo <haoluo@xxxxxxxxxx>
>
> But, I am not sure if we want to get rid of the preemption-caused batch
> map update failures on preemptive kernels, or if there are better
> solutions to address [0].

Thanks for your review. Under a preemptive kernel, since preemption is now
disabled during batch map lookup or update, htab_lock_bucket() called from
userspace will not fail, so I think it is OK for now.

>
> Tao, could you mention CONFIG_PREEMPT in your commit message? Thanks.

Will do.

>>>> Here is my theory, but please correct me if I'm wrong, I haven't
>>>> tested it yet. In non-RT, I doubt preemptions are likely to happen
>>>> after migrate_disable. That is because very soon after migrate_disable,
>>>> we enter the critical section of b->raw_lock with irqs disabled. In RT,
>>>> preemptions can happen on acquiring b->lock, that is certainly
>>>> possible, but this is the !use_raw_lock path, which isn't side-stepped
>>>> by this patch.
>>>>
>>>>>>> kernel/bpf/hashtab.c | 23 ++++++++++++++++++-----
>>>>>>> 1 file changed, 18 insertions(+), 5 deletions(-)
>>>>>>>
>>>>>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>>>>>> index 6c530a5e560a..ad09da139589 100644
>>>>>>> --- a/kernel/bpf/hashtab.c
>>>>>>> +++ b/kernel/bpf/hashtab.c
>>>>>>> @@ -162,17 +162,25 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
>>>>>>>  				   unsigned long *pflags)
>>>>>>>  {
>>>>>>>  	unsigned long flags;
>>>>>>> +	bool use_raw_lock;
>>>>>>>
>>>>>>>  	hash = hash & HASHTAB_MAP_LOCK_MASK;
>>>>>>>
>>>>>>> -	migrate_disable();
>>>>>>> +	use_raw_lock = htab_use_raw_lock(htab);
>>>>>>> +	if (use_raw_lock)
>>>>>>> +		preempt_disable();
>>>>>>> +	else
>>>>>>> +		migrate_disable();
>>>>>>>  	if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
>>>>>>>  		__this_cpu_dec(*(htab->map_locked[hash]));
>>>>>>> -		migrate_enable();
>>>>>>> +		if (use_raw_lock)
>>>>>>> +			preempt_enable();
>>>>>>> +		else
>>>>>>> +			migrate_enable();
>>>>>>>  		return -EBUSY;
>>>>>>>  	}
>>>>>>>
>>>>>>> -	if (htab_use_raw_lock(htab))
>>>>>>> +	if (use_raw_lock)
>>>>>>>  		raw_spin_lock_irqsave(&b->raw_lock, flags);
>>>>>>>  	else
>>>>>>>  		spin_lock_irqsave(&b->lock, flags);
>>>>>>> @@ -185,13 +193,18 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
>>>>>>>  				      struct bucket *b, u32 hash,
>>>>>>>  				      unsigned long flags)
>>>>>>>  {
>>>>>>> +	bool use_raw_lock = htab_use_raw_lock(htab);
>>>>>>> +
>>>>>>>  	hash = hash & HASHTAB_MAP_LOCK_MASK;
>>>>>>> -	if (htab_use_raw_lock(htab))
>>>>>>> +	if (use_raw_lock)
>>>>>>>  		raw_spin_unlock_irqrestore(&b->raw_lock, flags);
>>>>>>>  	else
>>>>>>>  		spin_unlock_irqrestore(&b->lock, flags);
>>>>>>>  	__this_cpu_dec(*(htab->map_locked[hash]));
>>>>>>> -	migrate_enable();
>>>>>>> +	if (use_raw_lock)
>>>>>>> +		preempt_enable();
>>>>>>> +	else
>>>>>>> +		migrate_enable();
>>>>>>>  }
>>>>>>>
>>>>>>>  static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
>>>>>>> --
>>>>>>> 2.29.2
>>>>>>>
>>>>>> .
>>>> .
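
For completeness, this is roughly how htab_lock_bucket() ends up looking
with the diff above applied. The full signature and the trailing
*pflags/return lines are outside the quoted hunks and are assumed from the
surrounding hashtab.c source, so treat this as a sketch rather than the
exact file contents:

static inline int htab_lock_bucket(const struct bpf_htab *htab,
				   struct bucket *b, u32 hash,
				   unsigned long *pflags)
{
	unsigned long flags;
	bool use_raw_lock;

	hash = hash & HASHTAB_MAP_LOCK_MASK;

	/* Under !PREEMPT_RT the bucket uses a raw spinlock, so disable
	 * preemption (not just migration) before touching map_locked.
	 */
	use_raw_lock = htab_use_raw_lock(htab);
	if (use_raw_lock)
		preempt_disable();
	else
		migrate_disable();
	if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
		/* This bucket is already being updated on this CPU
		 * (e.g. from NMI context), so back off with -EBUSY
		 * instead of deadlocking on the bucket lock.
		 */
		__this_cpu_dec(*(htab->map_locked[hash]));
		if (use_raw_lock)
			preempt_enable();
		else
			migrate_enable();
		return -EBUSY;
	}

	if (use_raw_lock)
		raw_spin_lock_irqsave(&b->raw_lock, flags);
	else
		spin_lock_irqsave(&b->lock, flags);
	*pflags = flags;

	return 0;
}

With preemption disabled, nothing else can be scheduled on this CPU between
the map_locked increment and taking b->raw_lock, so two userspace updaters
pinned to the same CPU can no longer trip the -EBUSY check.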
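
The source of update.bin itself was snipped above, so for anyone who wants
to try this, below is a minimal, hypothetical reproducer along the same
lines. It is not the original program; the map parameters and the use of
libbpf's bpf_map_create()/bpf_map_update_elem() are my own assumptions.
Build it against libbpf (v0.7+ for bpf_map_create()), link with -lbpf
-lpthread, and run it pinned to one CPU on a CONFIG_PREEMPT kernel, e.g.
"sudo taskset -c 2 ./update.bin 16":

/* update.c - hypothetical sketch, not the original update.bin.
 * All threads update the same key (hence the same hash bucket) while the
 * whole process is pinned to one CPU, so a thread preempted between the
 * map_locked increment and the bucket lock makes the next updater fail
 * with -EBUSY (-16).
 */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <bpf/bpf.h>

static int map_fd;

static void *update_loop(void *arg)
{
	unsigned int key = 0, value = 1;

	for (;;) {
		if (!bpf_map_update_elem(map_fd, &key, &value, BPF_ANY))
			continue;
		/* -EBUSY (-16) means htab_lock_bucket() saw map_locked
		 * already raised on this CPU by a preempted updater.
		 */
		printf("update error %d\n", -errno);
		return NULL;
	}
}

int main(int argc, char **argv)
{
	int nr = argc > 1 ? atoi(argv[1]) : 2;
	pthread_t *tids;
	int i;

	/* map name, key/value sizes and max_entries are arbitrary choices */
	map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, "update_test",
				sizeof(unsigned int), sizeof(unsigned int),
				16, NULL);
	if (map_fd < 0) {
		perror("bpf_map_create");
		return 1;
	}

	printf("thread nr %d\n", nr);
	tids = calloc(nr, sizeof(*tids));
	for (i = 0; i < nr; i++)
		pthread_create(&tids[i], NULL, update_loop, NULL);

	printf("wait for error\n");
	for (i = 0; i < nr; i++)
		pthread_join(tids[i], NULL);
	printf("all threads exit\n");

	return 0;
}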