On Wed, Jan 4, 2023 at 4:01 PM Yonghong Song <yhs@xxxxxxxx> wrote:
>
> On 1/3/23 11:51 PM, Hou Tao wrote:
> > Hi,
> >
> > On 1/4/2023 3:09 PM, Yonghong Song wrote:
> >>
> >> On 1/2/23 6:40 PM, Tonghao Zhang wrote:
> >>>
> >>> On Thu, Dec 29, 2022 at 2:29 PM Yonghong Song <yhs@xxxxxxxx> wrote:
> >>>>
> >>>> On 12/28/22 2:24 PM, Alexei Starovoitov wrote:
> >>>>> On Tue, Dec 27, 2022 at 8:43 PM Yonghong Song <yhs@xxxxxxxx> wrote:
> >>>>>>
> >>>>>> On 12/18/22 8:15 PM, xiangxia.m.yue@xxxxxxxxx wrote:
> >>>>>>> From: Tonghao Zhang <xiangxia.m.yue@xxxxxxxxx>
> >>>>>>>
> >>>>>>> This test shows how to reproduce a deadlock in a special case:
> >>>>>>> we update a htab map in both task and NMI context. The task can
> >>>>>>> be interrupted by the NMI; if the same map bucket is locked,
> >>>>>>> there will be a deadlock.
> >>>>>>>
> >>>>>>> * map max_entries is 2.
> >>>>>>> * NMI context uses key 4 and task context uses key 20.
> >>>>>>> * so the bucket index is the same but the map_locked index is
> >>>>>>>   different.
> >>>>>>>
> >>>>>>> The selftest uses perf to produce the NMI and fentry to attach
> >>>>>>> to nmi_handle. Note that bpf_overflow_handler checks
> >>>>>>> bpf_prog_active, but the bpf map update syscall increases this
> >>>>>>> counter in bpf_disable_instrumentation. Thus, attaching fentry
> >>>>>>> to nmi_handle and updating the hash map will reproduce the issue.
> > SNIP
> >>>>>>> diff --git a/tools/testing/selftests/bpf/progs/htab_deadlock.c b/tools/testing/selftests/bpf/progs/htab_deadlock.c
> >>>>>>> new file mode 100644
> >>>>>>> index 000000000000..d394f95e97c3
> >>>>>>> --- /dev/null
> >>>>>>> +++ b/tools/testing/selftests/bpf/progs/htab_deadlock.c
> >>>>>>> @@ -0,0 +1,32 @@
> >>>>>>> +// SPDX-License-Identifier: GPL-2.0
> >>>>>>> +/* Copyright (c) 2022 DiDi Global Inc. */
> >>>>>>> +#include <linux/bpf.h>
> >>>>>>> +#include <bpf/bpf_helpers.h>
> >>>>>>> +#include <bpf/bpf_tracing.h>
> >>>>>>> +
> >>>>>>> +char _license[] SEC("license") = "GPL";
> >>>>>>> +
> >>>>>>> +struct {
> >>>>>>> +	__uint(type, BPF_MAP_TYPE_HASH);
> >>>>>>> +	__uint(max_entries, 2);
> >>>>>>> +	__uint(map_flags, BPF_F_ZERO_SEED);
> >>>>>>> +	__type(key, unsigned int);
> >>>>>>> +	__type(value, unsigned int);
> >>>>>>> +} htab SEC(".maps");
> >>>>>>> +
> >>>>>>> +/* nmi_handle on the x86 platform. If the keyword "static" is
> >>>>>>> + * changed to "inline", this prog fails to load. */
> >>>>>>> +SEC("fentry/nmi_handle")
> >>>>>>
> >>>>>> The above comment is not what I mean. In arch/x86/kernel/nmi.c,
> >>>>>> we have
> >>>>>>    static int nmi_handle(unsigned int type, struct pt_regs *regs)
> >>>>>>    {
> >>>>>>        ...
> >>>>>>    }
> >>>>>>    ...
> >>>>>>    static noinstr void default_do_nmi(struct pt_regs *regs)
> >>>>>>    {
> >>>>>>        ...
> >>>>>>        handled = nmi_handle(NMI_LOCAL, regs);
> >>>>>>        ...
> >>>>>>    }
> >>>>>>
> >>>>>> Since nmi_handle is a static function, it might be inlined into
> >>>>>> default_do_nmi by the compiler. If this happens,
> >>>>>> fentry/nmi_handle will not be triggered and the test will pass.
> >>>>>>
> >>>>>> So I suggest changing the comment to:
> >>>>>>    nmi_handle() is a static function and might be
> >>>>>>    inlined into its caller. If this happens, the
> >>>>>>    test can still pass without the previous kernel fix.
> >>>>>
> >>>>> It's worse than this.
> >>>>> fentry is buggy.
> >>>>> We shouldn't allow attaching fentry to:
> >>>>> NOKPROBE_SYMBOL(nmi_handle);
> >>>>
> >>>> Okay, I see. Looks like we should prevent fentry from
> >>>> attaching to any NOKPROBE_SYMBOL functions.
> >>>>
> >>>> BTW, I think fentry/nmi_handle can be replaced with the
> >>>> tracepoint nmi/nmi_handler. It is more reliable
> >>> The tracepoint will not reproduce the deadlock (we have discussed
> >>> this in v2). If it's not easy to complete a test for this case,
> >>> should we drop this testcase patch, or keep fentry on nmi_handle
> >>> and update the comments?
> >>
> >> could we use a softirq perf event (timer), e.g.,
> >>
> >>     struct perf_event_attr attr = {
> >>             .sample_period = 1,
> >>             .type = PERF_TYPE_SOFTWARE,
> >>             .config = PERF_COUNT_SW_CPU_CLOCK,
> >>     };
> >>
> >> then you can attach to hrtimer_run_softirq (not tested) or
> >> similar functions?
> > The context will be a hard-irq context, right? Because
> > htab_lock_bucket() has already disabled hard-irq on the current CPU,
> > the deadlock will be impossible.
>
> Okay, I see. Soft-irq doesn't work. The only thing that works is NMI,
> since it is non-maskable.
>
> >>
> >> I suspect most (if not all) functions in the nmi path cannot
> >> be kprobe'd.
> > It seems that perf_event_nmi_handler() is also a nokprobe function.
> > However, I think we could try its callees (e.g., x86_pmu_handle_irq
> > or perf_event_overflow).
>
> If we can find a function in the nmi handling path which is not marked
> as nokprobe, sure, we can use that function for fentry.

I think perf_event_overflow is ok:

[   93.233093]  dump_stack_lvl+0x57/0x81
[   93.233098]  lock_acquire+0x1f4/0x29a
[   93.233103]  ? htab_lock_bucket+0x61/0x6c
[   93.233108]  _raw_spin_lock_irqsave+0x43/0x7f
[   93.233111]  ? htab_lock_bucket+0x61/0x6c
[   93.233114]  htab_lock_bucket+0x61/0x6c
[   93.233118]  htab_map_update_elem+0x11e/0x220
[   93.233124]  bpf_prog_df326439468c24a9_bpf_prog1+0x41/0x45
[   93.233137]  bpf_trampoline_6442478975_0+0x48/0x1000
[   93.233144]  perf_event_overflow+0x5/0x15
[   93.233149]  handle_pmi_common+0x1ad/0x1f0
[   93.233166]  intel_pmu_handle_irq+0x136/0x18a
[   93.233170]  perf_event_nmi_handler+0x28/0x47
[   93.233176]  nmi_handle+0xb8/0x254
[   93.233182]  default_do_nmi+0x3d/0xf6
[   93.233187]  exc_nmi+0xa1/0x109
[   93.233191]  end_repeat_nmi+0x16/0x67

> >>>> and won't be impacted by potential NOKPROBE_SYMBOL
> >>>> issues.

--
Best regards, Tonghao
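[Editor's note: for reference, a minimal sketch of the selftest program
retargeted at perf_event_overflow, the direction the thread converges on.
The map definition is taken from the patch above; the program name matches
the bpf_prog1 frame in the stack trace, and the NMI-side key 4 comes from
the commit message. This is an untested sketch, not the posted patch:]

    // SPDX-License-Identifier: GPL-2.0
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    char _license[] SEC("license") = "GPL";

    /* Same map as in the patch: 2 buckets and a zeroed hash seed, so
     * keys 4 and 20 hash to the same bucket while using different
     * map_locked indices. */
    struct {
            __uint(type, BPF_MAP_TYPE_HASH);
            __uint(max_entries, 2);
            __uint(map_flags, BPF_F_ZERO_SEED);
            __type(key, unsigned int);
            __type(value, unsigned int);
    } htab SEC(".maps");

    /* perf_event_overflow() is a global symbol on the NMI path and,
     * per the thread, is not marked NOKPROBE_SYMBOL, so fentry can
     * attach to it reliably (see the stack trace above). */
    SEC("fentry/perf_event_overflow")
    int BPF_PROG(bpf_prog1)
    {
            unsigned int val = 0, key = 4;

            /* NMI-context update: contends on the bucket lock held by
             * the task-context update of key 20. */
            bpf_map_update_elem(&htab, &key, &val, BPF_ANY);
            return 0;
    }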
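[Editor's note: on the user-space side, the NMI itself would come from a
hardware perf event with a sample period of 1, rather than the
PERF_TYPE_SOFTWARE timer event rejected above (that one fires in hard-irq
context, where htab_lock_bucket() has already masked interrupts). A hedged
sketch; the helper name is invented for illustration and error handling,
CPU pinning, and the task-context map update loop are left out:]

    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* On x86 the PMU overflow interrupt is delivered as an NMI, which
     * reaches perf_event_overflow() where the fentry prog is attached. */
    static int open_cycles_event_on_cpu(int cpu)
    {
            struct perf_event_attr attr = {
                    .sample_period = 1,
                    .type = PERF_TYPE_HARDWARE,
                    .config = PERF_COUNT_HW_CPU_CYCLES,
            };

            /* pid = -1: count all tasks on this CPU;
             * group_fd = -1, flags = 0 */
            return syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
    }

While such an event is firing, the task-context half of the test would keep
updating key 20 through the map fd on the same CPU, so the NMI-side update
of key 4 contends for the same bucket lock.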