On Wed, Jan 4, 2023 at 4:01 PM Yonghong Song <yhs@xxxxxxxx> wrote:
>
> On 1/3/23 11:51 PM, Hou Tao wrote:
> > Hi,
> >
> > On 1/4/2023 3:09 PM, Yonghong Song wrote:
> >>
> >> On 1/2/23 6:40 PM, Tonghao Zhang wrote:
> >>>
> >>> On Thu, Dec 29, 2022 at 2:29 PM Yonghong Song <yhs@xxxxxxxx> wrote:
> >>>>
> >>>> On 12/28/22 2:24 PM, Alexei Starovoitov wrote:
> >>>>> On Tue, Dec 27, 2022 at 8:43 PM Yonghong Song <yhs@xxxxxxxx> wrote:
> >>>>>>
> >>>>>> On 12/18/22 8:15 PM, xiangxia.m.yue@xxxxxxxxx wrote:
> >>>>>>> From: Tonghao Zhang <xiangxia.m.yue@xxxxxxxxx>
> >>>>>>>
> >>>>>>> This test shows how to reproduce a deadlock in a special case:
> >>>>>>> we update a htab map in both task and NMI context. The task can
> >>>>>>> be interrupted by the NMI; if the same map bucket is locked,
> >>>>>>> there will be a deadlock.
> >>>>>>>
> >>>>>>> * map max_entries is 2.
> >>>>>>> * NMI context uses key 4 and task context uses key 20.
> >>>>>>> * so the bucket index is the same but the map_locked index is
> >>>>>>>   different.
> >>>>>>>
> >>>>>>> The selftest uses perf to produce the NMI and fentry to attach
> >>>>>>> to nmi_handle. Note that bpf_overflow_handler checks
> >>>>>>> bpf_prog_active, but the bpf map update syscall increases this
> >>>>>>> counter in bpf_disable_instrumentation. Thus, attaching fentry
> >>>>>>> to nmi_handle and updating the hash map will reproduce the issue.
> > SNIP
> >>>>>>> diff --git a/tools/testing/selftests/bpf/progs/htab_deadlock.c b/tools/testing/selftests/bpf/progs/htab_deadlock.c
> >>>>>>> new file mode 100644
> >>>>>>> index 000000000000..d394f95e97c3
> >>>>>>> --- /dev/null
> >>>>>>> +++ b/tools/testing/selftests/bpf/progs/htab_deadlock.c
> >>>>>>> @@ -0,0 +1,32 @@
> >>>>>>> +// SPDX-License-Identifier: GPL-2.0
> >>>>>>> +/* Copyright (c) 2022 DiDi Global Inc. */
> >>>>>>> +#include <linux/bpf.h>
> >>>>>>> +#include <bpf/bpf_helpers.h>
> >>>>>>> +#include <bpf/bpf_tracing.h>
> >>>>>>> +
> >>>>>>> +char _license[] SEC("license") = "GPL";
> >>>>>>> +
> >>>>>>> +struct {
> >>>>>>> +	__uint(type, BPF_MAP_TYPE_HASH);
> >>>>>>> +	__uint(max_entries, 2);
> >>>>>>> +	__uint(map_flags, BPF_F_ZERO_SEED);
> >>>>>>> +	__type(key, unsigned int);
> >>>>>>> +	__type(value, unsigned int);
> >>>>>>> +} htab SEC(".maps");
> >>>>>>> +
> >>>>>>> +/* nmi_handle on the x86 platform. If the keyword "static" is
> >>>>>>> + * changed to "inline", this prog fails to load. */
> >>>>>>> +SEC("fentry/nmi_handle")
> >>>>>>
> >>>>>> The above comment is not what I mean. In arch/x86/kernel/nmi.c,
> >>>>>> we have
> >>>>>>    static int nmi_handle(unsigned int type, struct pt_regs *regs)
> >>>>>>    {
> >>>>>>        ...
> >>>>>>    }
> >>>>>>    ...
> >>>>>>    static noinstr void default_do_nmi(struct pt_regs *regs)
> >>>>>>    {
> >>>>>>        ...
> >>>>>>        handled = nmi_handle(NMI_LOCAL, regs);
> >>>>>>        ...
> >>>>>>    }
> >>>>>>
> >>>>>> Since nmi_handle is a static function, it might be inlined into
> >>>>>> default_do_nmi by the compiler. If this happens,
> >>>>>> fentry/nmi_handle will not be triggered and the test will pass.
> >>>>>>
> >>>>>> So I suggest changing the comment to:
> >>>>>>    nmi_handle() is a static function and might be
> >>>>>>    inlined into its caller. If this happens, the
> >>>>>>    test can still pass without the previous kernel fix.
> >>>>>
> >>>>> It's worse than this.
> >>>>> fentry is buggy.
> >>>>> We shouldn't allow attaching fentry to:
> >>>>> NOKPROBE_SYMBOL(nmi_handle);
> >>>>
> >>>> Okay, I see. Looks like we should prevent fentry from
> >>>> attaching to any NOKPROBE_SYMBOL functions.
> >>>>
> >>>> BTW, I think fentry/nmi_handle can be replaced with the
> >>>> tracepoint nmi/nmi_handler. It is more reliable
> >>> The tracepoint will not reproduce the deadlock (we have discussed
> >>> this in v2). If it's not easy to complete a test for this case,
> >>> should we drop this testcase patch, or keep fentry on nmi_handle
> >>> and update the comments?
> >>
> >> could we use a softirq perf event (timer), e.g.,
> >>
> >>     struct perf_event_attr attr = {
> >>             .sample_period = 1,
> >>             .type = PERF_TYPE_SOFTWARE,
> >>             .config = PERF_COUNT_SW_CPU_CLOCK,
> >>     };
> >>
> >> then you can attach to hrtimer_run_softirq (not tested) or
> >> similar functions?
> > The context will be a hard-irq context, right? Because
> > htab_lock_bucket() has already disabled hard-irq on the current CPU,
> > the deadlock will be impossible.
>
> Okay, I see. Soft-irq doesn't work. The only thing that works is NMI,
> since it is non-maskable.
>
> >>
> >> I suspect most (if not all) functions in the nmi path cannot
> >> be kprobe'd.
> > It seems that perf_event_nmi_handler() is also a nokprobe function.
> > However, I think we could try its callees (e.g., x86_pmu_handle_irq
> > or perf_event_overflow).
>
> If we can find a function in the nmi handling path which is not marked
> as nokprobe, sure, we can use that function for fentry.

I think perf_event_overflow is ok:

[   93.233093]  dump_stack_lvl+0x57/0x81
[   93.233098]  lock_acquire+0x1f4/0x29a
[   93.233103]  ? htab_lock_bucket+0x61/0x6c
[   93.233108]  _raw_spin_lock_irqsave+0x43/0x7f
[   93.233111]  ? htab_lock_bucket+0x61/0x6c
[   93.233114]  htab_lock_bucket+0x61/0x6c
[   93.233118]  htab_map_update_elem+0x11e/0x220
[   93.233124]  bpf_prog_df326439468c24a9_bpf_prog1+0x41/0x45
[   93.233137]  bpf_trampoline_6442478975_0+0x48/0x1000
[   93.233144]  perf_event_overflow+0x5/0x15
[   93.233149]  handle_pmi_common+0x1ad/0x1f0
[   93.233166]  intel_pmu_handle_irq+0x136/0x18a
[   93.233170]  perf_event_nmi_handler+0x28/0x47
[   93.233176]  nmi_handle+0xb8/0x254
[   93.233182]  default_do_nmi+0x3d/0xf6
[   93.233187]  exc_nmi+0xa1/0x109
[   93.233191]  end_repeat_nmi+0x16/0x67

> >>>> and won't be impacted by potential NOKPROBE_SYMBOL
> >>>> issues.

--
Best regards, Tonghao
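[Editor's note: for reference, a minimal sketch of the selftest program
retargeted at perf_event_overflow, the direction the thread converges on.
The map definition is taken from the patch above; the program name matches
the bpf_prog1 frame in the stack trace, and the NMI-side key 4 comes from
the commit message. This is an untested sketch, not the posted patch:]

    // SPDX-License-Identifier: GPL-2.0
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    char _license[] SEC("license") = "GPL";

    /* Same map as in the patch: 2 buckets and a zeroed hash seed, so
     * keys 4 and 20 hash to the same bucket while using different
     * map_locked indices. */
    struct {
            __uint(type, BPF_MAP_TYPE_HASH);
            __uint(max_entries, 2);
            __uint(map_flags, BPF_F_ZERO_SEED);
            __type(key, unsigned int);
            __type(value, unsigned int);
    } htab SEC(".maps");

    /* perf_event_overflow() is a global symbol on the NMI path and,
     * per the thread, is not marked NOKPROBE_SYMBOL, so fentry can
     * attach to it reliably (see the stack trace above). */
    SEC("fentry/perf_event_overflow")
    int BPF_PROG(bpf_prog1)
    {
            unsigned int val = 0, key = 4;

            /* NMI-context update: contends on the bucket lock held by
             * the task-context update of key 20. */
            bpf_map_update_elem(&htab, &key, &val, BPF_ANY);
            return 0;
    }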
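[Editor's note: on the user-space side, the NMI itself would come from a
hardware perf event with a sample period of 1, rather than the
PERF_TYPE_SOFTWARE timer event rejected above (that one fires in hard-irq
context, where htab_lock_bucket() has already masked interrupts). A hedged
sketch; the helper name is invented for illustration and error handling,
CPU pinning, and the task-context map update loop are left out:]

    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* On x86 the PMU overflow interrupt is delivered as an NMI, which
     * reaches perf_event_overflow() where the fentry prog is attached. */
    static int open_cycles_event_on_cpu(int cpu)
    {
            struct perf_event_attr attr = {
                    .sample_period = 1,
                    .type = PERF_TYPE_HARDWARE,
                    .config = PERF_COUNT_HW_CPU_CYCLES,
            };

            /* pid = -1: count all tasks on this CPU;
             * group_fd = -1, flags = 0 */
            return syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
    }

While such an event is firing, the task-context half of the test would keep
updating key 20 through the map fd on the same CPU, so the NMI-side update
of key 4 contends for the same bucket lock.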