On 1/4/23 6:32 AM, Tonghao Zhang wrote:
On Wed, Jan 4, 2023 at 4:01 PM Yonghong Song <yhs@xxxxxxxx> wrote:
On 1/3/23 11:51 PM, Hou Tao wrote:
Hi,
On 1/4/2023 3:09 PM, Yonghong Song wrote:
On 1/2/23 6:40 PM, Tonghao Zhang wrote:
On Thu, Dec 29, 2022 at 2:29 PM Yonghong Song <yhs@xxxxxxxx> wrote:
On 12/28/22 2:24 PM, Alexei Starovoitov wrote:
On Tue, Dec 27, 2022 at 8:43 PM Yonghong Song <yhs@xxxxxxxx> wrote:
On 12/18/22 8:15 PM, xiangxia.m.yue@xxxxxxxxx wrote:
From: Tonghao Zhang <xiangxia.m.yue@xxxxxxxxx>
This test shows how to reproduce a deadlock in a special case.
We update an htab map in both task and NMI context. The task can be
interrupted by an NMI, and if the same map bucket is locked, there
will be a deadlock.
* map max_entries is 2.
* The NMI context uses key 4 and the task context uses key 20.
* So the bucket index is the same, but the map_locked index is different.
The selftest uses perf to produce the NMI and fentry to attach to
nmi_handle. Note that bpf_overflow_handler checks bpf_prog_active,
but the bpf map update syscall increases this counter via
bpf_disable_instrumentation. Then fentry on nmi_handle plus a hash
map update will reproduce the issue.
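For context, here is a simplified sketch of the pre-fix
htab_lock_bucket() logic (paraphrased from kernel/bpf/hashtab.c of
that era; names and details are approximate, not the verbatim kernel
source):

/* Simplified paraphrase, not verbatim kernel source. */
static inline int htab_lock_bucket(const struct bpf_htab *htab,
                                   struct bucket *b, u32 hash,
                                   unsigned long *pflags)
{
        unsigned long flags;

        /* The map_locked slot is chosen with HASHTAB_MAP_LOCK_MASK,
         * not with (n_buckets - 1). With max_entries = 2 there are
         * only two buckets, so the jhash values of keys 4 and 20
         * (deterministic due to BPF_F_ZERO_SEED) can select the same
         * bucket while landing in different map_locked slots.
         */
        hash = hash & HASHTAB_MAP_LOCK_MASK;

        preempt_disable();
        if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
                /* Re-entry on the same slot is detected and refused... */
                __this_cpu_dec(*(htab->map_locked[hash]));
                preempt_enable();
                return -EBUSY;
        }

        /* ...but with different slots the guard passes, and an NMI
         * prog spins here on the bucket lock that the interrupted
         * task already holds: deadlock.
         */
        raw_spin_lock_irqsave(&b->raw_lock, flags);
        *pflags = flags;
        return 0;
}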
SNIP
diff --git a/tools/testing/selftests/bpf/progs/htab_deadlock.c b/tools/testing/selftests/bpf/progs/htab_deadlock.c
new file mode 100644
index 000000000000..d394f95e97c3
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/htab_deadlock.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 DiDi Global Inc. */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 2);
+ __uint(map_flags, BPF_F_ZERO_SEED);
+ __type(key, unsigned int);
+ __type(value, unsigned int);
+} htab SEC(".maps");
+
+/* nmi_handle on the x86 platform. If the keyword
+ * "static" is changed to "inline", this prog fails to load. */
+SEC("fentry/nmi_handle")
The above comment is not what I mean. In arch/x86/kernel/nmi.c,
we have
static int nmi_handle(unsigned int type, struct pt_regs *regs)
{
...
}
...
static noinstr void default_do_nmi(struct pt_regs *regs)
{
...
handled = nmi_handle(NMI_LOCAL, regs);
...
}
Since nmi_handle is a static function, it might be inlined into
default_do_nmi by the compiler. If this happens, fentry/nmi_handle
will not be triggered and the test will pass anyway.
So I suggest changing the comment to:
nmi_handle() is a static function and might be
inlined into its caller. If this happens, the
test can still pass without the previous kernel fix.
It's worse than this.
fentry is buggy.
We shouldn't allow attaching fentry to:
NOKPROBE_SYMBOL(nmi_handle);
Okay, I see. Looks like we should prevent fentry from
attaching to any NOKPROBE_SYMBOL functions.
BTW, I think fentry/nmi_handle can be replaced with the
tracepoint nmi/nmi_handler. It is more reliable.
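Something like this (untested sketch, assuming the nmi:nmi_handler
tracepoint exists on the target kernel, and reusing the htab map from
the patch above):

SEC("tp/nmi/nmi_handler")
int bpf_prog_tp(void *ctx)
{
        unsigned int key = 4, value = 1;

        /* same htab as defined in the patch above */
        bpf_map_update_elem(&htab, &key, &value, BPF_ANY);
        return 0;
}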
The tracepoint will not reproduce the deadlock (we discussed this in v2).
If it's not easy to build a test for this case, should we drop this
testcase patch? Or keep the fentry on nmi_handle and update the comments?
Could we use a softirq perf event (timer)? E.g.,
struct perf_event_attr attr = {
.sample_period = 1,
.type = PERF_TYPE_SOFTWARE,
.config = PERF_COUNT_SW_CPU_CLOCK,
};
Then you could attach to the function hrtimer_run_softirq (not tested)
or a similar function?
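For example, the event could be opened roughly like this (untested
sketch; the raw syscall is used since glibc has no perf_event_open
wrapper):

#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static int open_sw_clock_event(int cpu)
{
        struct perf_event_attr attr = {
                .sample_period = 1,
                .type = PERF_TYPE_SOFTWARE,
                .config = PERF_COUNT_SW_CPU_CLOCK,
        };

        /* pid = -1, cpu = cpu: count all tasks on that CPU */
        return syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
}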
The context will be a hard-irq context, right? Because htab_lock_bucket()
has already disabled hard-irqs on the current CPU, the deadlock will be
impossible.
Okay, I see. Soft-irq doesn't work. The only thing that works is NMI,
since it is non-maskable.
I suspect most (if not all) functions in the NMI path cannot
be kprobe'd.
It seems that perf_event_nmi_handler() is also a nokprobe function.
However, I think we could try its callees (e.g., x86_pmu_handle_irq or
perf_event_overflow).
If we can find a function in the NMI handling path which is not marked
as nokprobe, sure, we can use that function for fentry.
I think perf_event_overflow is OK:
[ 93.233093] dump_stack_lvl+0x57/0x81
[ 93.233098] lock_acquire+0x1f4/0x29a
[ 93.233103] ? htab_lock_bucket+0x61/0x6c
[ 93.233108] _raw_spin_lock_irqsave+0x43/0x7f
[ 93.233111] ? htab_lock_bucket+0x61/0x6c
[ 93.233114] htab_lock_bucket+0x61/0x6c
[ 93.233118] htab_map_update_elem+0x11e/0x220
[ 93.233124] bpf_prog_df326439468c24a9_bpf_prog1+0x41/0x45
[ 93.233137] bpf_trampoline_6442478975_0+0x48/0x1000
[ 93.233144] perf_event_overflow+0x5/0x15
[ 93.233149] handle_pmi_common+0x1ad/0x1f0
[ 93.233166] intel_pmu_handle_irq+0x136/0x18a
[ 93.233170] perf_event_nmi_handler+0x28/0x47
[ 93.233176] nmi_handle+0xb8/0x254
[ 93.233182] default_do_nmi+0x3d/0xf6
[ 93.233187] exc_nmi+0xa1/0x109
[ 93.233191] end_repeat_nmi+0x16/0x67
Sounds good. I agree fentry/perf_event_overflow can be used for the
test, and it won't be impacted by potential NOKPROBE_SYMBOL issues.
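A rough sketch of how the test could then be wired up (untested; the
helper name open_nmi_event is hypothetical, and the BPF side reuses
the htab map and headers from the patch above): a hardware cycles
event with sample_period = 1 produces the PMU NMIs, and the prog
attaches to perf_event_overflow instead of nmi_handle:

/* BPF side: fentry on perf_event_overflow, which is in the NMI
 * path but not marked NOKPROBE_SYMBOL.
 */
SEC("fentry/perf_event_overflow")
int BPF_PROG(bpf_nmi_handle, struct perf_event *event)
{
        unsigned int key = 4, value = 1;

        bpf_map_update_elem(&htab, &key, &value, BPF_ANY);
        return 0;
}

/* Userspace side: a hardware event that fires as an NMI on x86. */
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static int open_nmi_event(int cpu)
{
        struct perf_event_attr attr = {
                .sample_period = 1,
                .type = PERF_TYPE_HARDWARE,
                .config = PERF_COUNT_HW_CPU_CYCLES,
        };

        return syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
}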