Re: [PATCH v3 bpf-next 1/8] bpf: Introduce bpf timers.

Yonghong Song <yhs@xxxxxx> · Fri, 25 Jun 2021 08:54:55 -0700

On 6/25/21 7:57 AM, Alexei Starovoitov wrote:
On 6/24/21 11:25 PM, Yonghong Song wrote:

+
+    ____bpf_spin_lock(&timer->lock);

I think we may still have some issues.
Case 1:
   1. one bpf program is running in process context,
      bpf_timer_start() is called and timer->lock is taken
   2. timer softirq is triggered and this callback is called

____bpf_spin_lock() is actually the irqsave version of spin_lock,
so this race is not possible.

Sorry, I missed that ____bpf_spin_lock() has local_irq_save(),
so yes, the above situation cannot happen.
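
To make the Case 1 argument concrete, here is a minimal user-space sketch of a single CPU. It is a model only: struct cpu_model and every model_* function are hypothetical names, not kernel APIs, and it assumes interrupts were enabled before the lock was taken. With the irqsave flavour the timer callback is deferred until the unlock, so it never finds the lock held. With a plain lock it runs while the lock is held, which in the kernel would mean spinning forever on the same CPU.

```c
#include <stdbool.h>

/* Hypothetical single-CPU model; none of these names exist in the kernel. */
struct cpu_model {
	bool irqs_off;        /* interrupts disabled on this CPU */
	bool lock_held;       /* stands in for timer->lock */
	bool softirq_pending; /* timer expired while interrupts were off */
	int  cb_saw_lock;     /* times the callback found the lock held */
};

/* Stands in for the timer callback, which takes the same lock. */
static void model_timer_cb(struct cpu_model *c)
{
	if (c->lock_held)
		c->cb_saw_lock++; /* the real callback would deadlock here */
}

/* The timer expires: run the callback now unless interrupts are off. */
static void model_raise_timer(struct cpu_model *c)
{
	if (c->irqs_off)
		c->softirq_pending = true;
	else
		model_timer_cb(c);
}

static void model_lock(struct cpu_model *c, bool irqsave)
{
	c->irqs_off = irqsave; /* assumption: interrupts were on before */
	c->lock_held = true;
}

static void model_unlock(struct cpu_model *c)
{
	c->lock_held = false;
	c->irqs_off = false;
	if (c->softirq_pending) {
		/* The deferred callback runs only after the unlock. */
		c->softirq_pending = false;
		model_timer_cb(c);
	}
}

/* Case 1: the timer fires while bpf_timer_start() holds the lock.
 * Returns how many times the callback observed the lock as held.
 */
int model_case1(bool irqsave)
{
	struct cpu_model c = {0};

	model_lock(&c, irqsave);
	model_raise_timer(&c);
	model_unlock(&c);
	return c.cb_saw_lock;
}
```

In this model, model_case1(true) returns 0 and model_case1(false) returns 1. The model says nothing about NMIs, which are not masked by disabling interrupts. That is why Case 2 needs a separate argument.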


Case 2:
   1. this callback is called, timer->lock is taken
   2. a nmi happens and some bpf program is called (kprobe, tracepoint,
      fentry/fexit or perf_event, etc.) and that program calls
      bpf_timer_start()

So we could have a deadlock in both of the above cases?

Shouldn't be possible either, because bpf timers are not allowed
in nmi-bpf-progs. I'll double check that it's the case.
Pretty much the same restrictions apply to bpf_spin_lock.

The patch added bpf_base_func_proto() to bpf_tracing_func_proto:

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7a52bc172841..80f6e6dafd5e 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1057,7 +1057,7 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_snprintf:
 		return &bpf_snprintf_proto;
 	default:
-		return NULL;
+		return bpf_base_func_proto(func_id);
 	}
 }

and timer helpers are added to bpf_base_func_proto:
@@ -1055,6 +1330,12 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_per_cpu_ptr_proto;
 	case BPF_FUNC_this_cpu_ptr:
 		return &bpf_this_cpu_ptr_proto;
+	case BPF_FUNC_timer_init:
+		return &bpf_timer_init_proto;
+	case BPF_FUNC_timer_start:
+		return &bpf_timer_start_proto;
+	case BPF_FUNC_timer_cancel:
+		return &bpf_timer_cancel_proto;
 	default:
 		break;
 	}

static const struct bpf_func_proto *
pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
        switch (func_id) {
...
        default:
                return bpf_tracing_func_proto(func_id, prog);
        }
}

static const struct bpf_func_proto *
kprobe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
...
        default:
                return bpf_tracing_func_proto(func_id, prog);
        }
}
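
The fall-through chain quoted above can be sketched in user space as follows. Only the shape of the default branches is taken from the quoted code. The enum, the struct, and the patched flag are hypothetical stand-ins used to compare the behaviour before and after the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for enum bpf_func_id and struct bpf_func_proto. */
enum model_func_id {
	MODEL_FUNC_snprintf,
	MODEL_FUNC_timer_init,
	MODEL_FUNC_timer_start,
	MODEL_FUNC_timer_cancel,
	MODEL_FUNC_unknown,
};

struct model_func_proto {
	const char *name;
};

static const struct model_func_proto snprintf_proto     = { "snprintf" };
static const struct model_func_proto timer_init_proto   = { "timer_init" };
static const struct model_func_proto timer_start_proto  = { "timer_start" };
static const struct model_func_proto timer_cancel_proto = { "timer_cancel" };

/* Models bpf_base_func_proto() with the timer helpers added. */
static const struct model_func_proto *model_base(enum model_func_id id)
{
	switch (id) {
	case MODEL_FUNC_timer_init:
		return &timer_init_proto;
	case MODEL_FUNC_timer_start:
		return &timer_start_proto;
	case MODEL_FUNC_timer_cancel:
		return &timer_cancel_proto;
	default:
		return NULL;
	}
}

/* Models bpf_tracing_func_proto(); 'patched' selects the new default. */
static const struct model_func_proto *
model_tracing(enum model_func_id id, bool patched)
{
	switch (id) {
	case MODEL_FUNC_snprintf:
		return &snprintf_proto;
	default:
		return patched ? model_base(id) : NULL;
	}
}

/* Models kprobe_prog_func_proto(); program-specific cases are left out,
 * so every id takes the default branch.
 */
const struct model_func_proto *
model_kprobe(enum model_func_id id, bool patched)
{
	return model_tracing(id, patched);
}
```

In this sketch, model_kprobe(MODEL_FUNC_timer_start, false) returns NULL, while the same call with patched set to true returns a proto. That is the concern raised here: the changed default makes the timer helpers resolvable from kprobe and perf_event programs.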

Also, some functions are called while ____bpf_spin_lock() is held, e.g.,
bpf_prog_inc(), hrtimer_start(), etc. If we want to be absolutely safe,
we need to mark them as not traceable for kprobe/kretprobe/fentry/fexit/...
But I am not sure whether this is really needed or not.



+    /* callback_fn and prog need to match. They're updated together
+     * and have to be read under lock.
+     */
+    prog = t->prog;
+    callback_fn = t->callback_fn;
+
+    /* wrap bpf subprog invocation with prog->refcnt++ and -- to make
+     * sure that refcnt doesn't become zero when subprog is executing.
+     * Do it under lock to make sure that bpf_timer_start doesn't drop
+     * prev prog refcnt to zero before timer_cb has a chance to bump it.
+     */
+    bpf_prog_inc(prog);
+    ____bpf_spin_unlock(&timer->lock);
+
+    /* bpf_timer_cb() runs in hrtimer_run_softirq. It doesn't migrate and
+     * cannot be preempted by another bpf_timer_cb() on the same cpu.
+     * Remember the timer this callback is servicing to prevent
+     * deadlock if callback_fn() calls bpf_timer_cancel() on the same timer.
+     */
+    this_cpu_write(hrtimer_running, t);

This is not protected by the spinlock. In bpf_timer_cancel() and
bpf_timer_cancel_and_free() we have a spinlock-protected read, so
is there a potential race condition if the callback function and the
helper/bpf_timer_cancel_and_free() run in different contexts?

What kind of race do you see?
This is a per-cpu var and bpf_timer_cb is in softirq,
while timer_cancel/cancel_and_free access it under
spin_lock_irqsave... so they cannot race, because the softirq
and bpf_timer_cb will run only after start/cancel/cancel_and_free
do unlock_irqrestore.

Again, I missed local_irq_save(). With irqsave, this indeed
won't happen. The same for a few comments below.
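
The hrtimer_running bookkeeping discussed above can be modelled in user space as follows. This is a sketch under stated assumptions: struct timer_model and the model_* functions are hypothetical, a single CPU is assumed, and the blocking wait in hrtimer_cancel() is replaced by a counter because a real wait issued from inside the callback would never return. Clearing the variable after the callback returns is also an assumption of the model. That step is not shown in the quoted hunk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one timer; not a kernel structure. */
struct timer_model {
	bool cb_in_progress; /* this timer's callback is executing */
	int  cancel_waits;   /* times cancel would have had to wait */
};

/* Models the per-cpu hrtimer_running variable for a single CPU. */
static struct timer_model *model_running;

/* Models hrtimer_cancel(), which waits for a running callback to finish.
 * The wait is only counted here.
 */
static void model_hrtimer_cancel(struct timer_model *t)
{
	if (t->cb_in_progress)
		t->cancel_waits++;
}

/* Models the cancel path; 'use_check' toggles the hrtimer_running test. */
void model_timer_cancel(struct timer_model *t, bool use_check)
{
	if (!use_check || model_running != t)
		model_hrtimer_cancel(t);
}

/* Models bpf_timer_cb() running a callback_fn that cancels its own timer.
 * Returns how many times the cancel path would have waited on itself.
 */
int model_self_cancel(bool use_check)
{
	struct timer_model t = { false, 0 };

	model_running = &t;                /* this_cpu_write(hrtimer_running, t) */
	t.cb_in_progress = true;
	model_timer_cancel(&t, use_check); /* body of callback_fn */
	t.cb_in_progress = false;
	model_running = NULL;              /* assumption: cleared after the callback */
	return t.cancel_waits;
}
```

In this model, model_self_cancel(true) returns 0 and model_self_cancel(false) returns 1. The second result corresponds to the self-deadlock that the comment in the patch describes.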


+    prev = t->prog;
+    if (prev != prog) {
+        if (prev)
+            /* Drop pref prog refcnt when swapping with new prog */

pref -> prev

+            bpf_prog_put(prev);

Maybe we want to wrap the above two lines in {}?

You mean add {} because there is a comment?
I don't think the kernel coding style considers a comment to be a statement.

+    if (this_cpu_read(hrtimer_running) != t)
+        hrtimer_cancel(&t->timer);

We could still have race conditions here when 
bpf_timer_cancel_and_free() runs in process context and callback in
softirq context. I guess we might be okay.

No, since this check is under spin_lock_irqsave.

But if bpf_timer_cancel_and_free() runs in nmi context, I am not 100% sure
whether we have issues or not.

timers shouldn't be available to nmi-bpf progs.
There will be all sorts of issues.
The underlying hrtimer implementation cannot deal with nmi either.