Patch "bpf: Fail bpf_timer_cancel when callback is being cancelled" has been added to the 6.6-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Sat, 13 Jul 2024 09:30:42 -0400

This is a note to let you know that I've just added the patch titled

    bpf: Fail bpf_timer_cancel when callback is being cancelled

to the 6.6-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     bpf-fail-bpf_timer_cancel-when-callback-is-being-can.patch
and it can be found in the queue-6.6 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 507ab6d101c14097ef25b496e4b9aafc02798d09
Author: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx>
Date:   Tue Jul 9 18:54:38 2024 +0000

    bpf: Fail bpf_timer_cancel when callback is being cancelled
    
    [ Upstream commit d4523831f07a267a943f0dde844bf8ead7495f13 ]
    
    Given a schedule:
    
    timer1 cb                       timer2 cb
    
    bpf_timer_cancel(timer2);       bpf_timer_cancel(timer1);
    
    Both bpf_timer_cancel calls would wait for the other callback to finish
    executing, introducing a lockup.
    
    Add an atomic_t count named 'cancelling' in bpf_hrtimer. This keeps
    track of all in-flight cancellation requests for a given BPF timer.
    Whenever cancelling a BPF timer, we must check if we have outstanding
    cancellation requests, and if so, we must fail the operation with an
    error (-EDEADLK) since cancellation is synchronous and waits for the
    callback to finish executing. This implies that we can enter a deadlock
    situation involving two or more timer callbacks executing in parallel
    and attempting to cancel one another.
    
    Note that we avoid incrementing the cancelling counter for the target
    timer (the one being cancelled) if bpf_timer_cancel is not invoked from
    a callback, to avoid spurious errors. The whole point of detecting
    cur->cancelling and returning -EDEADLK is to not enter a busy wait loop
    (which may or may not lead to a lockup). This does not apply in case the
    caller is in a non-callback context, the other side can continue to
    cancel as it sees fit without running into errors.
    
    Background on prior attempts:
    
    Earlier versions of this patch used a bool 'cancelling' bit and used the
    following pattern under timer->lock to publish cancellation status.
    
    lock(t->lock);
    t->cancelling = true;
    mb();
    if (cur->cancelling)
            return -EDEADLK;
    unlock(t->lock);
    hrtimer_cancel(t->timer);
    t->cancelling = false;
    
    The store outside the critical section could overwrite a parallel
    requests t->cancelling assignment to true, to ensure the parallely
    executing callback observes its cancellation status.
    
    It would be necessary to clear this cancelling bit once hrtimer_cancel
    is done, but lack of serialization introduced races. Another option was
    explored where bpf_timer_start would clear the bit when (re)starting the
    timer under timer->lock. This would ensure serialized access to the
    cancelling bit, but may allow it to be cleared before in-flight
    hrtimer_cancel has finished executing, such that lockups can occur
    again.
    
    Thus, we choose an atomic counter to keep track of all outstanding
    cancellation requests and use it to prevent lockups in case callbacks
    attempt to cancel each other while executing in parallel.
    
    Reported-by: Dohyun Kim <dohyunkim@xxxxxxxxxx>
    Reported-by: Neel Natu <neelnatu@xxxxxxxxxx>
    Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx>
    Link: https://lore.kernel.org/r/20240709185440.1104957-2-memxor@xxxxxxxxx
    Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index c34c93aa5a5e7..9ab6be9653059 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1106,6 +1106,7 @@ struct bpf_async_cb {
 struct bpf_hrtimer {
 	struct bpf_async_cb cb;
 	struct hrtimer timer;
+	atomic_t cancelling;
 };
 
 /* the actual struct hidden inside uapi struct bpf_timer */
@@ -1204,6 +1205,7 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
 		clockid = flags & (MAX_CLOCKS - 1);
 		t = (struct bpf_hrtimer *)cb;
 
+		atomic_set(&t->cancelling, 0);
 		hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
 		t->timer.function = bpf_timer_cb;
 		cb->value = (void *)async - map->record->timer_off;
@@ -1364,7 +1366,8 @@ static void drop_prog_refcnt(struct bpf_async_cb *async)
 
 BPF_CALL_1(bpf_timer_cancel, struct bpf_async_kern *, timer)
 {
-	struct bpf_hrtimer *t;
+	struct bpf_hrtimer *t, *cur_t;
+	bool inc = false;
 	int ret = 0;
 
 	if (in_nmi())
@@ -1376,14 +1379,41 @@ BPF_CALL_1(bpf_timer_cancel, struct bpf_async_kern *, timer)
 		ret = -EINVAL;
 		goto out;
 	}
-	if (this_cpu_read(hrtimer_running) == t) {
+
+	cur_t = this_cpu_read(hrtimer_running);
+	if (cur_t == t) {
 		/* If bpf callback_fn is trying to bpf_timer_cancel()
 		 * its own timer the hrtimer_cancel() will deadlock
-		 * since it waits for callback_fn to finish
+		 * since it waits for callback_fn to finish.
+		 */
+		ret = -EDEADLK;
+		goto out;
+	}
+
+	/* Only account in-flight cancellations when invoked from a timer
+	 * callback, since we want to avoid waiting only if other _callbacks_
+	 * are waiting on us, to avoid introducing lockups. Non-callback paths
+	 * are ok, since nobody would synchronously wait for their completion.
+	 */
+	if (!cur_t)
+		goto drop;
+	atomic_inc(&t->cancelling);
+	/* Need full barrier after relaxed atomic_inc */
+	smp_mb__after_atomic();
+	inc = true;
+	if (atomic_read(&cur_t->cancelling)) {
+		/* We're cancelling timer t, while some other timer callback is
+		 * attempting to cancel us. In such a case, it might be possible
+		 * that timer t belongs to the other callback, or some other
+		 * callback waiting upon it (creating transitive dependencies
+		 * upon us), and we will enter a deadlock if we continue
+		 * cancelling and waiting for it synchronously, since it might
+		 * do the same. Bail!
 		 */
 		ret = -EDEADLK;
 		goto out;
 	}
+drop:
 	drop_prog_refcnt(&t->cb);
 out:
 	__bpf_spin_unlock_irqrestore(&timer->lock);
@@ -1391,6 +1421,8 @@ BPF_CALL_1(bpf_timer_cancel, struct bpf_async_kern *, timer)
 	 * if it was running.
 	 */
 	ret = ret ?: hrtimer_cancel(&t->timer);
+	if (inc)
+		atomic_dec(&t->cancelling);
 	rcu_read_unlock();
 	return ret;
 }