Patch "bpf: Defer the free of inner map when necessary" has been added to the 5.15-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    bpf: Defer the free of inner map when necessary

to the 5.15-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     bpf-defer-the-free-of-inner-map-when-necessary.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 239934dc5a990777826bcb710474fc3e0de81815
Author: Hou Tao <houtao1@xxxxxxxxxx>
Date:   Mon Mar 11 14:30:22 2024 -0700

    bpf: Defer the free of inner map when necessary
    
    [ Upstream commit 876673364161da50eed6b472d746ef88242b2368 ]
    
    When updating or deleting an inner map in map array or map htab, the map
    may still be accessed by non-sleepable program or sleepable program.
    However bpf_map_fd_put_ptr() decreases the ref-counter of the inner map
    directly through bpf_map_put(), if the ref-counter is the last one
    (which is true for most cases), the inner map will be freed by
    ops->map_free() in a kworker. But for now, most .map_free() callbacks
    don't use synchronize_rcu() or its variants to wait for the elapse of a
    RCU grace period, so after the invocation of ops->map_free completes,
    the bpf program which is accessing the inner map may incur
    use-after-free problem.
    
    Fix the free of inner map by invoking bpf_map_free_deferred() after both
    one RCU grace period and one tasks trace RCU grace period if the inner
    map has been removed from the outer map before. The deferment is
    accomplished by using call_rcu() or call_rcu_tasks_trace() when
    releasing the last ref-counter of bpf map. The newly-added rcu_head
    field in bpf_map shares the same storage space with work field to
    reduce the size of bpf_map.
    
    Fixes: bba1dc0b55ac ("bpf: Remove redundant synchronize_rcu.")
    Fixes: 638e4b825d52 ("bpf: Allows per-cpu maps and map-in-map in sleepable programs")
    Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
    Link: https://lore.kernel.org/r/20231204140425.1480317-5-houtao@xxxxxxxxxxxxxxx
    Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
    (cherry picked from commit 62fca83303d608ad4fec3f7428c8685680bb01b0)
    Signed-off-by: Robert Kolchmeyer <rkolchmeyer@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 97d94bcba1314..df15d4d445ddc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -192,9 +192,14 @@ struct bpf_map {
 	 */
 	atomic64_t refcnt ____cacheline_aligned;
 	atomic64_t usercnt;
-	struct work_struct work;
+	/* rcu is used before freeing and work is only used during freeing */
+	union {
+		struct work_struct work;
+		struct rcu_head rcu;
+	};
 	struct mutex freeze_mutex;
 	atomic64_t writecnt;
+	bool free_after_mult_rcu_gp;
 };
 
 static inline bool map_value_has_spin_lock(const struct bpf_map *map)
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index af0f15db1bf9a..4cf79f86bf458 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -110,10 +110,15 @@ void *bpf_map_fd_get_ptr(struct bpf_map *map,
 
 void bpf_map_fd_put_ptr(struct bpf_map *map, void *ptr, bool need_defer)
 {
-	/* ptr->ops->map_free() has to go through one
-	 * rcu grace period by itself.
+	struct bpf_map *inner_map = ptr;
+
+	/* The inner map may still be used by both non-sleepable and sleepable
+	 * bpf program, so free it after one RCU grace period and one tasks
+	 * trace RCU grace period.
 	 */
-	bpf_map_put(ptr);
+	if (need_defer)
+		WRITE_ONCE(inner_map->free_after_mult_rcu_gp, true);
+	bpf_map_put(inner_map);
 }
 
 u32 bpf_map_fd_sys_lookup_elem(void *ptr)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 64206856a05c4..d4b4a47081b51 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -487,6 +487,25 @@ static void bpf_map_put_uref(struct bpf_map *map)
 	}
 }
 
+static void bpf_map_free_in_work(struct bpf_map *map)
+{
+	INIT_WORK(&map->work, bpf_map_free_deferred);
+	schedule_work(&map->work);
+}
+
+static void bpf_map_free_rcu_gp(struct rcu_head *rcu)
+{
+	bpf_map_free_in_work(container_of(rcu, struct bpf_map, rcu));
+}
+
+static void bpf_map_free_mult_rcu_gp(struct rcu_head *rcu)
+{
+	if (rcu_trace_implies_rcu_gp())
+		bpf_map_free_rcu_gp(rcu);
+	else
+		call_rcu(rcu, bpf_map_free_rcu_gp);
+}
+
 /* decrement map refcnt and schedule it for freeing via workqueue
  * (unrelying map implementation ops->map_free() might sleep)
  */
@@ -496,8 +515,11 @@ static void __bpf_map_put(struct bpf_map *map, bool do_idr_lock)
 		/* bpf_map_free_id() must be called first */
 		bpf_map_free_id(map, do_idr_lock);
 		btf_put(map->btf);
-		INIT_WORK(&map->work, bpf_map_free_deferred);
-		schedule_work(&map->work);
+
+		if (READ_ONCE(map->free_after_mult_rcu_gp))
+			call_rcu_tasks_trace(&map->rcu, bpf_map_free_mult_rcu_gp);
+		else
+			bpf_map_free_in_work(map);
 	}
 }
 




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux