On 1/24/25 13:47, Uladzislau Rezki wrote: > On Thu, Jan 23, 2025 at 11:37:20AM +0100, Vlastimil Babka wrote: >> RCU has been special-casing callback function pointers that are integers >> lower than 4096 as offsets of rcu_head for kvfree() instead. The tree >> RCU implementation no longer does that as the batched kvfree_rcu() is >> not a simple call_rcu(). The tiny RCU still does, and the plan is also >> to make tree RCU use call_rcu() for SLUB_TINY configurations. >> >> Instead of teaching tree RCU again to special case the offsets, let's >> remove the special casing completely. Since there's no SLOB anymore, it >> is possible to create a callback function that can take a pointer to a >> middle of slab object with unknown offset and determine the object's >> pointer before freeing it, so implement that as kvfree_rcu_cb(). >> >> Large kmalloc and vmalloc allocations are handled simply by aligning >> down to page size. For that we retain the requirement that the offset is >> smaller than 4096. But we can remove __is_kvfree_rcu_offset() completely >> and instead just opencode the condition in the BUILD_BUG_ON() check. >> >> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx> >> --- >> include/linux/rcupdate.h | 24 +++++++++--------------- >> kernel/rcu/tiny.c | 13 ------------- >> mm/slab.h | 2 ++ >> mm/slab_common.c | 5 +---- >> mm/slub.c | 42 ++++++++++++++++++++++++++++++++++++++++++ >> 5 files changed, 54 insertions(+), 32 deletions(-) >> >> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h >> index 3f70d1c8144426f40553c8c589f07097ece8a706..7ff16a70ca1c0fb1012c4118388f60687c5e5b3f 100644 >> --- a/include/linux/rcupdate.h >> +++ b/include/linux/rcupdate.h >> @@ -1025,12 +1025,6 @@ static inline notrace void rcu_read_unlock_sched_notrace(void) >> #define RCU_POINTER_INITIALIZER(p, v) \ >> .p = RCU_INITIALIZER(v) >> >> -/* >> - * Does the specified offset indicate that the corresponding rcu_head >> - * structure can be handled by kvfree_rcu()? >> - */ >> -#define __is_kvfree_rcu_offset(offset) ((offset) < 4096) >> - >> /** >> * kfree_rcu() - kfree an object after a grace period. >> * @ptr: pointer to kfree for double-argument invocations. >> @@ -1041,11 +1035,11 @@ static inline notrace void rcu_read_unlock_sched_notrace(void) >> * when they are used in a kernel module, that module must invoke the >> * high-latency rcu_barrier() function at module-unload time. >> * >> - * The kfree_rcu() function handles this issue. Rather than encoding a >> - * function address in the embedded rcu_head structure, kfree_rcu() instead >> - * encodes the offset of the rcu_head structure within the base structure. >> - * Because the functions are not allowed in the low-order 4096 bytes of >> - * kernel virtual memory, offsets up to 4095 bytes can be accommodated. >> + * The kfree_rcu() function handles this issue. In order to have a universal >> + * callback function handling different offsets of rcu_head, the callback needs >> + * to determine the starting address of the freed object, which can be a large >> + * kmalloc of vmalloc allocation. To allow simply aligning the pointer down to >> + * page boundary for those, only offsets up to 4095 bytes can be accommodated. >> * If the offset is larger than 4095 bytes, a compile-time error will >> * be generated in kvfree_rcu_arg_2(). If this error is triggered, you can >> * either fall back to use of call_rcu() or rearrange the structure to >> @@ -1091,10 +1085,10 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr); >> do { \ >> typeof (ptr) ___p = (ptr); \ >> \ >> - if (___p) { \ >> - BUILD_BUG_ON(!__is_kvfree_rcu_offset(offsetof(typeof(*(ptr)), rhf))); \ >> - kvfree_call_rcu(&((___p)->rhf), (void *) (___p)); \ >> - } \ >> + if (___p) { \ >> + BUILD_BUG_ON(offsetof(typeof(*(ptr)), rhf) >= 4096); \ >> + kvfree_call_rcu(&((___p)->rhf), (void *) (___p)); \ >> + } \ >> > Why removing the macro? At least __is_kvfree_rcu_offset() makes it > clear what and why + it has a nice comment what it is used for. 4096 > looks like hard-coded value. Hmm but it's ultimately shorter. We could add a short comment to kvfree_rcu_arg_2() referring to the longer explanation in the kfree_rcu() comment? > Or you do not want that someone else started to use that macro in > another places? And also that, since this has to be exposed in a "public" header. >> diff --git a/mm/slab.h b/mm/slab.h >> index e9fd9bf0bfa65b343a4ae0ecd5b4c2a325b04883..2f01c7317988ce036f0b22807403226a59f0f708 100644 >> --- a/mm/slab.h >> +++ b/mm/slab.h >> @@ -604,6 +604,8 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, >> void **p, int objects, struct slabobj_ext *obj_exts); >> #endif >> >> +void kvfree_rcu_cb(struct rcu_head *head); >> + >> size_t __ksize(const void *objp); >> >> static inline size_t slab_ksize(const struct kmem_cache *s) >> diff --git a/mm/slab_common.c b/mm/slab_common.c >> index 330cdd8ebc5380090ee784c58e8ca1d1a52b3758..f13d2c901daf1419993620459fbd5845eecb85f1 100644 >> --- a/mm/slab_common.c >> +++ b/mm/slab_common.c >> @@ -1532,9 +1532,6 @@ kvfree_rcu_list(struct rcu_head *head) >> rcu_lock_acquire(&rcu_callback_map); >> trace_rcu_invoke_kvfree_callback("slab", head, offset); >> >> - if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset))) >> - kvfree(ptr); >> - > This is not correct unless i miss something. Why do you remove this? Oops, meant to remove just the warn check, and unconditionally call kvfree(), thanks for noticing :) The warning could really only trigger due to a use-after-free corruption of the ptr pointer, because the BUILD_BUG_ON() would catch misuse with a too large offset, so we don't need to keep it. >> +void kvfree_rcu_cb(struct rcu_head *head) >> +{ >> + void *obj = head; >> + struct folio *folio; >> + struct slab *slab; >> + struct kmem_cache *s; >> + void *slab_addr; >> + >> + if (unlikely(is_vmalloc_addr(obj))) { >> + obj = (void *) PAGE_ALIGN_DOWN((unsigned long)obj); >> + vfree(obj); >> + return; >> + } >> + >> + folio = virt_to_folio(obj); >> + if (unlikely(!folio_test_slab(folio))) { >> + /* >> + * rcu_head offset can be only less than page size so no need to >> + * consider folio order >> + */ >> + obj = (void *) PAGE_ALIGN_DOWN((unsigned long)obj); >> + free_large_kmalloc(folio, obj); >> + return; >> + } >> + >> + slab = folio_slab(folio); >> + s = slab->slab_cache; >> + slab_addr = folio_address(folio); >> + >> + if (is_kfence_address(obj)) { >> + obj = kfence_object_start(obj); >> + } else { >> + unsigned int idx = __obj_to_index(s, slab_addr, obj); >> + >> + obj = slab_addr + s->size * idx; >> + obj = fixup_red_left(s, obj); >> + } >> + >> + slab_free(s, slab, obj, _RET_IP_); >> +} >> + > Tiny computer case. Just small nit, maybe remove unlikely() but you decide :) Force of habbit :) > -- > Uladzislau Rezki