On Wed, Nov 13, 2024 at 1:59 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Thu, Nov 7, 2024 at 6:56 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> >
> > Introduce range_tree (interval tree plus rbtree) to track
> > unallocated ranges in bpf arena and replace maple_tree with it.
> > This is a step towards making bpf_arena|free_alloc_pages non-sleepable.
> > The previous approach to reuse drm_mm to replace maple_tree reached
> > dead end, since sizeof(struct drm_mm_node) = 168 and
> > sizeof(struct maple_node) = 256 while
> > sizeof(struct range_node) = 64 introduced in this patch.
> > Not only it's smaller, but the algorithm splits and merges
> > adjacent ranges. Ultimate performance doesn't matter.
> > The main objective of range_tree is to work in context
> > where kmalloc/kfree are not safe. It achieves that via bpf_mem_alloc.
> >
> > Alexei Starovoitov (2):
> >   bpf: Introduce range_tree data structure and use it in bpf arena
> >   selftests/bpf: Add a test for arena range tree algorithm
> >
> >  kernel/bpf/Makefile                           |   2 +-
> >  kernel/bpf/arena.c                            |  34 ++-
> >  kernel/bpf/range_tree.c                       | 262 ++++++++++++++++++
> >  kernel/bpf/range_tree.h                       |  21 ++
> >  .../bpf/progs/verifier_arena_large.c          | 110 +++++++-
> >  5 files changed, 412 insertions(+), 17 deletions(-)
> >  create mode 100644 kernel/bpf/range_tree.c
> >  create mode 100644 kernel/bpf/range_tree.h
> >
> > --
> > 2.43.5
> >
>
> I skimmed through just to familiarize myself, superficially the range
> addition logic seems correct.
>
> I'll just bikeshed a bit, take it for what it's worth. I found some
> naming choices a bit weird.
>
> rn_start and rn_last, just doesn't match in my head. If it's "start",
> then it's "end" (or "finish", but it's weird for this case). If it's
> "last", then it should have "first". "start"/"end" sounds best in my
> head, fwiw.

Agree. It bothered me a bit too, but I kept it as-is to be consistent
with xbitmap.
So I prefer to keep it this way.

>
> As for an API, is_range_tree_set() caught my eye as well. I'd expect
> to see a consistent "range_tree_" prefix for the internal API for this
> data structure. So "range_tree_is_set()" was what I expected.

This is what I tried first, but looking at how it can be used,
the "_is_" part in the middle is too easy to misread:

        if (!range_tree_is_set(rt, pgoff, page_cnt))
                range_tree_set(rt, pgoff, page_cnt); // not so bad here

        if (!range_tree_is_set(rt, pgoff, page_cnt)) // is the above "_set" or "_is_set"?
                range_tree_clear(rt, pgoff, page_cnt);

Hence I moved "is_" to the beginning to make it more visually distinct:

        if (!is_range_tree_set(rt, pgoff, page_cnt))
                range_tree_clear(rt, pgoff, page_cnt);

Not sure whether the consistent "range_tree_" prefix is a better
trade-off. No strong opinion.
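For readers following the naming debate, here is a minimal, hypothetical sketch of what the set / clear / is-set operations mean for a tracker of free ranges that "splits and merges adjacent ranges". It assumes inclusive [start, last] bounds (what the rn_start/rn_last naming implies) and that "set" means "tracked as unallocated". It is not the patch's code: a small sorted array stands in for the interval tree plus rbtree, there is no bpf_mem_alloc, and every sk_* name is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct range_node: inclusive bounds. */
struct sk_range {
	uint32_t start;
	uint32_t last;	/* inclusive, like rn_last */
};

#define SK_MAX_RANGES 64

/* Sorted, non-overlapping, non-adjacent "set" ranges.
 * A flat array replaces the interval tree + rbtree of the real patch. */
struct sk_range_tree {
	struct sk_range r[SK_MAX_RANGES];
	int n;
};

/* True if all of [start, start + len - 1] is set. Because adjacent ranges
 * are always merged, a fully set span must lie inside a single range. */
static bool sk_is_range_tree_set(const struct sk_range_tree *rt,
				 uint32_t start, uint32_t len)
{
	uint32_t last = start + len - 1;	/* assumes len > 0, no wrap */

	for (int i = 0; i < rt->n; i++)
		if (rt->r[i].start <= start && rt->r[i].last >= last)
			return true;
	return false;
}

/* Unset [start, start + len - 1], splitting a range that straddles it. */
static void sk_range_tree_clear(struct sk_range_tree *rt, uint32_t start,
				uint32_t len)
{
	uint32_t last = start + len - 1;
	struct sk_range out[SK_MAX_RANGES];
	int m = 0;

	for (int i = 0; i < rt->n; i++) {
		struct sk_range cur = rt->r[i];

		if (cur.last < start || cur.start > last) {
			assert(m < SK_MAX_RANGES);
			out[m++] = cur;			/* no overlap: keep */
			continue;
		}
		if (cur.start < start) {		/* left remainder */
			assert(m < SK_MAX_RANGES);
			out[m++] = (struct sk_range){ cur.start, start - 1 };
		}
		if (cur.last > last) {			/* right remainder */
			assert(m < SK_MAX_RANGES);
			out[m++] = (struct sk_range){ last + 1, cur.last };
		}
	}
	memcpy(rt->r, out, sizeof(out[0]) * m);
	rt->n = m;
}

/* Set [start, start + len - 1], merging with overlapping or adjacent ranges. */
static void sk_range_tree_set(struct sk_range_tree *rt, uint32_t start,
			      uint32_t len)
{
	uint64_t lo = start, hi = (uint64_t)start + len - 1;
	struct sk_range out[SK_MAX_RANGES];
	int m = 0, i = 0;

	assert(rt->n < SK_MAX_RANGES);	/* at most one new range is added */
	while (i < rt->n && (uint64_t)rt->r[i].last + 1 < lo)
		out[m++] = rt->r[i++];		/* strictly left, not touching */
	while (i < rt->n && rt->r[i].start <= hi + 1) {
		if (rt->r[i].start < lo)	/* absorb overlapping/adjacent */
			lo = rt->r[i].start;
		if (rt->r[i].last > hi)
			hi = rt->r[i].last;
		i++;
	}
	out[m++] = (struct sk_range){ (uint32_t)lo, (uint32_t)hi };
	while (i < rt->n)
		out[m++] = rt->r[i++];		/* strictly right */
	memcpy(rt->r, out, sizeof(out[0]) * m);
	rt->n = m;
}
```

In this sketch, setting pages 0..99 and then clearing 10..29 splits the single range into [0, 9] and [30, 99]; setting 10..29 again merges them back into [0, 99]. Note the helper here returns bool, so `!sk_is_range_tree_set()` reads as "not set"; whether the real is_range_tree_set() uses that convention or an errno-style return is not stated in this thread.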