On Mon, Feb 12, 2024 at 12:19 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> On Mon, Feb 12, 2024 at 12:15 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> >
> > On Fri, Feb 9, 2024 at 7:18 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> > >
> > > Maps is a collection of maps primarily sorted by the starting address
> > > of the map. Prior to this change the maps were held in an rbtree
> > > requiring 4 pointers per node. Prior to reference count checking, the
> > > rbnode was embedded in the map so 3 pointers per node were
> > > necessary. This change switches the rbtree to an array lazily sorted
> > > by address, much as the array sorting nodes by name. 1 pointer is
> > > needed per node, but to avoid excessive resizing the backing array may
> > > be twice the number of used elements, meaning the memory overhead is
> > > roughly half that of the rbtree. For a perf record with
> > > "--no-bpf-event -g -a" of true, the memory overhead of perf inject is
> > > reduced from 3.3MB to 3MB, so 10% or 300KB is saved.
> > >
> > > Map inserts always happen at the end of the array. The code tracks
> > > whether the insertion violates the sorting property. O(log n) rb-tree
> > > complexity is switched to O(1).
> > >
> > > Remove slides the array, so O(log n) rb-tree complexity is degraded to
> > > O(n).
> > >
> > > A find may need to sort the array using qsort, which is O(n*log n), but
> > > in general the maps should be sorted and so average performance should
> > > be O(log n) as with the rbtree.
> > >
> > > An rbtree node consumes a cache line, but with the array 4 nodes fit
> > > on a cache line. Iteration is simplified to scanning an array rather
> > > than pointer chasing.
> > >
> > > Overall it is expected that the performance after the change should be
> > > comparable to before, but with half of the memory consumed.
> > >
> > > To avoid a list and repeated logic around splitting maps,
> > > maps__merge_in is rewritten in terms of
> > > maps__fixup_overlap_and_insert. maps__merge_in splits the given mapping,
> > > inserting remaining gaps. maps__fixup_overlap_and_insert splits the
> > > existing mappings, then adds the incoming mapping. By adding the new
> > > mapping first, then re-inserting the existing mappings, the splitting
> > > behavior matches.
> > >
> > > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> > > Acked-by: Namhyung Kim <namhyung@xxxxxxxxxx>
> > > ---
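A minimal sketch of the "lazily sorted array" scheme described in the commit
message above. This is not perf's code; addr_array and its helpers are
made-up names. Inserts append in O(1) and only note when ordering breaks; a
lookup qsorts lazily and then binary searches.

#include <stdbool.h>
#include <stdlib.h>

struct entry {
	unsigned long start;
	/* payload ... */
};

struct addr_array {
	struct entry **entries;
	size_t nr, cap;
	bool sorted;
};

static int entry_cmp(const void *a, const void *b)
{
	unsigned long sa = (*(const struct entry * const *)a)->start;
	unsigned long sb = (*(const struct entry * const *)b)->start;

	return sa < sb ? -1 : sa > sb ? 1 : 0;
}

/* Insert is O(1) amortized: no rebalancing, just track whether order broke. */
int addr_array__insert(struct addr_array *aa, struct entry *e)
{
	if (aa->nr == aa->cap) {
		size_t cap = aa->cap ? aa->cap * 2 : 4;
		struct entry **tmp = realloc(aa->entries, cap * sizeof(*tmp));

		if (!tmp)
			return -1;
		aa->entries = tmp;
		aa->cap = cap;
	}
	if (aa->nr && aa->entries[aa->nr - 1]->start > e->start)
		aa->sorted = false;
	aa->entries[aa->nr++] = e;
	return 0;
}

/* Find sorts only when needed, then binary searches: O(log n) on average. */
struct entry *addr_array__find(struct addr_array *aa, unsigned long addr)
{
	size_t lo = 0, hi = aa->nr;

	if (!aa->sorted) {
		qsort(aa->entries, aa->nr, sizeof(*aa->entries), entry_cmp);
		aa->sorted = true;
	}
	/* Find the last entry whose start address is <= addr. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (aa->entries[mid]->start <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo ? aa->entries[lo - 1] : NULL;
}

The real perf code keeps two such arrays (by address and by name, as the
commit message notes) behind the reader/writer lock visible in the diff
below; the sketch also elides the end-address check a real maps lookup
performs.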
> > [SNIP]
> > >  int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data), void *data)
> > >  {
> > > -	struct map_rb_node *pos;
> > > +	bool done = false;
> > >  	int ret = 0;
> > >
> > > -	down_read(maps__lock(maps));
> > > -	maps__for_each_entry(maps, pos) {
> > > -		ret = cb(pos->map, data);
> > > -		if (ret)
> > > -			break;
> > > +	/* See locking/sorting note. */
> > > +	while (!done) {
> > > +		down_read(maps__lock(maps));
> > > +		if (maps__maps_by_address_sorted(maps)) {
> > > +			/*
> > > +			 * maps__for_each_map callbacks may buggily/unsafely
> > > +			 * insert into maps_by_address. Deliberately reload
> > > +			 * maps__nr_maps and maps_by_address on each iteration
> > > +			 * to avoid using memory freed by maps__insert growing
> > > +			 * the array - this may cause maps to be skipped or
> > > +			 * repeated.
> > > +			 */
> > > +			for (unsigned int i = 0; i < maps__nr_maps(maps); i++) {
> > > +				struct map **maps_by_address = maps__maps_by_address(maps);
> >
> > Any chance they can move out of the loop?  I guess not as they are
> > not marked as const/pure functions..
>
> It's not possible because the cb(...) call below will potentially modify
> maps_by_address by inserting maps and reallocating the array. Having
> it outside the loop was what caused the original bug.

Oh, I meant whether the compiler can move them automatically.

Thanks,
Namhyung

> > >
> > > +				struct map *map = maps_by_address[i];
> > > +
> > > +				ret = cb(map, data);
> > > +				if (ret)
> > > +					break;
> > > +			}
> > > +			done = true;
> > > +		}
> > > +		up_read(maps__lock(maps));
> > > +		if (!done)
> > > +			maps__sort_by_address(maps);
> > >  	}
> > > -	up_read(maps__lock(maps));
> > >  	return ret;
> > >  }
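For reference on the const/pure question above, a small sketch of what the
GCC/Clang function attributes promise and why they permit hoisting. The
names here (scale, apply) are illustrative only, not perf code.

/*
 * "const": the result depends only on the arguments and no global memory
 * is read, so repeated calls with the same argument may be merged and
 * moved out of loops.
 */
__attribute__((const)) int scale(int x)
{
	return x * 2 + 1;
}

void apply(int *v, int n, int k)
{
	for (int i = 0; i < n; i++)
		v[i] += scale(k);	/* may legally be evaluated once, before the loop */
}

__attribute__((pure)) is the weaker promise: no side effects, but global
memory may be read, so a pure call can only be reused while the compiler can
prove that memory was not written in between. maps__nr_maps() and
maps__maps_by_address() read state that the cb() callback may change, so even
a pure annotation would not let the compiler hoist them across the callback,
and in this loop reloading them on each iteration is exactly the intended
behavior.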