On Tue, Sep 12, 2023 at 7:19 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > On Tue, 2023-09-12 at 16:51 +0300, Eduard Zingerman wrote: > > On Sat, 2023-09-09 at 02:16 -0700, Donglin Peng wrote: > > > Currently, we are only using the linear search method to find the type id > > > by the name, which has a time complexity of O(n). This change involves > > > sorting the names of btf types in ascending order and using binary search, > > > which has a time complexity of O(log(n)). This idea was inspired by the > > > following patch: > > > > > > 60443c88f3a8 ("kallsyms: Improve the performance of kallsyms_lookup_name()"). > > > > > > At present, this improvement is only for searching in vmlinux's and > > > module's BTFs, and the kind should only be BTF_KIND_FUNC or BTF_KIND_STRUCT. > > > > > > Another change is the search direction, where we search the BTF first and > > > then its base, the type id of the first matched btf_type will be returned. > > > > > > Here is a time-consuming result that finding all the type ids of 67,819 kernel > > > functions in vmlinux's BTF by their names: > > > > > > Before: 17000 ms > > > After: 10 ms > > > > > > The average lookup performance has improved about 1700x at the above scenario. > > > > > > However, this change will consume more memory, for example, 67,819 kernel > > > functions will allocate about 530KB memory. > > > > Hi Donglin, > > > > I think this is a good improvement. However, I wonder, why did you > > choose to have a separate name map for each BTF kind? > > > > I did some analysis for my local testing kernel config and got such numbers: > > - total number of BTF objects: 97350 > > - number of FUNC and STRUCT objects: 51597 > > - number of FUNC, STRUCT, UNION, ENUM, ENUM64, TYPEDEF, DATASEC objects: 56817 > > (these are all kinds for which lookup by name might make sense) > > - number of named objects: 54246 > > - number of name collisions: > > - unique names: 53985 counts > > - 2 objects with the same name: 129 counts > > - 3 objects with the same name: 3 counts > > > > So, it appears that having a single map for all named objects makes > > sense and would also simplify the implementation, what do you think? > > Some more numbers for my config: > - 13241 types (struct, union, typedef, enum), log2 13241 = 13.7 > - 43575 funcs, log2 43575 = 15.4 > Thus, having separate map for types vs functions might save ~1.7 > search iterations. Is this a significant slowdown in practice? What do you propose to do in case of duplicates ? func and struct can have the same name, but they will have two different btf_ids. How do we store them ? Also we might add global vars to BTF. Such request came up several times. So we need to make sure our search approach scales to func, struct, vars. I don't recall whether we search any other kinds. Separate arrays for different kinds seems ok. It's a bit of code complexity, but it's not an increase in memory. With 13k structs and 43k funcs it's 56k * (4 + 4) that's 0.5 Mbyte extra memory. That's quite a bit. Anything we can do to compress it? Folks requested vmlinux BTF to be a module, so it's loaded on demand. BTF memory consumption is a concern to many. I think before we add these per-kind search arrays we better make BTF optional as a module.