On Tue 2022-09-20 15:13:12, Zhen Lei wrote: > Currently, to search for a symbol, we need to expand the symbols in > 'kallsyms_names' one by one, and then use the expanded string for > comparison. Because we do not know the symbol type, and the symbol type > may be combined with the following characters to form a token. > > So if we don't compress the symbol type, we can first compress the > searched symbol and then make a quick comparison based on the compressed > length and content. In this way, for entries with mismatched lengths, > there is no need to expand and compare strings. And for those matching > lengths, there's no need to expand the symbol. This saves a lot of time. > According to my test results, the average performance of > kallsyms_lookup_name() can be improved by 20 to 30 times. > > Of course, because the symbol type is forcibly not compressed, the > compression rate also decreases. Here are the test results with > defconfig: > > arm64: <<<<<< > --------------------------------------------------------------- > | ALL | nr_symbols | compressed size | original size | ratio(%) | > -----|---------------------------------------------------------| > Before | Y | 174094 | 1884938 | 3750653 | 50.25 | > After | Y | 174099 | 1960154 | 3750756 | 52.26 | > Before | N | 61744 | 725507 | 1222737 | 59.33 | > After | N | 61747 | 745733 | 1222801 | 60.98 | > --------------------------------------------------------------- > The memory overhead is increased by: > 73.5KiB and 4.0% if CONFIG_KALLSYMS_ALL=y. > 19.8KiB and 2.8% if CONFIG_KALLSYMS_ALL=n. > > x86: <<<<<<<< > --------------------------------------------------------------- > | ALL | nr_symbols | compressed size | original size | ratio(%) | > -----|---------------------------------------------------------| > Before | Y | 131415 | 1697542 | 3161216 | 53.69 | > After | Y | 131540 | 1747769 | 3163933 | 55.24 | > Before | N | 60695 | 737627 | 1283046 | 57.49 | > After | N | 60699 | 754797 | 1283149 | 58.82 | > --------------------------------------------------------------- > The memory overhead is increased by: > 49.0KiB and 3.0% if CONFIG_KALLSYMS_ALL=y. > 16.8KiB and 2.3% if CONFIG_KALLSYMS_ALL=n. > > This additional memory overhead is worth it compared to the performance > improvement, I think. I agree. The speedup mentioned in the followup patches looks big. I just suggest to do this change a cleaner way, see below. > Signed-off-by: Zhen Lei <thunder.leizhen@xxxxxxxxxx> > --- > scripts/kallsyms.c | 15 ++++++++++++--- > 1 file changed, 12 insertions(+), 3 deletions(-) > > diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c > index 3319d9f38d7a5f2..1ae9ce773d2a31d 100644 > --- a/scripts/kallsyms.c > +++ b/scripts/kallsyms.c > @@ -61,6 +61,15 @@ static int all_symbols; > static int absolute_percpu; > static int base_relative; > > +/* > + * Each entry in the symbol table consists of the symbol type and the symbol > + * itself. To optimize the performance of finding or traversing symbols in > + * kernel, do not compress the symbol type. In this way, when looking for a > + * symbol of unknown type, we can first compress the searched symbol and then > + * make a quick comparison based on the compressed length and content. > + */ > +static int sym_start_idx = 1; > + > static int token_profit[0x10000]; > > /* the table that holds the result of the compression */ > @@ -511,7 +520,7 @@ static void learn_symbol(const unsigned char *symbol, int len) > { > int i; > > - for (i = 0; i < len - 1; i++) > + for (i = sym_start_idx; i < len - 1; i++) It creates yet another twists in scripts/kallsyms.c. read_symbol() explicitely adds the type as the first character so that it can be compressed. And this patch adds a hack to skip it. Let's do it a clean way and store the type serarately: struct sym_entry { unsigned long long addr; unsigned int len; unsigned int start_pos; unsigned int percpu_absolute; unsigned char type; unsigned char name[]; }; static struct sym_entry *read_symbol(FILE *in) { [...] name_len = strlen(name); sym = malloc(sizeof(*sym) + name_len); if (!sym) { fprintf(stderr, "kallsyms failure: " "unable to allocate required amount of memory\n"); exit(EXIT_FAILURE); } sym->addr = addr; sym->len = name_len; sym->type = type; strcpy(sys->name, name); sym->percpu_absolute = 0; } It would allow to remove the tricky: static char *sym_name(const struct sym_entry *s) { return (char *)s->sym + 1; } and access s->name directly. OK, the problem is how to store the type. The clean way would be to put it into a separate section, for example: static void write_src(void) { [...] output_label("kallsyms_types"); off = 0; for (i = 0; i < table_cnt; i++) { printf("\t.byte 0x%02x\n", table[i]->type); } printf("\n"); [...] } It would probably increase the size even more. Another problem is that it would need changes in the crash dump tools, see: static int __init crash_save_vmcoreinfo_init(void) { [...] VMCOREINFO_SYMBOL(kallsyms_names); [...] } A solution would be to store it the old way: static void write_src(void) { [...] output_label("kallsyms_names"); off = 0; for (i = 0; i < table_cnt; i++) { if ((i & 0xFF) == 0) markers[i >> 8] = off; /* * Store the symbol type togerher with symbol name. * It helps to reduce the size. */ printf("\t.byte 0x%02x", table[i]->len + 1); printf(", 0x%02x", table[i]->type); for (k = 0; k < table[i]->len; k++) printf(", 0x%02x", table[i]->sym[k]); printf("\n"); /* symbol name lenght + type + "\n" */ off += table[i]->len + 2; } printf("\n"); [...] } The result would be the same as with your patch. But the code would be even cleaner than before. Best Regards, Petr