On Wed 2022-09-21 10:42:56, Leizhen (ThunderTown) wrote: > > > On 2022/9/21 1:26, Petr Mladek wrote: > > On Fri 2022-09-09 21:00:09, Zhen Lei wrote: > >> Currently, to search for a symbol, we need to expand the symbols in > >> 'kallsyms_names' one by one, and then use the expanded string for > >> comparison. This is very slow. > >> > >> In fact, we can first compress the name being looked up and then use > >> it for comparison when traversing 'kallsyms_names'. > > > > This does not explain how this patch modifies the compressed data > > and why it is needed. > > Yes, I have updated the description from the v3 version. Ah, there is even v4. I have missed that. The commit message looks much better there. > So if we don't compress the symbol type, we can first compress the > searched symbol and then make a quick comparison based on the compressed > length and content. In this way, for entries with mismatched lengths, > there is no need to expand and compare strings. And for those matching > lengths, there's no need to expand the symbol. This saves a lot of time. > > > > > > >> This increases the size of 'kallsyms_names'. About 48KiB, 2.67%, on x86 > >> with defconfig. > >> Before: kallsyms_num_syms=131392, sizeof(kallsyms_names)=1823659 > >> After : kallsyms_num_syms=131392, sizeof(kallsyms_names)=1872418 > >> > >> However, if CONFIG_KALLSYMS_ALL is not set, the size of 'kallsyms_names' > >> does not change. > >> > >> Signed-off-by: Zhen Lei <thunder.leizhen@xxxxxxxxxx> > >> --- > >> scripts/kallsyms.c | 15 ++++++++++++--- > >> 1 file changed, 12 insertions(+), 3 deletions(-) > >> > >> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c > >> index f18e6dfc68c5839..ab6fe7cd014efd1 100644 > >> --- a/scripts/kallsyms.c > >> +++ b/scripts/kallsyms.c > >> + */ > >> + if (all_symbols) > >> + sym_start_idx = 1; > > > > This looks a bit fragile. My understanding is that the new code in > > kernel/kallsyms.c and kernel/module/kallsyms.c depends on this change. > > They do not depend on this change, because the index in > insert_real_symbols_in_table() is still starting from 0. kallsyms_expand_symbol() > shows that it uses every byte of the compressed data to look up the token table. > The index in insert_real_symbols_in_table() starting from 0 make sure that the > raw character of 'type' occupies a separate position in kallsyms_token_table[]. > So that kallsyms_expand_symbol() can still work well. I guess that we are talking about different things. Anyway, please ignore my concern about that it is fragile. The change in scripts/kallsyms.c does not longer depend on --all-symbols parameter in the last v4 patchset. > > I would personally suggest to store the symbol type into a separate > > sym->type entry in struct sym_entry and never compress it. > > Yes,I've also considered this, for the purpose of increasing the > compression ratio. See below, if the sorting is performed based on > the address and then based on the type. We can record all the symbol > type information in less than 100 bytes. Of course, this makes the > functions that look up symbols based on the address loop serveral > times more. However, I would like to wait until the current patch > series is accepted. Otherwise, I'll have to rework a lot of patches > and it's too much work. To be honest, I've been coding for it these days. > > cat /proc/kallsyms | awk '{print $2}' | sort | uniq -c | sort -r > 44678 r > 38299 t > 28315 T > 11644 d > 3768 D > 2778 b > 778 R > 641 B > 282 A > 178 W > 37 V This is another optimization. I agree that we could do it later after this patchset is accepted. Best Regards, Petr