Re: [PATCH v4 4/8] kallsyms: Improve the performance of kallsyms_lookup_name()

"Leizhen (ThunderTown)" <thunder.leizhen@xxxxxxxxxx> · Wed, 28 Sep 2022 09:35:32 +0800

On 2022/9/22 15:02, Petr Mladek wrote:
> On Thu 2022-09-22 10:15:22, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2022/9/21 23:25, Petr Mladek wrote:
>>> On Tue 2022-09-20 15:13:13, Zhen Lei wrote:
>>>> Currently, to search for a symbol, we need to expand the symbols in
>>>> 'kallsyms_names' one by one, and then use the expanded string for
>>>> comparison. This process can be optimized.
>>>>
>>>> And now scripts/kallsyms no longer compresses the symbol types, each
>>>> symbol type always occupies one byte. So we can first compress the
>>>> searched symbol and then make a quick comparison based on the compressed
>>>> length and content. In this way, for entries with mismatched lengths,
>>>> there is no need to expand and compare strings. And for those matching
>>>> lengths, there's no need to expand the symbol. This saves a lot of time.
>>>> According to my test results, the average performance of
>>>> kallsyms_lookup_name() can be improved by 20 to 30 times.
>>>>
>>>> The pseudo code of the test case is as follows:
>>>> static int stat_find_name(...)
>>>> {
>>>> 	start = sched_clock();
>>>> 	(void)kallsyms_lookup_name(name);
>>>> 	end = sched_clock();
>>>> 	//Update min, max, cnt, sum
>>>> }
>>>>
>>>> /*
>>>>  * Traverse all symbols in sequence and collect statistics on the time
>>>>  * taken by kallsyms_lookup_name() to lookup each symbol.
>>>>  */
>>>> kallsyms_on_each_symbol(stat_find_name, NULL);
>>>>
>>>> The test results are as follows (twice):
>>>> After : min=5250, max=  726560, avg= 302132
>>>> After : min=5320, max=  726850, avg= 301978
>>>> Before: min=170,  max=15949190, avg=7553906
>>>> Before: min=160,  max=15877280, avg=7517784
>>>>
>>>> The average time consumed is only 4.01% and the maximum time consumed is
>>>> only 4.57% of the time consumed before optimization.
>>>>
>>>> Signed-off-by: Zhen Lei <thunder.leizhen@xxxxxxxxxx>
>>>> ---
>>>>  kernel/kallsyms.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
>>>>  1 file changed, 76 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
>>>> index 3e7e2c2ad2f75ef..2d76196cfe89f34 100644
>>>> --- a/kernel/kallsyms.c
>>>> +++ b/kernel/kallsyms.c
>>>> @@ -87,6 +87,71 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
>>>> +{
>>>> +	int i, j, k, n;
>>>> +	int len, token_len;
>>>> +	const char *token;
>>>> +	unsigned char token_idx[KSYM_NAME_LEN];
>>>> +	unsigned char token_bak[KSYM_NAME_LEN];
>>>
>>> Why do we need two buffers? It should be possible to compress the name
>>> in the same buffer as it is done in compress_symbols() in scripts/callsyms.c.
>>
>> Because the performance would be a little better. Now this function takes
>> just over a microsecond. Currently, it takes about 250 microseconds on
>> average to lookup a symbol, so adding a little more time to this function
>> doesn't affect the overall picture. I'll modify and test it as you suggest
>> below.
> 
> We need to be careful about a stack overflow. I have seen that
> KSYM_NAME_LEN might need to be increased to 512 because of
> Rust support, see
> https://lore.kernel.org/r/20220805154231.31257-6-ojeda@xxxxxxxxxx
> 
>>>> @@ -192,20 +257,28 @@ unsigned long kallsyms_lookup_name(const char *name)
>>>>  	for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
>>>>  		off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
>>>>  
>>>> -		if (strcmp(namebuf, name) == 0)
>>>> -			return kallsyms_sym_address(i);
>>>> -
>>>>  		if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
>>>>  			return kallsyms_sym_address(i);
>>>
>>> Hmm, it means that the speedup is not usable when kernel is compiled LLVM?
>>> It might actually slow down the search because we would need to use
>>> both fast and slow search?
>>
>> Theoretically, I don't think so. A string comparison was removed from the
>> slow search. "if (name_len != len)" is faster than
>> "if (strcmp(namebuf, name) == 0)". Even if they're equal,
>> kallsyms_compress_symbol_name() only takes 1-2us, it doesn't affect the
>> overall picture. The average lookup time before optimization is
>> millisecond-level.
>>
>> Before: min=170,  max=15949190, avg=7553906
> 
> Good point! I agree that the potential extra overhead is negligible
> when using the old code as a fallback.

These days sleep better. When I got up this morning, my subconscious told me that
compiled LLVM could also be optimized. In fact, the method is simple, that is,
check whether the next token starts with '.' or '$' after being expanded.

I will post v7 before the holidays.

> 
> Best Regards,
> Petr
> .
> 

-- 
Regards,
  Zhen Lei