On 2022/10/27 11:26, Leizhen (ThunderTown) wrote: > > > On 2022/10/27 3:03, Luis Chamberlain wrote: >> On Wed, Oct 26, 2022 at 02:44:36PM +0800, Leizhen (ThunderTown) wrote: >>> On 2022/10/26 1:53, Luis Chamberlain wrote: >>>> This answers how we don't use a hash table, the question was *should* we >>>> use one? >>> >>> I'm not the original author, and I can only answer now based on my understanding. Maybe >>> the original author didn't think of the hash method, or he has weighed it out. >>> >>> Hash is a good solution if only performance is required and memory overhead is not >>> considered. Using hash will increase the memory size by up to "4 * kallsyms_num_syms + >>> 4 * ARRAY_SIZE(hashtable)" bytes, kallsyms_num_syms is about 1-2 million. Sorry, 1-2 million ==> 0.1~0.2 million >>> >>> Because I don't know what hash algorithm will be used, the cost of generating the >>> hash value corresponding to the symbol name is unknown now. But I think it's gonna >>> be small. But it definitely needs a simpler algorithm, the tool needs to implement >>> the same hash algorithm. >> >> For instance, you can look at evaluating if alloc_large_system_hash() would help. > > OK, I found the right hash function. In this way, the tool does not need to consider > the byte order. https://en.wikipedia.org/wiki/Jenkins_hash_function Let's go with jenkins_one_at_a_time_hash(), which looks simpler and doesn't even have to think about sizeof(long). It seems to be closest to our current needs. uint32_t jenkins_one_at_a_time_hash(const uint8_t* key, size_t length) { size_t i = 0; uint32_t hash = 0; while (i != length) { hash += key[i++]; hash += hash << 10; hash ^= hash >> 6; } hash += hash << 3; hash ^= hash >> 11; hash += hash << 15; return hash; } > > include/linux/stringhash.h > > /* > * Version 1: one byte at a time. Example of use: > * > * unsigned long hash = init_name_hash; > * while (*p) > * hash = partial_name_hash(tolower(*p++), hash); > * hash = end_name_hash(hash); > > >> >> Luis >> . >> > -- Regards, Zhen Lei