On Sun, Dec 19, 2021 at 01:22:42PM +0800, Hou Tao wrote: > Hi, > > In order to use string as hash-table key, key_size must be the storage > size of longest string. If there are large differencies in string > length, the hash distribution will be sub-optimal due to the unused > zero bytes in shorter strings and the lookup will be inefficient due to > unnecessary memcpy(). > > Also it is possible the unused part of string key returned from bpf helper > (e.g. bpf_d_path) is not mem-zeroed and if using it directly as lookup key, > the lookup will fail with -ENOENT (as reported in [1]). > > The patchset tries to address the inefficiency by adding support for > string key. During the key comparison, the string length is checked > first to reduce the uunecessary memcmp. Also update the hash function > from jhash() to full_name_hash() to reduce hash collision of string key. > > There are about 16% and 106% improvment in benchmark under x86-64 and > arm64 when key_size is 256. About 45% and %161 when key size is greater > than 1024. > > Also testing the performance improvment by using all files under linux > kernel sources as the string key input. There are about 74k files and the > maximum string length is 101. When key_size is 104, there are about 9% > and 35% win under x86-64 and arm64 in lookup performance, and when key_size > is 256, the win increases to 78% and 109% respectively. > > Beside the optimization of lookup for string key, it seems that the > allocated space for BPF_F_NO_PREALLOC-case can also be optimized. More > trials and tests will be conducted if the idea of string key is accepted. It will work when the key is a string. Sooner or later somebody would need the key to be a string and few other integers or pointers. This approach will not be usable. Much worse, this approach will be impossible to extend. Have you considered a more generic string support? Make null terminated string to be a fist class citizen. wdyt?