On Sat, Nov 06, 2021 at 09:28:22PM +0800, Hou Tao wrote:
> The benchmark runs a loop 5000 times. In the loop it reads the file name
> from kprobe argument into stack by using bpf_probe_read_kernel_str(),
> and compares the file name with a target character or string.
>
> Three cases are compared: only compare one character, compare the whole
> string by a home-made strncmp() and compare the whole string by
> bpf_strncmp().
>
> The following is the result:
>
> x86-64 host:
>
> one character: 2613499 ns
> whole str by strncmp: 2920348 ns
> whole str by helper: 2779332 ns
>
> arm64 host:
>
> one character: 3898867 ns
> whole str by strncmp: 4396787 ns
> whole str by helper: 3968113 ns
>
> Compared with home-made strncmp, the performance of bpf_strncmp helper
> improves 80% under x86-64 and 600% under arm64. The big performance win
> on arm64 may come from its arch-optimized strncmp().

80% and 600% improvement?!
I don't understand how this math works.
Why is the one-char case barely different in total nsec from the whole-string cases?
As far as I understand the test, the string shouldn't miscompare on the first char.
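As a rough sanity check on the quoted numbers, here is a small sketch of the arithmetic. The first function compares the raw totals directly. The second treats the one-character total as fixed overhead and compares only what remains. That second reading is purely an assumption about how figures like 80% and 600% could arise; the quoted text does not say how they were derived. The function names are made up for illustration.

```python
# Totals in ns, copied from the quoted benchmark results.
results = {
    "x86-64": {"one_char": 2613499, "strncmp": 2920348, "helper": 2779332},
    "arm64": {"one_char": 3898867, "strncmp": 4396787, "helper": 3968113},
}


def raw_gain(r):
    """Percent reduction in total time, helper vs. home-made strncmp."""
    return 100.0 * (r["strncmp"] - r["helper"]) / r["strncmp"]


def baseline_subtracted_gain(r):
    """Percent speedup after subtracting the one-character total.

    ASSUMPTION: the one-character case is treated as fixed overhead
    (probe read, loop, etc.). This is a guess, not stated in the source.
    """
    strncmp_cost = r["strncmp"] - r["one_char"]
    helper_cost = r["helper"] - r["one_char"]
    return 100.0 * (strncmp_cost / helper_cost - 1.0)


for host, r in results.items():
    print(f"{host}: raw {raw_gain(r):.1f}%, "
          f"baseline-subtracted {baseline_subtracted_gain(r):.0f}%")
```

On raw totals the helper is only about 5% faster on x86-64 and about 10% faster on arm64. Numbers in the neighborhood of 80% and 600% only show up under the baseline-subtracted reading, and that reading depends on the one-character case being a fair baseline, which is exactly what the question above is about.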