Hi,

On 10/12/2024 12:47 AM, Alexei Starovoitov wrote:
> On Tue, Oct 8, 2024 at 2:02 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>
>> The patch supports lookup, update, delete and lookup_delete operations
>> for a hash map with a dynptr key. There are two major differences between
>> the implementation of a normal hash map and a dynptr-keyed hash map:
>>
>> 1) dynptr-keyed hash map doesn't support pre-allocation.
>> The reason is that the dynptr in the map key is allocated dynamically
>> through the bpf mem allocator. The length limitation for these dynptrs is
>> 4088 bytes now. Because these dynptrs are allocated dynamically, the
>> memory consumption will be smaller compared with a normal hash map when
>> there are big differences between the lengths of these dynptrs.
>>
>> 2) the freed element in a dynptr-keyed map will not be reused immediately
>> For a normal hash map, a freed element may be reused immediately by a
>> newly-added element, so a lookup may return an incorrect result due to
>> element deletion and element reuse. However, a dynptr-keyed map cannot do
>> that: there are pointers (dynptrs) in the map key and the updates of
>> these dynptrs are not atomic, because both the address and the length of
>> the dynptr will be updated. If the element is reused immediately, the
>> access of the dynptr in the freed element may incur an invalid memory
>> access due to the mismatch between the address and the size of the
>> dynptr, so reuse the freed element only after one RCU grace period.
>>
>> Besides the differences above, a dynptr-keyed hash map also needs to
>> handle the maybe-nullified dynptr in the map key.
>>
>> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
>> ---
>>  kernel/bpf/hashtab.c | 283 +++++++++++++++++++++++++++++++++++++++----
>>  1 file changed, 257 insertions(+), 26 deletions(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index b14b87463ee0..edf19d36a413 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -88,6 +88,7 @@ struct bpf_htab {
>>  	struct bpf_map map;
>>  	struct bpf_mem_alloc ma;
>>  	struct bpf_mem_alloc pcpu_ma;
>> +
SNIP
>> +
>> +static inline bool htab_is_same_key(const void *key, const void *tgt, unsigned int key_size,
>> +				     const struct btf_record *rec)
>> +{
>> +	if (!rec)
>> +		return !memcmp(key, tgt, key_size);
>> +	return is_same_dynptr_key(key, tgt, key_size, rec);
>> +}
>> +
>>  /* this lookup function can only be called with bucket lock taken */
>>  static struct htab_elem *lookup_elem_raw(struct hlist_nulls_head *head, u32 hash,
>> -					 void *key, u32 key_size)
>> +					 void *key, u32 key_size,
>> +					 const struct btf_record *record)
>>  {
>>  	struct hlist_nulls_node *n;
>>  	struct htab_elem *l;
>>
>>  	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
>> -		if (l->hash == hash && !memcmp(&l->key, key, key_size))
>> +		if (l->hash == hash && htab_is_same_key(l->key, key, key_size, record))
>>  			return l;
>>
>>  	return NULL;
>> @@ -657,14 +769,15 @@ static struct htab_elem *lookup_elem_raw(struct hlist_nulls_head *head, u32 hash
>>   */
>>  static struct htab_elem *lookup_nulls_elem_raw(struct hlist_nulls_head *head,
>>  						u32 hash, void *key,
>> -						u32 key_size, u32 n_buckets)
>> +						u32 key_size, u32 n_buckets,
>> +						const struct btf_record *record)
>>  {
>>  	struct hlist_nulls_node *n;
>>  	struct htab_elem *l;
>>
>>  again:
>>  	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
>> -		if (l->hash == hash && !memcmp(&l->key, key, key_size))
>> +		if (l->hash == hash && htab_is_same_key(l->key, key, key_size, record))
>>  			return l;
> If I'm reading this correctly the support for dynptr in map keys
> adds two map->key_record != NULL checks in the fast path,
> hence what you said in the cover letter:
>
>> It seems adding dynptr key support in hash map degrades the lookup
>> performance by about 12% and degrades the update performance by about 7%.
>> Will investigate these degradations first.
>>
>> a) lookup
>> max_entries = 8K
>>
>> before:
>> 0:hash_lookup 72347325 lookups per sec
>>
>> after:
>> 0:hash_lookup 64758890 lookups per sec
> is surprising.
>
> Two conditional branches contribute to 12% performance loss?
> Something fishy.
> Try unlikely() to hopefully recover most of it.
> After analyzing 'perf report/annotate', of course.

Using unlikely/likely doesn't help much. It seems the big performance gap
is due to lookup_nulls_elem_raw() no longer being inlined into
__htab_map_lookup_elem(). I still don't know why it is not inlined after
the change. After marking lookup_nulls_elem_raw() as inline (sketched at
the end of this mail), the performance gap is within ~2% for htab map
lookup. For htab_map_update/delete_elem(), the cause and the result are
similar. Should I mark these two functions (lookup_nulls_elem_raw and
lookup_elem_raw) as inline in the next revision, or should I leave them
as-is and try to fix the degradation in another patch set?

> Worst case we can specialize htab_map_gen_lookup() to
> call into different helpers.
> .
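
For reference, the kind of change I am asking about is roughly the sketch
below (whether plain inline is enough or __always_inline is needed still
has to be confirmed); the body is the same as in the patch, only the
annotation changes:

static __always_inline struct htab_elem *
lookup_nulls_elem_raw(struct hlist_nulls_head *head, u32 hash, void *key,
		      u32 key_size, u32 n_buckets,
		      const struct btf_record *record)
{
	struct hlist_nulls_node *n;
	struct htab_elem *l;

again:
	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
		if (l->hash == hash &&
		    htab_is_same_key(l->key, key, key_size, record))
			return l;

	/* A mismatched nulls value means the element was moved to another
	 * bucket while the list was being walked, so restart the traversal
	 * from the head.
	 */
	if (unlikely(get_nulls_value(n) != (hash & (n_buckets - 1))))
		goto again;

	return NULL;
}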
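
If specializing the generated code is preferred instead, I think
htab_map_gen_lookup() could look roughly like the sketch below.
__htab_map_lookup_dynptr_elem() is only a placeholder name for a
dynptr-aware lookup helper; no such helper exists yet:

static int htab_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
{
	struct bpf_insn *insn = insn_buf;
	const int ret = BPF_REG_0;

	/* Emit a call to the dynptr-aware helper only for dynptr-keyed
	 * maps, so the code generated for normal hash maps keeps calling
	 * __htab_map_lookup_elem() without any key_record check.
	 */
	if (map->key_record)
		*insn++ = BPF_EMIT_CALL(__htab_map_lookup_dynptr_elem);
	else
		*insn++ = BPF_EMIT_CALL(__htab_map_lookup_elem);
	*insn++ = BPF_JMP_IMM(BPF_JEQ, ret, 0, 1);
	*insn++ = BPF_ALU64_IMM(BPF_ADD, ret,
				offsetof(struct htab_elem, key) +
				round_up(map->key_size, 8));
	return insn - insn_buf;
}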