> On Oct 28, 2020, at 5:58 PM, Andrii Nakryiko <andrii@xxxxxxxxxx> wrote: > > From: Andrii Nakryiko <andriin@xxxxxx> > > Revamp BTF dedup's string deduplication to match the approach of writable BTF > string management. This allows to transfer deduplicated strings index back to > BTF object after deduplication without expensive extra memory copying and hash > map re-construction. It also simplifies the code and speeds it up, because > hashmap-based string deduplication is faster than sort + unique approach. > > Signed-off-by: Andrii Nakryiko <andriin@xxxxxx> LGTM, with a couple nitpick below: Acked-by: Song Liu <songliubraving@xxxxxx> > --- > tools/lib/bpf/btf.c | 265 +++++++++++++++++--------------------------- > 1 file changed, 99 insertions(+), 166 deletions(-) > > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c > index 89fecfe5cb2b..db9331fea672 100644 > --- a/tools/lib/bpf/btf.c > +++ b/tools/lib/bpf/btf.c > @@ -90,6 +90,14 @@ struct btf { > struct hashmap *strs_hash; > /* whether strings are already deduplicated */ > bool strs_deduped; > + /* extra indirection layer to make strings hashmap work with stable > + * string offsets and ability to transparently choose between > + * btf->strs_data or btf_dedup->strs_data as a source of strings. > + * This is used for BTF strings dedup to transfer deduplicated strings > + * data back to struct btf without re-building strings index. > + */ > + void **strs_data_ptr; > + > /* BTF object FD, if loaded into kernel */ > int fd; > > @@ -1363,17 +1371,19 @@ int btf__get_map_kv_tids(const struct btf *btf, const char *map_name, > > static size_t strs_hash_fn(const void *key, void *ctx) > { > - struct btf *btf = ctx; > - const char *str = btf->strs_data + (long)key; > + const char ***strs_data_ptr = ctx; > + const char *strs = **strs_data_ptr; > + const char *str = strs + (long)key; Can we keep using btf as the ctx here? "char ***" hurts my eyes... [...] > - d->btf->hdr->str_len = end - start; > + /* replace BTF string data and hash with deduped ones */ > + free(d->btf->strs_data); > + hashmap__free(d->btf->strs_hash); > + d->btf->strs_data = d->strs_data; > + d->btf->strs_data_cap = d->strs_cap; > + d->btf->hdr->str_len = d->strs_len; > + d->btf->strs_hash = d->strs_hash; > + /* now point strs_data_ptr back to btf->strs_data */ > + d->btf->strs_data_ptr = &d->btf->strs_data; > + > + d->strs_data = d->strs_hash = NULL; > + d->strs_len = d->strs_cap = 0; > d->btf->strs_deduped = true; > + return 0; > + > +err_out: > + free(d->strs_data); > + hashmap__free(d->strs_hash); > + d->strs_data = d->strs_hash = NULL; > + d->strs_len = d->strs_cap = 0; > + > + /* restore strings pointer for existing d->btf->strs_hash back */ > + d->btf->strs_data_ptr = &d->strs_data; We have quite some duplicated code in err_out vs. succeed_out cases. How about we add a helper function, like void free_strs_data(struct btf_dedup *d) { free(d->strs_data); hashmap__free(d->strs_hash); d->strs_data = d->strs_hash = NULL; d->strs_len = d->strs_cap = 0; } ?