On Fri, Jul 30 2021, Elijah Newren wrote: > Hi Ævar, > > On Fri, Jul 30, 2021 at 7:33 AM Ævar Arnfjörð Bjarmason > <avarab@xxxxxxxxx> wrote: >> >> >> On Thu, Jul 29 2021, Jeff King wrote: >> >> > On Thu, Jul 29, 2021 at 12:37:52PM -0600, Elijah Newren wrote: >> > >> >> > Arguably, the existence of these function indirections is perhaps a sign >> >> > that the strmap API should provide a version of the clear functions that >> >> > takes "partial / not-partial" as a parameter. >> >> >> >> Are you suggesting a modification of str{map,intmap,set}_clear() to >> >> take an extra parameter, or removing the >> >> str{map,intmap,set}_partial_clear() functions and introducing new >> >> functions that take a partial/not-partial parameter? I think you're >> >> suggesting the latter, and that makes more sense to me...but I'm >> >> drawing blanks trying to come up with a reasonable function name. >> > >> > It does seem a shame to add the "partial" parameter to strmap_clear(), >> > just because most callers don't need it (so they end up with this >> > inscrutable "0" parameter). >> > >> > What if there was a flags field? Then it could be combined with the >> > free_values parameter. The result is kind of verbose in two ways: >> > >> > - now strset_clear(), etc, need a "flags" parameter, which they didn't >> > before (and is just "0" most of the time!) >> > >> > - now "strmap_clear(foo, 1)" becomes "strmap_clear(foo, STRMAP_FREE_VALUES)". >> > That's a lot longer, though arguably it's easier to understand since >> > the boolean is explained. >> > >> > Having gone through the exercise, I am not sure it is actually making >> > anything more readable (messy patch is below for reference). >> >> I've got some WIP patches for string-list.h and strmap.h to make the API >> nicer, and it's probably applicable to strset.h too. > > There is no strset.h; strset and strintmap along with strmap are part > of strmap.h. > >> I.e. I found when using strset.h that it was a weird API to use, because >> unlike string-list.h it didn't pay attention to your "dup" field when >> freeing, you had to do it explicitly. > > Do you mean strmap.h instead of strset.h? Yes, sorry. Brainfart. > In general, if you are asking strmap/strset/strintmap to dup your keys > and are explicitly freeing the strings, then you are misusing the API > and either freeing pointers that were never allocated or getting > double frees. It's wrong to explicitly deallocate them because: > * When using a pool, we just allocate from the pool. The memory > will be freed when the pool is freed. > * When not using a pool, we use FLEXPTR_ALLOC_STR in order to make > the string be part of the allocated strmap_entry. The string's memory > is deallocated when the strmap_entry is. > > The only reason to explicitly free keys in a strmap/strset/strintmap > is if you do NOT have strdup_strings set and allocated the strings > elsewhere and left your strmap as the only thing tracking the strings. Yes, sorry. I think I was trying to address that case, i.e. it started with fixing some memory leaks, but it's part of a branch of mine that's in a messy WIP state of not passing the tests. Please ignore the rest of the hunks to do with strmap.h. I might have some worthwhile fixes there, maybe not. It mainly started with carrying things over from similar changes in string-list.h. >> And then in e.g. merge-ort.c there's this "strdup dance" pattern where >> we flip the field back and forth. >> >> The below diff is exctracted from that WIP work, with the relevant two >> API headers and then two changed API users for show (the tree-wide >> changes are much larger). >> >> I think making the promise I make in the updated docs at "We guarantee >> that the `clearfunc`[...]" in string-list.h makes for particularly nice >> API behavior. >> >> builtin/remote.c | 37 ++++++++++++++++++++--------------- >> merge-ort.c | 32 +++++++----------------------- >> string-list.h | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-- >> strmap.h | 13 +++++++++++++ >> 4 files changed, 98 insertions(+), 43 deletions(-) >> >> diff --git a/builtin/remote.c b/builtin/remote.c >> index 7f88e6ce9de..ec1dbd49f71 100644 >> --- a/builtin/remote.c >> +++ b/builtin/remote.c >> @@ -340,10 +340,24 @@ static void read_branches(void) >> >> struct ref_states { >> struct remote *remote; >> - struct string_list new_refs, stale, tracked, heads, push; >> + >> + struct string_list new_refs; >> + struct string_list stale; >> + struct string_list tracked; >> + struct string_list heads; >> + struct string_list push; >> + >> int queried; >> }; >> >> +#define REF_STATES_INIT { \ >> + .new_refs = STRING_LIST_INIT_DUP, \ >> + .stale = STRING_LIST_INIT_DUP, \ >> + .tracked = STRING_LIST_INIT_DUP, \ >> + .heads = STRING_LIST_INIT_DUP, \ >> + .push = STRING_LIST_INIT_DUP, \ >> +} >> + >> static int get_ref_states(const struct ref *remote_refs, struct ref_states *states) >> { >> struct ref *fetch_map = NULL, **tail = &fetch_map; >> @@ -355,9 +369,6 @@ static int get_ref_states(const struct ref *remote_refs, struct ref_states *stat >> die(_("Could not get fetch map for refspec %s"), >> states->remote->fetch.raw[i]); >> >> - states->new_refs.strdup_strings = 1; >> - states->tracked.strdup_strings = 1; >> - states->stale.strdup_strings = 1; >> for (ref = fetch_map; ref; ref = ref->next) { >> if (!ref->peer_ref || !ref_exists(ref->peer_ref->name)) >> string_list_append(&states->new_refs, abbrev_branch(ref->name)); >> @@ -406,7 +417,6 @@ static int get_push_ref_states(const struct ref *remote_refs, >> >> match_push_refs(local_refs, &push_map, &remote->push, MATCH_REFS_NONE); >> >> - states->push.strdup_strings = 1; >> for (ref = push_map; ref; ref = ref->next) { >> struct string_list_item *item; >> struct push_info *info; >> @@ -449,7 +459,6 @@ static int get_push_ref_states_noquery(struct ref_states *states) >> if (remote->mirror) >> return 0; >> >> - states->push.strdup_strings = 1; >> if (!remote->push.nr) { >> item = string_list_append(&states->push, _("(matching)")); >> info = item->util = xcalloc(1, sizeof(struct push_info)); >> @@ -483,7 +492,6 @@ static int get_head_names(const struct ref *remote_refs, struct ref_states *stat >> refspec.force = 0; >> refspec.pattern = 1; >> refspec.src = refspec.dst = "refs/heads/*"; >> - states->heads.strdup_strings = 1; >> get_fetch_map(remote_refs, &refspec, &fetch_map_tail, 0); >> matches = guess_remote_head(find_ref_by_name(remote_refs, "HEAD"), >> fetch_map, 1); >> @@ -905,7 +913,7 @@ static void clear_push_info(void *util, const char *string) >> { >> struct push_info *info = util; >> free(info->dest); >> - free(info); >> + /* note: fixed memleak here */ >> } >> >> static void free_remote_ref_states(struct ref_states *states) >> @@ -1159,7 +1167,7 @@ static int get_one_entry(struct remote *remote, void *priv) >> string_list_append(list, remote->name)->util = >> strbuf_detach(&url_buf, NULL); >> } else >> - string_list_append(list, remote->name)->util = NULL; >> + string_list_append(list, remote->name); >> if (remote->pushurl_nr) { >> url = remote->pushurl; >> url_nr = remote->pushurl_nr; >> @@ -1179,10 +1187,9 @@ static int get_one_entry(struct remote *remote, void *priv) >> >> static int show_all(void) >> { >> - struct string_list list = STRING_LIST_INIT_NODUP; >> + struct string_list list = STRING_LIST_INIT_DUP; >> int result; >> >> - list.strdup_strings = 1; >> result = for_each_remote(get_one_entry, &list); >> >> if (!result) { >> @@ -1212,7 +1219,7 @@ static int show(int argc, const char **argv) >> OPT_BOOL('n', NULL, &no_query, N_("do not query remotes")), >> OPT_END() >> }; >> - struct ref_states states; >> + struct ref_states states = REF_STATES_INIT; >> struct string_list info_list = STRING_LIST_INIT_NODUP; >> struct show_info info; >> >> @@ -1334,8 +1341,7 @@ static int set_head(int argc, const char **argv) >> if (!opt_a && !opt_d && argc == 2) { >> head_name = xstrdup(argv[1]); >> } else if (opt_a && !opt_d && argc == 1) { >> - struct ref_states states; >> - memset(&states, 0, sizeof(states)); >> + struct ref_states states = REF_STATES_INIT; >> get_remote_ref_states(argv[0], &states, GET_HEAD_NAMES); >> if (!states.heads.nr) >> result |= error(_("Cannot determine remote HEAD")); >> @@ -1374,14 +1380,13 @@ static int set_head(int argc, const char **argv) >> static int prune_remote(const char *remote, int dry_run) >> { >> int result = 0; >> - struct ref_states states; >> + struct ref_states states = REF_STATES_INIT; >> struct string_list refs_to_prune = STRING_LIST_INIT_NODUP; >> struct string_list_item *item; >> const char *dangling_msg = dry_run >> ? _(" %s will become dangling!") >> : _(" %s has become dangling!"); >> >> - memset(&states, 0, sizeof(states)); >> get_remote_ref_states(remote, &states, GET_REF_STATES); >> >> if (!states.stale.nr) { > > Everything up to here looks like a very nice cleanup. > >> diff --git a/merge-ort.c b/merge-ort.c >> index ec0c5904211..53ed78e7a01 100644 >> --- a/merge-ort.c >> +++ b/merge-ort.c >> @@ -432,16 +432,6 @@ struct conflict_info { >> assert((ci) && !(mi)->clean); \ >> } while (0) >> >> -static void free_strmap_strings(struct strmap *map) >> -{ >> - struct hashmap_iter iter; >> - struct strmap_entry *entry; >> - >> - strmap_for_each_entry(map, &iter, entry) { >> - free((char*)entry->key); >> - } >> -} >> - >> static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, >> int reinitialize) >> { >> @@ -455,13 +445,11 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, >> reinitialize ? strset_partial_clear : strset_clear; >> >> /* >> - * We marked opti->paths with strdup_strings = 0, so that we >> - * wouldn't have to make another copy of the fullpath created by >> - * make_traverse_path from setup_path_info(). But, now that we've >> - * used it and have no other references to these strings, it is time >> - * to deallocate them. >> + * We used the the pattern of re-using already allocated >> + * strings strmap_clear_strings() in make_traverse_path from >> + * setup_path_info(). Deallocate them. >> */ >> - free_strmap_strings(&opti->paths); >> + strmap_clear_strings(&opti->paths, 0); >> strmap_func(&opti->paths, 1); >> >> /* > > It's not clear to me that strmap should handle the freeing of the keys > at all; maybe it should and strmap_clear_strings() makes sense to > introduce. However, this change is clearly wrong regardless, for two > reasons: (1) You are double clearing since strmap_func() is also > called afterwards, and (2) you are also ignoring the potential partial > bit since strmap_func might be strmap_partial_clear() rather than > strmap_clear(). *Nod*, see above. >> @@ -472,15 +460,10 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, >> strmap_func(&opti->conflicted, 0); >> >> /* >> - * opti->paths_to_free is similar to opti->paths; we created it with >> - * strdup_strings = 0 to avoid making _another_ copy of the fullpath >> - * but now that we've used it and have no other references to these >> - * strings, it is time to deallocate them. We do so by temporarily >> - * setting strdup_strings to 1. >> + * opti->paths_to_free is similar to opti->paths; it's memory >> + * we borrowed and need to free with string_list_clear_strings(). >> */ >> - opti->paths_to_free.strdup_strings = 1; >> - string_list_clear(&opti->paths_to_free, 0); >> - opti->paths_to_free.strdup_strings = 0; >> + string_list_clear_strings(&opti->paths_to_free, 0); > > This is very nice. I really like this new function and API. > >> if (opti->attr_index.cache_nr) /* true iff opt->renormalize */ >> discard_index(&opti->attr_index); >> @@ -2664,7 +2647,6 @@ static int collect_renames(struct merge_options *opt, >> * and have no other references to these strings, it is time to >> * deallocate them. >> */ >> - free_strmap_strings(&collisions); >> strmap_clear(&collisions, 1); >> return clean; >> } > > This hunk is wrong. *nod* >> diff --git a/string-list.h b/string-list.h >> index 0d6b4692396..9eeea996888 100644 >> --- a/string-list.h >> +++ b/string-list.h >> @@ -109,6 +109,9 @@ void string_list_init_dup(struct string_list *list); >> */ >> void string_list_init(struct string_list *list, int strdup_strings); >> >> +void string_list_cmp_init(struct string_list *list, int strdup_strings, >> + compare_strings_fn cmp); >> + > > Seems unrelated to what you were trying to highlight? Yes, sorry. I just extracted all the diff WIP diff I had for string-list.h, this bit was unrelated. It's unrelated cleanup of various things that hardcoded all the fields during their "init", just because they need strcasecmp or whatever instead of strcmp. >> /** Callback function type for for_each_string_list */ >> typedef int (*string_list_each_func_t)(struct string_list_item *, void *); >> >> @@ -129,14 +132,66 @@ void filter_string_list(struct string_list *list, int free_util, >> */ >> void string_list_clear(struct string_list *list, int free_util); >> >> +/** >> + * Free a string list initialized without `strdup_strings = 1`, but >> + * where we also want to free() the strings. You usually want to just >> + * use string_list_clear() after initializing with >> + * `STRING_LIST_INIT_DUP' instead. >> + * >> + * Useful to free e.g. a string list whose strings came from >> + * strbuf_detach() or other memory that we didn't initially allocate >> + * on the heap, but which we now manage. >> + * >> + * Under the hood this is identical in behavior to temporarily setting >> + * `strbuf_strings` to `1` for the duration of this function call, but >> + * without the verbosity of performing that dance yourself. >> + */ >> +void string_list_clear_strings(struct string_list *list, int free_util); >> + >> +/** >> + * Clear only the `util` pointer, but not the `string`, even if >> + * `strdup_strings = 1` is set. Useful for the idiom of doing e.g.: >> + * >> + * string_list_append(&list, str + offs)->util = str; >> + * >> + * Where we add a string at some offset, own the string (so >> + * effectively `strdup_strings = `), but can't free() the string >> + * itself at the changed offset, but need to free the original data in >> + * `util` instead. >> + */ >> +void string_list_clear_util(struct string_list *list); >> + >> /** >> * Callback type for `string_list_clear_func`. The string associated >> * with the util pointer is passed as the second argument >> */ >> typedef void (*string_list_clear_func_t)(void *p, const char *str); >> >> -/** Call a custom clear function on each util pointer */ >> -void string_list_clear_func(struct string_list *list, string_list_clear_func_t clearfunc); >> +/** >> + * Like string_list_clear() except that it first calls a custom clear >> + * function on each util pointer. >> + * >> + * We guarantee that the `clearfunc` will be called on all util >> + * pointers in a list before we proceed to free the first string or >> + * util pointer, i.e. should you need to it's OK to peek at other util >> + * items in the list itself, or to otherwise iterate it from within >> + * the `clearfunc`. >> + * >> + * You do not need to free() the passed-in util pointer itself, >> + * i.e. after calling all `clearfunc` this has the seme behavior as >> + * string_list_clear() called with with `free_util = 1`. >> + */ >> +void string_list_clear_func(struct string_list *list, >> + string_list_clear_func_t clearfunc); >> + >> +/** >> + * Like string_list_clear_func() but free the strings too, using the >> + * same dance as described for string_list_clear_strings() >> + * above. You'll usually want to initialize with >> + * `STRING_LIST_INIT_DUP` and use string_list_clear_strings() instead. >> + */ >> +void string_list_clear_strings_func(struct string_list *list, >> + string_list_clear_func_t clearfunc); >> >> /** >> * Apply `func` to each item. If `func` returns nonzero, the > > string_list_clear_strings() looks very nice. The others are probably > good too, though I'm curious about the need for double walking the > list to free it instead of doing it in a single walk; what callers > need to walk the list and check out other values? I think there were a couple of users that needed that, but maybe I'm wrong. I think even if there's not clearly defining the callback freeing semantics makes sense for caller sanity. We double-walk now in string_list_clear(), this is mostly documenting and extending current behavior to being able to clear any arbitrary combination of string/util with an optional cb, regardless of your strdup_strings state. >> diff --git a/strmap.h b/strmap.h >> index 1e152d832d6..337f6278e86 100644 >> --- a/strmap.h >> +++ b/strmap.h >> @@ -51,12 +51,25 @@ void strmap_init_with_options(struct strmap *map, >> */ >> void strmap_clear(struct strmap *map, int free_values); >> >> +/** >> + * To strmap_clear() what string_list_clear_strings() is to >> + * string_list_clear(). I.e. free your keys too, which we used as-is >> + * without `strdup_strings = 1`. >> + */ >> +void strmap_clear_strings(struct strmap *map, int free_values); > > strmap.h doesn't depend on string-list.h, so the comment should be > self-standing. The analogy also doesn't seem to hold since we do NOT > need to free the keys when strdup_strings is 1; Peff suggested > FLEXPTR_ALLOC_STR specifically to avoid that extra allocation in that > case. *nod*, see above (i.e. maybe all this strmap.h stuff is wrong). FWIW that comment was meant as a "if you're familiar with X in API A, this Y in B works similarly". >> + >> /* >> * Similar to strmap_clear() but leaves map->map->table allocated and >> * pre-sized so that subsequent uses won't need as many rehashings. >> */ >> void strmap_partial_clear(struct strmap *map, int free_values); >> >> +/** >> + * To strmap_partial_clear() what string_list_clear_strings() is to >> + * string_list_clear(). See strmap_clear_strings() above. >> + */ >> +void strmap_partial_clear_strings(struct strmap *map, int free_values); >> + > > Same comment as above for strmap_clear_strings() applies here. *nod*