On 11/11/24 3:54 PM, Jeff King wrote: > On Mon, Nov 11, 2024 at 11:21:44AM -0500, Simon Marchi wrote: > >> When running a dir-diff command that produces no diff, variables >> `wt_modified` and `tmp_modified` are used while uninitialized, causing: >> >> $ /home/smarchi/src/git/git-difftool --dir-diff master >> free(): invalid pointer >> [1] 334004 IOT instruction (core dumped) /home/smarchi/src/git/git-difftool --dir-diff master >> $ valgrind --track-origins=yes /home/smarchi/src/git/git-difftool --dir-diff master >> ... >> Invalid free() / delete / delete[] / realloc() >> at 0x48478EF: free (vg_replace_malloc.c:989) >> by 0x422CAC: hashmap_clear_ (hashmap.c:208) >> by 0x283830: run_dir_diff (difftool.c:667) >> by 0x284103: cmd_difftool (difftool.c:801) >> by 0x238E0F: run_builtin (git.c:484) >> by 0x2392B9: handle_builtin (git.c:750) >> by 0x2399BC: cmd_main (git.c:921) >> by 0x356FEF: main (common-main.c:64) >> Address 0x1ffefff180 is on thread 1's stack >> in frame #2, created by run_dir_diff (difftool.c:358) >> ... >> >> If taking any `goto finish` path before these variables are initialized, >> `hashmap_clear_and_free()` operates on uninitialized data, sometimes >> causing a crash. >> >> Fix it by zero-initializing these variables, making >> `hashmap_clear_and_free()` a no-op in that case. > > The fix makes sense. I wondered if this had been broken for a long time, > and if so, how we managed not to notice it. But it looks like it is a > recent problem, via 7f795a1715 (builtin/difftool: plug several trivial > memory leaks, 2024-09-26). Are there tests for this specific scenario (no diff between the two versions)? > >> diff --git a/builtin/difftool.c b/builtin/difftool.c >> index ca1b0890659b..b902f5d2ae17 100644 >> --- a/builtin/difftool.c >> +++ b/builtin/difftool.c >> @@ -376,7 +376,8 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix, >> struct checkout lstate, rstate; >> int err = 0; >> struct child_process cmd = CHILD_PROCESS_INIT; >> - struct hashmap wt_modified, tmp_modified; >> + struct hashmap wt_modified = {0}; >> + struct hashmap tmp_modified = {0}; >> int indices_loaded = 0; > > That commit likewise frees some other local variables, but they are all > properly initialized. So touching these two are sufficient. Indeed, I checked the other variables, they look fine. > I'm not sure if zero-initialization is being a little too familiar with > the hashmap internals, though. Up to you. In other C projects I worked on, it was typical that zero-ing an object would get it in a valid initial empty state, properly handled by the destruction functions. This way, a big struct containing other objects could be initialized simply by zero-ing it, without having to initialize each component explicitly. > The other variables use HASHMAP_INIT(). > Should we do the same here, like this: > > diff --git a/builtin/difftool.c b/builtin/difftool.c > index 1a68ab6699..86995390c7 100644 > --- a/builtin/difftool.c > +++ b/builtin/difftool.c > @@ -374,7 +374,8 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix, > struct checkout lstate, rstate; > int err = 0; > struct child_process cmd = CHILD_PROCESS_INIT; > - struct hashmap wt_modified, tmp_modified; > + struct hashmap wt_modified = HASHMAP_INIT(path_entry_cmp, NULL); > + struct hashmap tmp_modified = HASHMAP_INIT(path_entry_cmp, NULL); > int indices_loaded = 0; > > workdir = get_git_work_tree(); > @@ -594,14 +595,7 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix, > * should be copied back to the working tree. > * Do not copy back files when symlinks are used and the > * external tool did not replace the original link with a file. > - * > - * These hashes are loaded lazily since they aren't needed > - * in the common case of --symlinks and the difftool updating > - * files through the symlink. > */ > - hashmap_init(&wt_modified, path_entry_cmp, NULL, wtindex.cache_nr); > - hashmap_init(&tmp_modified, path_entry_cmp, NULL, wtindex.cache_nr); > - > for (i = 0; i < wtindex.cache_nr; i++) { > struct hashmap_entry dummy; > const char *name = wtindex.cache[i]->name; > > That loses the initial table growth that the original had, but I think > letting it grow in the usual way is fine here. I thought about it, but was indeed afraid to be told that this removes an optimization. If you think it's fine, I'm happy with it too. Please let me know if you want a v2 or if you are just going to merge an updated version of this patch. Thanks, Simon