On Sat, Mar 08, 2025 at 08:05:09AM -0800, Elijah Newren wrote: > On Fri, Mar 7, 2025 at 6:20 AM Patrick Steinhardt <ps@xxxxxx> wrote: > > > > The `null_oid()` function returns the object ID that only consists of > > zeroes. Naturally, this ID also depends on the hash algorithm used, as > > the number of zeroes is different between SHA1 and SHA256. Consequently, > > the function returns the hash-algorithm-specific null object ID. > > > > This is currently done by depending on `the_hash_algo`, which implicitly > > makes us depend on `the_repository`. Refactor the function to instead > > pass in the hash algorithm for which we want to retrieve the null object > > ID. Adapt callsites accordingly by passing in `the_repository`, thus > > bubbling up the dependency on that global variable by one layer. > > > > There are a couple of trivial exceptions for subsystems that already got > > rid of `the_repository`. These subsystems instead use the repository > > that is available via the calling context: > > > > - "builtin/grep.c" > > - "grep.c" > > - "refs/debug.c" > > > > There are also two non-trivial exceptions: > > > > - "diff-no-index.c": Here we know that we may not have a repository > > initialized at all, so we cannot rely on `the_repository`. Instead, > > we adapt `diff_no_index()` to get a `struct git_hash_algo` as > > parameter. The only caller is located in "builtin/diff.c", where we > > know to call `repo_set_hash_algo()` in case we're running outside of > > a Git repository. Consequently, it is fine to continue passing > > `the_repository->hash_algo` even in this case. > > > > - "builtin/ls-files.c": There is an in-flight patch series that drops > > `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic > > conflict because we use `null_oid()` in `show_submodule()`. The > > value is passed to `repo_submodule_init()`, which may use the object > > ID to resolve a tree-ish in the superproject from which we want to > > read the submodule config. As such, the object ID should refer to an > > object in the superproject, and consequently we need to use its hash > > algorithm. > > > > This means that we could in theory just not bother about this edge case > > at all and just use `the_repository` in "diff-no-index.c". But doing so > > would feel misdesigned. > > Very minor, but this was a bit jarring to me -- shouldn't this > paragraph be indented over since it is a continuation of the second > bullet above? Yes, you're right! > > @@ -4304,7 +4304,7 @@ static int process_entry(struct merge_options *opt, > > /* Deleted on both sides */ > > ci->merged.is_null = 1; > > ci->merged.result.mode = 0; > > - oidcpy(&ci->merged.result.oid, null_oid()); > > + oidcpy(&ci->merged.result.oid, null_oid(the_hash_algo)); > > assert(!ci->df_conflict); > > ci->merged.clean = !ci->path_conflict; > > } > > What you have is an improvement since it's at least making things > explicit, but these should really be opt->repo->hash_algo. Oh, yeah, I suspect that we'll have a bunch of places where we can already plug in a repository as available via the context. I typically refrain from doing though unless really necessary (e.g. when the file at hand doesn't declare `USE_THE_REPOSITORY_VARIABLE` anymore). This is so that neither I nor reviewers have to carefully vet every callsite, but to make this a mechanical change that is obviously correct because it is identical to the old state. So I'm basically leaving it to the next patch series that gets rid of `USE_THE_REPOSITORY_VARIABLE` to think more carefully about those callsites. Thanks! Patrick