SEGV (detected by Address Sanitizer) when using `core.abbrev` option

Kyle Lippincott <spectral@xxxxxxxxxx> · Mon, 10 Jun 2024 15:27:11 -0700

c8aed5e8dadf (repository: stop setting SHA1 as the default object
hash, 2024-05-07) stopped initializing the_hash_algo, but config.c
references it when it observes a user setting core.abbrev, in two
ways:
- if core.abbrev is detected as a 'no' boolean value, then
default_abbrev is set to the_hash_algo->hexsz
- if core.abbrev is set to an integer, it verifies that it's within
range for the hash algorithm (specifically: it errors out if the value
is < minimum_abbrev or > the_hash_algo->hexsz).

Stack:
==2421488==ERROR: AddressSanitizer: SEGV on unknown address
0x000000000018 (pc 0x56344202585f bp 0x7fff9546fe10 sp 0x7fff9546fcb0
T0)
==2421488==The signal is caused by a READ memory access.
==2421488==Hint: address points to the zero page.
    #0 0x56344202585f in git_default_core_config git/config.c:1466
    #1 0x56344202585f in git_default_config git/config.c:1815
    #2 0x56344202064e in configset_iter git/config.c:2185
    #3 0x563441d531cb in cmd_clone builtin/clone.c:981
    #4 0x563441cebac2 in run_builtin git/git.c:474
    #5 0x563441cebac2 in handle_builtin git/git.c:729
    #6 0x563441ceed0a in run_argv git/git.c:793
    #7 0x563441cf0aea in cmd_main git/git.c:928
    #8 0x563441ce9323 in main git/common-main.c:62
    #9 0x7fa3228456c9 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
    #10 0x7fa322845784 in __libc_start_main_impl ../csu/libc-start.c:360
    #11 0x563441ceb530 in _start (git/git+0x1e0530) (BuildId:
c0e4b09d5b212a201769f1eb8e7592cddbe3af1d)

AddressSanitizer can not provide additional info.

---

My first thought for a fix was to just cap it at 40, with the
assumption that the code would handle it correctly in the unlikely
event that the hash size ever decreased, but I don't think that does
the right thing if `core.abbrev=no`. That's documented as a way of
obtaining the full hashes (with no abbreviation), and if we're using
sha256, capping that at 40 hex (160bits) is incorrect.

My second thought was that we could store the requested value and
validate it on every usage. This complicates usage locations, and can
lead to poor behavior (crashes in the middle of operation when we
finally get around to checking the value).

My third thought was that we could store the requested value and
validate when we have a repository that initializes the hash for us as
part of that initialization. If we attempt to abbreviate some hashes
without that setup, we act as if core.abbrev isn't set at all (so they
get the default behavior). That seems like the best option overall.
Exploring that further ...

I've looked at a semi-random collection of places where
`default_abbrev` or `DEFAULT_ABBREV` are used, and they all seem to
eventually go through `repo_find_unique_abbrev_r`, and they pass in
the len. This also always has a repository available (otherwise it
wouldn't be able to disambiguate). Furthermore, it has `if (len < 0)`.
We could thus carry the "unvalidated" request in default_abbrev by
having special magic values (auto=-1 like today, no=-2 (replaced with
hexsz when we know it), other requests are <= -4, for a requested
length of 4 or higher), or we could have another variable
(requested_default_abbrev) that gets copied to default_abbrev when we
have a repo.

The one potential issue I can think of with this is that setting
`core.abbrev = no`, having that resolve to 64 (sha256) when we set up
the repo, and then if we ever read from a repo that uses 40 hexsz
(such as sha1), then we have to ensure that we tolerate a requested
length greater than the current hash algorithm's maximum length. This
likely wasn't a problem when sha1 was the default, because we're
unlikely to go to a hash algorithm with <160 bits in the future. But
if sha256 becomes the default, then this can be problematic.