On Mon, Jun 10, 2024 at 3:27 PM Kyle Lippincott <spectral@xxxxxxxxxx> wrote: > I just realized I didn't give a reproduction command, sorry about that; here's the reproduction command provided by our user: git config --global core.abbrev 12 git clone https://github.com/git/git.git I realized this because Josh Steadmon informed me that config loading/parsing is lazy when I asked why `git bisect` still worked with this setting in my git config. So I think this might be a problem only for things that go through `git_default_config` (if we set up the repo object, we probably work just fine). > c8aed5e8dadf (repository: stop setting SHA1 as the default object > hash, 2024-05-07) stopped initializing the_hash_algo, but config.c > references it when it observes a user setting core.abbrev, in two > ways: > - if core.abbrev is detected as a 'no' boolean value, then > default_abbrev is set to the_hash_algo->hexsz > - if core.abbrev is set to an integer, it verifies that it's within > range for the hash algorithm (specifically: it errors out if the value > is < minimum_abbrev or > the_hash_algo->hexsz). > > Stack: > ==2421488==ERROR: AddressSanitizer: SEGV on unknown address > 0x000000000018 (pc 0x56344202585f bp 0x7fff9546fe10 sp 0x7fff9546fcb0 > T0) > ==2421488==The signal is caused by a READ memory access. > ==2421488==Hint: address points to the zero page. > #0 0x56344202585f in git_default_core_config git/config.c:1466 > #1 0x56344202585f in git_default_config git/config.c:1815 > #2 0x56344202064e in configset_iter git/config.c:2185 > #3 0x563441d531cb in cmd_clone builtin/clone.c:981 > #4 0x563441cebac2 in run_builtin git/git.c:474 > #5 0x563441cebac2 in handle_builtin git/git.c:729 > #6 0x563441ceed0a in run_argv git/git.c:793 > #7 0x563441cf0aea in cmd_main git/git.c:928 > #8 0x563441ce9323 in main git/common-main.c:62 > #9 0x7fa3228456c9 in __libc_start_call_main > ../sysdeps/nptl/libc_start_call_main.h:58 > #10 0x7fa322845784 in __libc_start_main_impl ../csu/libc-start.c:360 > #11 0x563441ceb530 in _start (git/git+0x1e0530) (BuildId: > c0e4b09d5b212a201769f1eb8e7592cddbe3af1d) > > AddressSanitizer can not provide additional info. > > --- > > My first thought for a fix was to just cap it at 40, with the > assumption that the code would handle it correctly in the unlikely > event that the hash size ever decreased, but I don't think that does > the right thing if `core.abbrev=no`. That's documented as a way of > obtaining the full hashes (with no abbreviation), and if we're using > sha256, capping that at 40 hex (160bits) is incorrect. > > My second thought was that we could store the requested value and > validate it on every usage. This complicates usage locations, and can > lead to poor behavior (crashes in the middle of operation when we > finally get around to checking the value). > > My third thought was that we could store the requested value and > validate when we have a repository that initializes the hash for us as > part of that initialization. If we attempt to abbreviate some hashes > without that setup, we act as if core.abbrev isn't set at all (so they > get the default behavior). That seems like the best option overall. > Exploring that further ... > > I've looked at a semi-random collection of places where > `default_abbrev` or `DEFAULT_ABBREV` are used, and they all seem to > eventually go through `repo_find_unique_abbrev_r`, and they pass in > the len. This also always has a repository available (otherwise it > wouldn't be able to disambiguate). Furthermore, it has `if (len < 0)`. > We could thus carry the "unvalidated" request in default_abbrev by > having special magic values (auto=-1 like today, no=-2 (replaced with > hexsz when we know it), other requests are <= -4, for a requested > length of 4 or higher), or we could have another variable > (requested_default_abbrev) that gets copied to default_abbrev when we > have a repo. > > The one potential issue I can think of with this is that setting > `core.abbrev = no`, having that resolve to 64 (sha256) when we set up > the repo, and then if we ever read from a repo that uses 40 hexsz > (such as sha1), then we have to ensure that we tolerate a requested > length greater than the current hash algorithm's maximum length. This > likely wasn't a problem when sha1 was the default, because we're > unlikely to go to a hash algorithm with <160 bits in the future. But > if sha256 becomes the default, then this can be problematic.