On Mon, Jun 10, 2024 at 03:27:11PM -0700, Kyle Lippincott wrote: > c8aed5e8dadf (repository: stop setting SHA1 as the default object > hash, 2024-05-07) stopped initializing the_hash_algo, but config.c > references it when it observes a user setting core.abbrev, in two > ways: > - if core.abbrev is detected as a 'no' boolean value, then > default_abbrev is set to the_hash_algo->hexsz > - if core.abbrev is set to an integer, it verifies that it's within > range for the hash algorithm (specifically: it errors out if the value > is < minimum_abbrev or > the_hash_algo->hexsz). > > Stack: > ==2421488==ERROR: AddressSanitizer: SEGV on unknown address > 0x000000000018 (pc 0x56344202585f bp 0x7fff9546fe10 sp 0x7fff9546fcb0 > T0) > ==2421488==The signal is caused by a READ memory access. > ==2421488==Hint: address points to the zero page. > #0 0x56344202585f in git_default_core_config git/config.c:1466 > #1 0x56344202585f in git_default_config git/config.c:1815 > #2 0x56344202064e in configset_iter git/config.c:2185 > #3 0x563441d531cb in cmd_clone builtin/clone.c:981 > #4 0x563441cebac2 in run_builtin git/git.c:474 > #5 0x563441cebac2 in handle_builtin git/git.c:729 > #6 0x563441ceed0a in run_argv git/git.c:793 > #7 0x563441cf0aea in cmd_main git/git.c:928 > #8 0x563441ce9323 in main git/common-main.c:62 > #9 0x7fa3228456c9 in __libc_start_call_main > ../sysdeps/nptl/libc_start_call_main.h:58 > #10 0x7fa322845784 in __libc_start_main_impl ../csu/libc-start.c:360 > #11 0x563441ceb530 in _start (git/git+0x1e0530) (BuildId: > c0e4b09d5b212a201769f1eb8e7592cddbe3af1d) > > AddressSanitizer can not provide additional info. Good find. > My first thought for a fix was to just cap it at 40, with the > assumption that the code would handle it correctly in the unlikely > event that the hash size ever decreased, but I don't think that does > the right thing if `core.abbrev=no`. That's documented as a way of > obtaining the full hashes (with no abbreviation), and if we're using > sha256, capping that at 40 hex (160bits) is incorrect. Exactly, this doesn't feel like the right solution. > My second thought was that we could store the requested value and > validate it on every usage. This complicates usage locations, and can > lead to poor behavior (crashes in the middle of operation when we > finally get around to checking the value). This feels more complex than needed indeed. > My third thought was that we could store the requested value and > validate when we have a repository that initializes the hash for us as > part of that initialization. If we attempt to abbreviate some hashes > without that setup, we act as if core.abbrev isn't set at all (so they > get the default behavior). That seems like the best option overall. > Exploring that further ... > > I've looked at a semi-random collection of places where > `default_abbrev` or `DEFAULT_ABBREV` are used, and they all seem to > eventually go through `repo_find_unique_abbrev_r`, and they pass in > the len. This also always has a repository available (otherwise it > wouldn't be able to disambiguate). Furthermore, it has `if (len < 0)`. > We could thus carry the "unvalidated" request in default_abbrev by > having special magic values (auto=-1 like today, no=-2 (replaced with > hexsz when we know it), other requests are <= -4, for a requested > length of 4 or higher), or we could have another variable > (requested_default_abbrev) that gets copied to default_abbrev when we > have a repo. This also feels a bit too complex in my opinion. > The one potential issue I can think of with this is that setting > `core.abbrev = no`, having that resolve to 64 (sha256) when we set up > the repo, and then if we ever read from a repo that uses 40 hexsz > (such as sha1), then we have to ensure that we tolerate a requested > length greater than the current hash algorithm's maximum length. This > likely wasn't a problem when sha1 was the default, because we're > unlikely to go to a hash algorithm with <160 bits in the future. But > if sha256 becomes the default, then this can be problematic. I think this is the best solution. I don't quite see why we need to abort in the first place when the caller asks for something longer than GIT_MAX_HEXSZ. If they do, it's rather clear that the intent is to show the full object hash, so we can do that. In other words, the following should be sufficient, shouldn't it? diff --git a/config.c b/config.c index abce05b774..0416b0f2b6 100644 --- a/config.c +++ b/config.c @@ -1460,10 +1460,10 @@ static int git_default_core_config(const char *var, const char *value, if (!strcasecmp(value, "auto")) default_abbrev = -1; else if (!git_parse_maybe_bool_text(value)) - default_abbrev = the_hash_algo->hexsz; + default_abbrev = GIT_MAX_HEXSZ; else { int abbrev = git_config_int(var, value, ctx->kvi); - if (abbrev < minimum_abbrev || abbrev > the_hash_algo->hexsz) + if (abbrev < minimum_abbrev) return error(_("abbrev length out of range: %d"), abbrev); default_abbrev = abbrev; } diff --git a/object-name.c b/object-name.c index 523af6f64f..1be2ad1a16 100644 --- a/object-name.c +++ b/object-name.c @@ -837,7 +837,7 @@ int repo_find_unique_abbrev_r(struct repository *r, char *hex, } oid_to_hex_r(hex, oid); - if (len == hexsz || !len) + if (len >= hexsz || !len) return hexsz; mad.repo = r; This is also in line with `parse_opt_abbrev_cb()`, which does the following: if (v && v < MINIMUM_ABBREV) v = MINIMUM_ABBREV; else if (startup_info->have_repository && v > the_hash_algo->hexsz) v = the_hash_algo->hexsz; In other words, it trims to `the_hash_algo->hexsz` or, if we don't have a repository, it doesn't bother to trim the value at all. I'll send a patch that does this. Patrick
Attachment:
signature.asc
Description: PGP signature