[PATCH] get_oid(): enforce minimum length for "-g<hex>" names

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 13, 2024 at 07:53:58AM -0400, Jeff King wrote:

>   - the hash is abbreviated in the usual way, making it as short as
>     possible while remaining unambiguous. But unless the user goes out
>     of their way to set core.abbrev to something smaller, the minimum is
>     always 7. So perhaps get_describe_name() should be a bit more picky
>     about about that?
> 
>     That doesn't fix the problem, but it makes it a lot less likely to
>     trigger in the real world. And anybody who really does somehow end
>     up with a describe name with 4 characters can always pick the hash
>     out of the string themselves (or just set core.abbrev in their local
>     repo to be more permissive).
> 
> I think the second one is something like this:
> 
> diff --git a/object-name.c b/object-name.c
> index 527b853ac4..a90338aa62 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -1276,6 +1276,10 @@ static int get_describe_name(struct repository *r,
>  			if (ch == 'g' && cp[-1] == '-') {
>  				cp++;
>  				len -= cp - name;
> +				if (len < (default_abbrev < 0 ?
> +					   FALLBACK_DEFAULT_ABBREV :
> +					   default_abbrev))
> +					return -1;
>  				return get_short_oid(r,
>  						     cp, len, oid, flags);
>  			}

After thinking on this for a bit, it seems like the correct direction.
So here it is written in a slightly more readable way, and with a commit
message and tests.

-- >8 --
Subject: [PATCH] get_oid(): enforce minimum length for "-g<hex>" names

Since 7dd45e15c2 (sha1_name.c: understand "describe" output as a valid
object name, 2006-09-20), we'll resolve a name like "foo-g1234abc" into
the object name "1234abc". However, this has a small chance of creating
a surprising ambiguity.

For example, in a real world case we saw a name like "foo-gcc14" where
"gcc14" refers to the compiler, but which caused us to look for an
object with prefix "cc14". And if the repo is large enough to have such
an object, and small enough that there is only one such commit (since we
feed the disambiguation lookup code with the "commit" hint), then we'll
return that object.

Note that we would still resolve "foo-gcc14" as a tag name in preference
to the describe name. But in this case it did not exist, and resolving
anything was a surprise.

We can't solve the ambiguity completely, but we can reduce the chances
of it happening significantly by enforcing a minimum length we'll accept
for the hex component. Since the name may have been generated by another
repository, we can't know for sure what minimum they would have used,
but a good guess is the value of core.abbrev (or if it's set to "auto",
which is the default these days, the hard-coded minimum of "7").

There are five new tests here:

  1. We check that describe names can be resolved at all. As far as I
     can tell we had no existing test that covered this. It passes
     before and after this patch.

  2. Another option for solving this would be to insist that "foo" in
     "foo-gcc14" matches an existing ref (which would have been the
     source of the description). I don't think this is a good idea,
     though, as part of the point of having the "-g<hex>" suffix is that
     you might not have the same tags as whoever generated it. So this
     test codifies the existing behavior that we do not care about the
     parts before the "-g" at all.

  3. Looking at the loop in get_describe_name(), we read from the back
     end and check for "-g" when we see a non-hex digit. But if it's not
     "-g", we keep looking! So for a name like "foo-g1234abc-bar", we'll
     still pass "1234abc-bar" to get_short_oid()! This is OK in
     practice, since it will barf when seeing the non-hex digits. But
     let's confirm that it does so. This is particularly important
     with our length checks, since "foo-gcc14-bar" would yield a length
     of 8, which is plausibly long (so we are likewise depending on
     get_short_oid() to reject it).

  4. We check that names shorter than core.abbrev are rejected (i.e.,
     the fix in this patch).

  5. Likewise, when core.abbrev is "auto", we enforce the 7-character
     minimum.

Reported-by: Matt Thompson <fortran@xxxxxxxxx>
Signed-off-by: Jeff King <peff@xxxxxxxx>
---
 object-name.c       | 12 ++++++++++++
 t/t6120-describe.sh | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/object-name.c b/object-name.c
index 527b853ac4..6507a30ace 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1274,8 +1274,20 @@ static int get_describe_name(struct repository *r,
 			 * for it to be describe output.
 			 */
 			if (ch == 'g' && cp[-1] == '-') {
+				/*
+				 * To reduce the chance of false positives,
+				 * assume that any "-g<hex>" must have some
+				 * minimum number of <hex> that matches what
+				 * we'd produce when abbreviating.
+				 */
+				int min_len = default_abbrev;
+				if (min_len < 0)
+					min_len = FALLBACK_DEFAULT_ABBREV;
+
 				cp++;
 				len -= cp - name;
+				if (len < min_len)
+					return -1;
 				return get_short_oid(r,
 						     cp, len, oid, flags);
 			}
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 79e0f19deb..790afe40ac 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -707,4 +707,37 @@ test_expect_success 'describe --broken --dirty with a file with changed stat' '
 	)
 '
 
+test_expect_success 'long describe name can be resolved' '
+	name=$(git describe --long A) &&
+	git rev-parse "A^{commit}" >expect &&
+	git rev-parse "$name" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'resolving describe name does not depend on tag' '
+	hash=$(git rev-parse A^{commit}) &&
+	abbrev=$(echo $hash | cut -c1-30) &&
+	echo "$hash" >expect &&
+	git rev-parse "does-not-exist-g$abbrev" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'resolving describe name only valid at end' '
+	hash=$(git rev-parse A^{commit}) &&
+	abbrev=$(echo $hash | cut -c1-30) &&
+	test_must_fail git rev-parse "foo-g$abbrev-bar"
+'
+
+test_expect_success 'resolving describe name requires minimum abbrev (auto)' '
+	hash=$(git rev-parse A^{commit}) &&
+	abbrev=$(echo $hash | cut -c1-6) &&
+	test_must_fail git -c core.abbrev=auto rev-parse "foo-g$abbrev"
+'
+
+test_expect_success 'resolving describe name requires minimum abbrev (config)' '
+	hash=$(git rev-parse A^{commit}) &&
+	abbrev=$(echo $hash | cut -c1-20) &&
+	test_must_fail git -c core.abbrev=25 rev-parse "foo-g$abbrev"
+'
+
 test_done
-- 
2.46.0.452.ga6607598b6





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux