Re: [PATCH 4/4] get_oid(): when an object was not found, try harder

Jeff King <peff@xxxxxxxx> · Wed, 13 Mar 2019 22:22:46 -0400

On Thu, Mar 14, 2019 at 10:29:49AM +0900, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@xxxxxxxxx>
> writes:
> 
> > @@ -442,6 +442,18 @@ static enum get_oid_result get_short_oid(const char *name, int len,
> >  	find_short_packed_object(&ds);
> >  	status = finish_object_disambiguation(&ds, oid);
> >  
> > +	/*
> > +	 * If we didn't find it, do the usual reprepare() slow-path,
> > +	 * since the object may have recently been added to the repository
> > +	 * or migrated from loose to packed.
> > +	 */
> > +	if (status == MISSING_OBJECT) {
> > +		reprepare_packed_git(the_repository);
> > +		find_short_object_filename(&ds);
> > +		find_short_packed_object(&ds);
> > +		status = finish_object_disambiguation(&ds, oid);
> > +	}
> > +
> 
> This looks obviously correct, but two things that made me wonder
> briefly were:
> 
>  1. is reprepare_packed_git() a bit too heavy-weight, if the only
>     thing we are addressing is the loose-object cache going stale?

It's not the only thing we are addressing. :)

Try this:

-- >8 --
mkfifo in
(git cat-file --batch-check <in) &
exec 9>in

git commit --allow-empty -m one
git commit --allow-empty -m two

git rev-parse --short HEAD^ >&9
git repack -adk
git rev-parse --short HEAD >&9
-- >8 --

The second object will (usually) be reported as "missing", even though
by calling reprepare_packed_git(), we would then find it in the
packfile.

I say "usually" because if the two commits have the same first-byte to
their sha1, then we'd load both into the loose cache during the first
request, and use that during the second request.

After this patch, it works consistently (we flush both the loose cache
and the cached set of packs, and find the packed version).

>  2. is there a way to cleanly avoid the three-line duplicate?

Yeah, as you noted, I think the boilerplate is worse than the
duplication. The most readable alternative to me is a separate function,
like:

diff --git a/sha1-name.c b/sha1-name.c
index cfe5c874b6..441666993b 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -411,6 +411,14 @@ static int sort_ambiguous(const void *a, const void *b)
 	return a_type_sort > b_type_sort ? 1 : -1;
 }
 
+static enum get_oid_result(struct disambiguate_state *ds,
+			   struct object_id *oid)
+{
+	find_short_object_filename(&ds);
+	find_short_packed_object(&ds);
+	return finish_object_disambiguation(&ds, oid);
+}
+
 static enum get_oid_result get_short_oid(const char *name, int len,
 					 struct object_id *oid,
 					 unsigned flags)
@@ -438,20 +446,16 @@ static enum get_oid_result get_short_oid(const char *name, int len,
 	else
 		ds.fn = default_disambiguate_hint;
 
-	find_short_object_filename(&ds);
-	find_short_packed_object(&ds);
-	status = finish_object_disambiguation(&ds, oid);
+	status = do_get_short_oid(&ds, oid);
 
 	/*
 	 * If we didn't find it, do the usual reprepare() slow-path,
 	 * since the object may have recently been added to the repository
 	 * or migrated from loose to packed.
 	 */
 	if (status == MISSING_OBJECT) {
 		reprepare_packed_git(the_repository);
-		find_short_object_filename(&ds);
-		find_short_packed_object(&ds);
-		status = finish_object_disambiguation(&ds, oid);
+		status = do_get_short_oid(&ds, oid);
 	}
 
 	if (!quietly && (status == SHORT_NAME_AMBIGUOUS)) {

But what I find particularly ugly is not just that it's more lines, but
that the assumptions and outputs of do_get_short_oid() aren't
particularly clear.

-Peff