Re: git clone of empty repositories doesn't preserve hash

Jeff King <peff@xxxxxxxx> · Thu, 27 Apr 2023 00:46:33 -0400

On Wed, Apr 26, 2023 at 08:08:28AM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> > It sounds from your description that your test is running in a mode
> > where the client defaults to sha256 (though I'm not sure how, since we
> > explicitly document that GIT_DEFAULT_HASH should not affect clone), and
> > then you clone an empty sha256 repository via v0, expecting the result
> > to be sha256.
> 
> Thanks for coming up with an excellent guess that helped come up
> with a reproduction.
> 
> With Brian's patch a bit tweaked (attached below), the test does
> fail with the current 'master' and passes before the merge in
> question.  And the trace clearly shows that without being told
> anything about the object format via the capability, the client
> chooses to honor GIT_DEFAULT_HASH to initialize the new repository
> with sha256.

Ah, OK. The fact that we do respect GIT_DEFAULT_HASH there solves my
remaining confusion. And in retrospect it makes sense, because clone is
going to use the equivalent of "git init" under the hood. So we'll start
with _something_, and then adjust it based on what we read from the
other side. And that "something" respects GIT_DEFAULT_HASH.

Sort of a side note:

  This behavior is somewhat due to the fact that clone is implemented as
  "init, then fetch into the new repository". It could also conceptually
  be "talk to the remote, then init, then actually fetch". It never
  mattered before, but with options like object-format, the "init" step
  may be affected by things the remote said.

  So in our world, clone has to "fix up" parts of the repository that
  were initialized by tweaking the object-format config of the live
  repo. In a world where the very first thing it did was talk to the
  remote, then it would pass the option along to init in the first
  place.

  It's probably not worth trying to re-architect clone, though (not just
  in terms of work, but who knows what other subtle assumptions are
  baked into the current ordering). And it's not like it _solves_ this
  issue. It just might have made finding it a little less confusing.

> [Footnote]
> 
> * ... but I think it was misguided.  What it says there is this:
> 
>     And we also don't want to initialize the repository as SHA-1
>     initially, since that means if we're cloning an empty
>     repository, we'll have failed to honor the GIT_DEFAULT_HASH
>     variable and will end up with a SHA-1 repository, not a SHA-256
>     repository.
> 
> If this were "When we're cloning an empty repository, we'd have
> failed to honor the object format the other side has chosen and will
> end up with a SHA-1 repository, not a SHA-256 repository.", then it
> is very much in line with the reality before the patch under
> discussion and also in line with the official stance that "clone"
> should not honor GIT_DEFAULT_HASH.
> 
> Where the original description breaks down is when the other side is
> SHA-1 and this side has GIT_DEFAULT_HASH set to SHA-256.  If we
> honored the variable, we'd create a SHA-256 repository that will
> talk to SHA-1 repository before the rest of the system is ready.

Exactly. We are (and always have been) wrong 50% of the time in an empty
repo for v0. Your recent patch fixed v2 in an empty repo, but in doing
so flipped which halves of v0 were wrong.

-Peff