Re: git clone of empty repositories doesn't preserve hash

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jeff King <peff@xxxxxxxx> writes:

> It sounds from your description that your test is running in a mode
> where the client defaults to sha256 (though I'm not sure how, since we
> explicitly document that GIT_DEFAULT_HASH should not affect clone), and
> then you clone an empty sha256 repository via v0, expecting the result
> to be sha256.

Thanks for coming up with an excellent guess that helped come up
with a reproduction.

With Brian's patch a bit tweaked (attached below), the test does
fail with the current 'master' and passes before the merge in
question.  And the trace clearly shows that without being told
anything about the object format via the capability, the client
chooses to honor GIT_DEFAULT_HASH to initialize the new repository
with sha256.

There is this entry in the release notes of 2.29.0:

 * "git clone" that clones from SHA-1 repository, while
   GIT_DEFAULT_HASH set to use SHA-256 already, resulted in an
   unusable repository that half-claims to be SHA-256 repository
   with SHA-1 objects and refs.  This has been corrected.

This refers to 47ac9703 (builtin/clone: avoid failure with
GIT_DEFAULT_HASH, 2020-09-20).  It seems that the "fix" described
there was incomplete and did not address "clone a void" case, and
the patch under discussion finished the incomplete fix without
knowing by accident.  The test that was added by 47ac9703 to t5601
does use non-empty repositories (one with SHA-1, the other with
SHA-256) and tries to check the interaction with "clone" with the
environment variable.  With the patch under discussion, this test
did not break and "clone" is still ignoring the GIT_DEFAULT_HASH
variable.

Also the description in the document of GIT_DEFAULT_HASH makes
readers ambivalent:

`GIT_DEFAULT_HASH`::
	If this variable is set, the default hash algorithm for new
	repositories will be set to this value. This value is currently
	ignored when cloning; the setting of the remote repository
	is used instead. The default is "sha1". THIS VARIABLE IS
	EXPERIMENTAL! See `--object-format` in linkgit:git-init[1].

To me, "is currently ignored" hints that the author, or somebody
close to the author, of this entry feels that it is a bug that
"clone" does not honor it.  The log message of 47ac9703 also phrases
"failed to honor the GIT_DEFAULT_HASH" as if it were a bad thing [*].

On the other hand, it is clearly documented as experimental so
anything that has been relying on the behaviour of any command with
a particular value set to this variable are waiting to be broken.

I am torn about this.  Even though we may have been very clear that
GIT_DEFAULT_HASH should not kick in in this case, "clone" was buggy
(in the sense that it did not behave as documented in an empty
repository) and whatever thing that changed behaviour that Brian
noticed was ignoring the documentation and taking advantage of that
"bug", and Hyrum's law ensues.

In the longer term, after/when we allow incremental/over-the-wire
migration of object-format, i.e. clone from SHA-1 repository to
create SHA-256 repository (or vice versa) and fetching and pushing
between them would bidirectionally convert the object format on the
fly, it is likely we would teach a new option "--object-format" to
"git clone" to say "you would use whatever object format the origin
uses by default, but this time, I am telling you to use this format
on our side, doing on-the-fly object format conversion as needed".

So it would be OK to make sure that GIT_DEFAULT_HASH is ignored by
"clone" as documented now, and even after on-the-fly object-format
migration is implemented, I would think.


[Footnote]

* ... but I think it was misguided.  What it says there is this:

    And we also don't want to initialize the repository as SHA-1
    initially, since that means if we're cloning an empty
    repository, we'll have failed to honor the GIT_DEFAULT_HASH
    variable and will end up with a SHA-1 repository, not a SHA-256
    repository.

If this were "When we're cloning an empty repository, we'd have
failed to honor the object format the other side has chosen and will
end up with a SHA-1 repository, not a SHA-256 repository.", then it
is very much in line with the reality before the patch under
discussion and also in line with the official stance that "clone"
should not honor GIT_DEFAULT_HASH.

Where the original description breaks down is when the other side is
SHA-1 and this side has GIT_DEFAULT_HASH set to SHA-256.  If we
honored the variable, we'd create a SHA-256 repository that will
talk to SHA-1 repository before the rest of the system is ready.

 t/t5700-protocol-v1.sh | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git i/t/t5700-protocol-v1.sh w/t/t5700-protocol-v1.sh
index 6c8d4c6cf1..2f39dd2e05 100755
--- i/t/t5700-protocol-v1.sh
+++ w/t/t5700-protocol-v1.sh
@@ -244,6 +244,19 @@ test_expect_success 'push with ssh:// using protocol v1' '
 	grep "push< version 1" log
 '
 
+test_expect_success 'clone propagates object-format from empty repo' '
+	test_when_finished "rm -fr src256 dst256" &&
+
+	echo sha256 >expect &&
+	git init --object-format=sha256 src256 &&
+	GIT_DEFAULT_HASH=sha256 \
+	GIT_TRACE_PACKET=1 \
+	git clone --no-local src256 dst256 &&
+	git -C dst256 rev-parse --show-object-format >actual &&
+
+	test_cmp expect actual
+'
+
 # Test protocol v1 with 'http://' transport
 #
 . "$TEST_DIRECTORY"/lib-httpd.sh



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux