Re: [PATCH 2/2] t6500: mark tests as SHA1 reliant

Junio C Hamano <gitster@xxxxxxxxx> · Sun, 30 Jul 2017 14:21:50 -0700

"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:

> One approach I had considered taking is having a helper of some sort
> that wrapped a simple key/value store.  We could pass the wrapper the
> SHA-1 value (or, if necessary, an arbitrary key) and have it return the
> proper value based on the given hash function.
>
> That does have the downsides that the values may not be present in the
> tests themselves, and that people adding new tests will of course need
> to run the test suite twice.  But it does make the tests easier to read.
>
> Opinions on the desirability of this approach are of course welcome.

I am not quite sure if I follow.  There was a proposal to tweak the
commit format that uses the new hash in such a way that we can tell
what SHA-1 would have been used if everything were SHA-1 (I think it
was from Jonathan, but I may be mistaken), and I recall that
the list was generally receptive to the idea.  But I have a feeling
that your "helper of some sort" is something else.
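One minimal reading of the helper brian describes, sketched in Python purely for illustration (the test suite itself is shell). The names `OID_TABLE`, `lookup_oid` and `blob_name` are hypothetical. The table is keyed by the SHA-1 object name and filled by hashing the payloads rather than by hand, using Git's loose-object naming rule of hashing `"blob <size>\0"` followed by the content:

```python
import hashlib

def blob_name(payload: bytes, algo: str) -> str:
    """Object name of a blob: hash of 'blob <size>\\0' followed by the payload."""
    header = b"blob " + str(len(payload)).encode() + b"\0"
    return hashlib.new(algo, header + payload).hexdigest()

# Hypothetical key/value store: SHA-1 object name -> {hash algorithm -> name}.
OID_TABLE = {}
for payload in (b"", b"hello\n"):
    sha1 = blob_name(payload, "sha1")
    OID_TABLE[sha1] = {"sha1": sha1, "sha256": blob_name(payload, "sha256")}

def lookup_oid(key: str, algo: str) -> str:
    """Return the value recorded under `key` for the requested hash function."""
    return OID_TABLE[key][algo]
```

A test would then ask for `lookup_oid(<sha1 name>, <current algorithm>)` instead of spelling out a literal name, which is the readability gain brian mentions; the cost is that the literal values live in the table rather than in the test.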

If your <key,value> is about letting us store something like

 - If you hash "hello\n", the resulting blob in the SHA-1 world has
   this object name, and with that, you can find out the equivalent
   object name in the SHA-256 world.

 - If you have a tree with the above blob at path P and nothing
   else, then the object name of that tree in the SHA-1 world and
   SHA-256 world are different and we can map between them.

 - Likewise for a commit that points at the above tree with fixed
   date, author and message.
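A rough Python sketch of why such a mapping has to be recorded per object rather than derived from the SHA-1 name: a tree entry stores the raw hash of the blob, and a commit names its tree, so under SHA-256 every enclosing object's payload changes too. The path `P`, the identity, the timestamp and the message below are illustrative values only, and the helper names are hypothetical:

```python
import hashlib

def object_name(obj_type: str, body: bytes, algo: str) -> str:
    """Hash '<type> <size>\\0<body>' the way Git names objects."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.new(algo, header + body).hexdigest()

def single_entry_tree(path: str, blob_hex: str) -> bytes:
    """Tree body with one regular file; the entry embeds the raw blob hash."""
    return b"100644 " + path.encode() + b"\0" + bytes.fromhex(blob_hex)

def commit_body(tree_hex: str) -> bytes:
    """Commit with a fixed (illustrative) author, date and message."""
    ident = "A U Thor <author@example.com> 1112911993 -0700"
    return (f"tree {tree_hex}\n"
            f"author {ident}\n"
            f"committer {ident}\n"
            "\n"
            "message\n").encode()

def names_for(algo: str) -> dict:
    """Names of the blob, the tree holding it at path P, and the commit."""
    blob = object_name("blob", b"hello\n", algo)
    tree = object_name("tree", single_entry_tree("P", blob), algo)
    commit = object_name("commit", commit_body(tree), algo)
    return {"blob": blob, "tree": tree, "commit": commit}

# Map each SHA-1 object name to its SHA-256 counterpart.
sha1_names, sha256_names = names_for("sha1"), names_for("sha256")
mapping = {sha1_names[k]: sha256_names[k] for k in sha1_names}
```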

I am not sure how much it would help.  Are you aiming to make it
easier and more structured to create a patch like what Stefan did
recently for t8008 in 0ba9c9a0 ("t8008: rely on rev-parse'd HEAD
instead of sha1 value", 2017-07-26)?

I also suspect that tests like t1512 and t6500 would not benefit
that much from such a mapping.  In these tests, the object names by
themselves are not interesting.  These tests are about what Git does
when the names of the objects involved in them happen to share a
certain prefix.  We are not interested in using the same payload in
these tests with a different hash, which is likely to destroy the
aspect of the object names that these tests are interested in,
namely, that they share the same prefix.  When updating these tests
to adjust for the SHA-256 world, we want to preserve the property
that the resulting object names happen to share the same prefix by
tweaking the payload strings (e.g. "263" and "410" in t6500 are
chosen to cause the resulting objects to share the "17/" prefix and
fall inside the same fan-out directory as loose objects.  We want to
choose different strings so that the names of the resulting objects
share the same prefix, not necessarily "17/" but preferably so, in
the SHA-256 world.  Similarly, random-looking strings like
"a2onsxbvj" in t1512 are chosen to cause the blobs, trees, commits
and tags involved in the test to all share the same prefix
"000000..."; we want to choose a different set of such
random-looking strings that cause all objects involved to hash to
the same prefix, not necessarily but preferably "000000...").
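Finding replacement payloads is a brute-force search.  The Python sketch below assumes, purely for illustration, that each payload is a decimal number followed by a newline stored as a blob; `find_payloads` is a hypothetical name.  A loose object lives in the fan-out directory named by the first two hex digits of its name, so a two-digit prefix such as "17" takes about 256 tries per hit, and each further hex digit multiplies that by 16 (a six-digit "000000" prefix needs about 16.8 million tries per object on average).  It only handles blobs; t1512 would also need trees, commits and tags built around the chosen strings.

```python
import hashlib

def blob_name(payload: bytes, algo: str) -> str:
    """Object name of a blob: hash of 'blob <size>\\0' followed by the payload."""
    header = b"blob " + str(len(payload)).encode() + b"\0"
    return hashlib.new(algo, header + payload).hexdigest()

def find_payloads(prefix: str, how_many: int, algo: str,
                  limit: int = 1_000_000) -> list:
    """Return up to `how_many` payloads b"<n>\\n" whose blob names start with `prefix`."""
    found = []
    for n in range(limit):
        payload = f"{n}\n".encode()
        if blob_name(payload, algo).startswith(prefix):
            found.append(payload)
            if len(found) == how_many:
                break
    return found

# Two payloads whose blobs would share the ".git/objects/17/" directory
# under SHA-256, analogous to what "263" and "410" do for t6500 under SHA-1.
pair = find_payloads("17", 2, "sha256")
```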