Re: [PATCH 18/41] index-pack: abstract away hash function constant

Martin Ågren <martin.agren@xxxxxxxxx> · Wed, 25 Apr 2018 20:49:47 +0200

On 25 April 2018 at 01:51, brian m. carlson
<sandals@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Apr 24, 2018 at 11:50:16AM +0200, Martin Ågren wrote:
>> On 24 April 2018 at 01:39, brian m. carlson
>> <sandals@xxxxxxxxxxxxxxxxxxxx> wrote:
>> > The code for reading certain pack v2 offsets had a hard-coded 5
>> > representing the number of uint32_t words that we needed to skip over.
>> > Specify this value in terms of a value from the_hash_algo.
>> >
>> > Signed-off-by: brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx>
>> > ---
>> >  builtin/index-pack.c | 3 ++-
>> >  1 file changed, 2 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/builtin/index-pack.c b/builtin/index-pack.c
>> > index d81473e722..c1f94a7da6 100644
>> > --- a/builtin/index-pack.c
>> > +++ b/builtin/index-pack.c
>> > @@ -1543,12 +1543,13 @@ static void read_v2_anomalous_offsets(struct packed_git *p,
>> >  {
>> >         const uint32_t *idx1, *idx2;
>> >         uint32_t i;
>> > +       const uint32_t hashwords = the_hash_algo->rawsz / sizeof(uint32_t);
>>
>> Should we round up? Or just what should we do if a length is not
>> divisible by 4? (I am not aware of any such hash functions, but one
>> could exist for all I know.) Another question is whether such an
>> index-pack v2 will ever contain non-SHA-1 oids to begin with. I can't
>> find anything suggesting that it could, but this is unfamiliar code to
>> me.
>
> I opted not to simply because I know that our current hash is 20 bytes
> and the new one will be 32, and I know those are both divisible by 4.  I
> feel confident that any future hash we choose will also be divisible by
> 4, and the code is going to be complicated if it isn't.
>
> I agree that pack v2 is not going to have anything but SHA-1.  However,
> writing all the code such that it's algorithm agnostic means that we can
> do testing of new algorithms by wholesale replacing the algorithm with a
> new one, which simplifies things considerably.

Ok. I do sort of wonder if a "successful" test run after globally
substituting Hash-Foo for SHA-1 (regardless of whether the size changes
or not) hints at a problem. That is, nowhere do we test that this code
uses 20-byte SHA-1s, regardless of what other hash functions are
available and configured. Of course, until soon, that did not really
have to be tested since there was only one hash function available to
choose from. As for identifying all the places that matter ... no idea.

Of course I can see how this helps get things to a point where Git does
not crash and burn because the hash has a different size, and where the
test suite doesn't spew failures because the initial chaining value of
"SHA-1" is changed.

Once that is accomplished, I sort of suspect that this code will want to
be updated to not always blindly use the_hash_algo, but to always work
with SHA-1 sizes. Or rather, this would turn into more generic code to
handle both "v2 with SHA-1" and "v3 with some hash function(s)". This
commit might be a good first step in that direction.

Long rambling short, yeah, I see your point.

Martin