On 25 April 2018 at 01:51, brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> wrote: > On Tue, Apr 24, 2018 at 11:50:16AM +0200, Martin Ågren wrote: >> On 24 April 2018 at 01:39, brian m. carlson >> <sandals@xxxxxxxxxxxxxxxxxxxx> wrote: >> > The code for reading certain pack v2 offsets had a hard-coded 5 >> > representing the number of uint32_t words that we needed to skip over. >> > Specify this value in terms of a value from the_hash_algo. >> > >> > Signed-off-by: brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> >> > --- >> > builtin/index-pack.c | 3 ++- >> > 1 file changed, 2 insertions(+), 1 deletion(-) >> > >> > diff --git a/builtin/index-pack.c b/builtin/index-pack.c >> > index d81473e722..c1f94a7da6 100644 >> > --- a/builtin/index-pack.c >> > +++ b/builtin/index-pack.c >> > @@ -1543,12 +1543,13 @@ static void read_v2_anomalous_offsets(struct packed_git *p, >> > { >> > const uint32_t *idx1, *idx2; >> > uint32_t i; >> > + const uint32_t hashwords = the_hash_algo->rawsz / sizeof(uint32_t); >> >> Should we round up? Or just what should we do if a length is not >> divisible by 4? (I am not aware of any such hash functions, but one >> could exist for all I know.) Another question is whether such an >> index-pack v2 will ever contain non-SHA-1 oids to begin with. I can't >> find anything suggesting that it could, but this is unfamiliar code to >> me. > > I opted not to simply because I know that our current hash is 20 bytes > and the new one will be 32, and I know those are both divisible by 4. I > feel confident that any future hash we choose will also be divisible by > 4, and the code is going to be complicated if it isn't. > > I agree that pack v2 is not going to have anything but SHA-1. However, > writing all the code such that it's algorithm agnostic means that we can > do testing of new algorithms by wholesale replacing the algorithm with a > new one, which simplifies things considerably. Ok. I do sort of wonder if a "successful" test run after globally substituting Hash-Foo for SHA-1 (regardless of whether the size changes or not) hints at a problem. That is, nowhere do we test that this code uses 20-byte SHA-1s, regardless of what other hash functions are available and configured. Of course, until soon, that did not really have to be tested since there was only one hash function available to choose from. As for identifying all the places that matter ... no idea. Of course I can see how this helps get things to a point where Git does not crash and burn because the hash has a different size, and where the test suite doesn't spew failures because the initial chaining value of "SHA-1" is changed. Once that is accomplished, I sort of suspect that this code will want to be updated to not always blindly use the_hash_algo, but to always work with SHA-1 sizes. Or rather, this would turn into more generic code to handle both "v2 with SHA-1" and "v3 with some hash function(s)". This commit might be a good first step in that direction. Long rambling short, yeah, I see your point. Martin