Re: [PATCH v2 1/3] Move init_skiplist() outside of fsck

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Fri, 18 Jan 2019 23:26:29 +0100

On Fri, Jan 18 2019, Jeff King wrote:

> On Fri, Jan 18, 2019 at 09:59:21PM +0100, Johannes Schindelin wrote:
>
>> By that reasoning all the preparatory work for switching to SHA-256 and
>> making the references in the Git code base less tied to SHA-1 would be
>> irrelevant now, "because we can cross that bridge when we reach it".
>>
>> You are suggesting to incur technical debt here. Let's be smarter about
>> this. We do not *have* to incur said technical debt. Nothing (except
>> mental laziness) makes us do that.
>>
>> Instead, we can make our load "when we reach that bridge" a lot lighter
>> by already doing the right thing.
>>
>> BTW I totally disagree that the skip list is bound to be SHA-1. It is
>> bound to be a list of object names, that's what its purpose is, and just
>> because we happen to not yet support other hash algorithms but SHA-1 does
>> not mean that the skip list is fixed to SHA-1. It'll always be whatever
>> hash algorithm is used in the current repository.
>
> Yeah, I agree with this. In particular, the code has already been
> modified to use "struct object_id" and parse_oid_hex(). So it is not
> even like somebody will have to come through later and fix the
> implementation here, and while they're at it change the "SHA-1" in the
> message. It has literally already been fixed, and is just waiting on
> parse_oid_hex() to learn about the new hashes behind the scenes.
>
> IMHO the conversion to object_id probably would have been the time to
> fix that message so we would not even have to be revisiting the
> discussion now. But that conversion was such a monumental pain it is
> hard to fault the authors for not picking up every scrap at that moment. ;)
>
> That is no excuse not to do it now, though.

I stand corrected. I thought these still needed to be updated to parse
anything that wasn't 40 characters long, since I hadn't seen anything
about these formats in the hash transition document.
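
To make concrete what "not tied to 40 chars" means for a skiplist line,
here is a minimal standalone sketch. It is not Git's code: the struct,
the constants and parse_skiplist_line() are hypothetical stand-ins for
what "struct object_id" and parse_oid_hex() do, and it assumes the hex
length is chosen by the repository's hash algorithm (40 for SHA-1, 64
for SHA-256).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not Git's real definitions. */
enum { SHA1_HEXSZ = 40, SHA256_HEXSZ = 64, MAX_RAWSZ = 32 };

struct sketch_oid {
	unsigned char hash[MAX_RAWSZ];
	size_t rawsz; /* number of raw bytes actually used */
};

/* Return the value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Parse one skiplist line: exactly `hexsz` hex digits followed by a
 * newline or the end of the string. The caller passes the hex length
 * of the repository's hash, so nothing here assumes SHA-1.
 * Returns 0 on success, -1 on a malformed line.
 */
static int parse_skiplist_line(const char *line, size_t hexsz,
			       struct sketch_oid *oid)
{
	size_t i;

	if (hexsz != SHA1_HEXSZ && hexsz != SHA256_HEXSZ)
		return -1;
	if (strlen(line) < hexsz)
		return -1;
	for (i = 0; i < hexsz; i += 2) {
		int hi = hexval(line[i]);
		int lo = hexval(line[i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		oid->hash[i / 2] = (unsigned char)((hi << 4) | lo);
	}
	if (line[hexsz] != '\0' && line[hexsz] != '\n')
		return -1;
	oid->rawsz = hexsz / 2;
	return 0;
}
```

With this shape a 40-character line is rejected when the caller asks
for SHA-256 and vice versa, which is why a system-wide skiplist shared
between repositories with different hashes needs some thought.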

So fair enough, let's change that while we're at it, but this seems like
something that needs to be planned for in more detail / documented in
the hash transition doc.

That is, many people (me included) maintain a system-wide skiplist for
cloning legacy repos with strict fsck checking. So I can see there
being some need for a SHA1<->SHA256 map in this case, but since these
files might stretch across repo boundaries and are not checked into the
repo itself, this is a new use-case that needs thinking about.

But now that I think about it, this would be a good opportunity to fix
these various historical fsck issues while we're at it, where possible,
e.g. "missing space before email" (probably not all of them could be
fixed unambiguously). So instead of sha256<->sha1, something like
fn(sha256)<->fn(sha1)[1]?

1. https://public-inbox.org/git/87ftyyedqd.fsf@xxxxxxxxxxxxxxxxxxx/