Re: Questions about the hash function transition

On Thu, Aug 23 2018, Ævar Arnfjörð Bjarmason wrote:

>> Transition plan
>> ---------------
>
> One thing that's not covered in this document at all, which I feel is
> missing, is how we're going to handle references to old commit IDs in
> commit messages, bug trackers etc. once we go through the whole
> migration process.
>
> I.e. are users who expect to be able to read old history and "git show
> <sha1 I found>" expected to maintain a repository that has a live
> sha1<->sha256 mapping forever, or could we be smarter about this and
> support some sort of marker in the repository saying "maintain the
> mapping up until this point".
>
> Then, along with some v2 protocol extension to transfer such a
> historical mapping (and perhaps a default user option to request it)
> we'd be guaranteed to be able to read old log messages and "git show"
> them, and servers could avoid breaking past URLs without maintaining the
> mapping going forward.
>
> One example of this on the server is that on GitLab (I don't know how
> GitHub does this) when you reference a commit from e.g. a bug, a
> refs/keep-around/<sha1> is created, to make sure it doesn't get GC'd.
>
> Those sorts of hosting providers would like to not break *existing*
> links, without needing to forever maintain a bidirectional mapping.

Considering this a bit more, I think this would nicely fall under what I
suggested in
https://public-inbox.org/git/874ll3yd75.fsf@xxxxxxxxxxxxxxxxxxx/

The interface that's now proposed / documented is fairly inelastic,
i.e.:

    [extensions]
        objectFormat = sha256
        compatObjectFormat = sha1

If we instead had something like clean/smudge filters:

    [extensions]
        objectFilter = sha256-to-sha1
        compatObjectFormat = sha1
    [objectFilter "sha256-to-sha1"]
        clean  = ...
        smudge = ...

We could apply arbitrary transformations on objects through filters
which would accept/return some simple format requesting them to
translate such-and-such objects, and would either return object
names/types under which to store them, or "nothing to do".
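
To make the request/response shape a bit more concrete, here's a rough
sketch in Python. Everything about the interface is an assumption for
illustration: the translate() name, the (name, type) return value, and
None standing in for "nothing to do" are all made up, and nothing like
this exists in git today. The only part taken from git itself is how an
object name is computed, i.e. the hash of "<type> <size>\0<content>".

```python
import hashlib
from typing import Optional, Tuple


def object_name(obj_type: str, content: bytes, algo: str) -> str:
    # Git names an object by hashing "<type> <size>\0" followed by its content.
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.new(algo, header + content).hexdigest()


def translate(obj_type: str, content: bytes) -> Optional[Tuple[str, str]]:
    # Hypothetical filter entry point: return the (name, type) under which to
    # store the object in the compat format, or None for "nothing to do".
    if obj_type != "blob":
        # Trees, commits and tags embed other object names, so their content
        # would have to be rewritten first; this sketch skips them.
        return None
    return object_name(obj_type, content, "sha1"), obj_type


print(translate("blob", b""))
print(translate("commit", b"tree ..."))
```

A real filter would also need to translate trees, commits and tags, and
would presumably talk to git over some long-running process protocol
rather than being called per object.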

So we could also have filters that munge the contents of objects
between local & remote (e.g. for the "use a public remote host for
storing an encrypted repo" use-case, where the repo still passes fsck
on their end), but we could also pass arguments to the filters saying
that only commits older than so-and-so are to have a reverse mapping
(for looking up old commits), or just ones on some branch etc.
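
As an illustration of the "only commits older than so-and-so" case, here
is a minimal Python sketch. The Commit record, the reverse_mapping()
helper and the timestamp-based cutoff are all hypothetical; how such a
cutoff would actually be expressed (a date, a commit, a ref) is left
open above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Commit:
    # Hypothetical record holding both names of one commit plus its date.
    sha1: str
    sha256: str
    timestamp: int  # seconds since the epoch


def reverse_mapping(commits: Iterable[Commit], cutoff: int) -> Dict[str, str]:
    # Keep a sha1 -> sha256 entry only for commits older than the cutoff, so
    # old references stay resolvable without mapping new history.
    return {c.sha1: c.sha256 for c in commits if c.timestamp < cutoff}


# Placeholder names, not real object IDs.
history = [
    Commit(sha1="old-sha1", sha256="old-sha256", timestamp=100),
    Commit(sha1="new-sha1", sha256="new-sha256", timestamp=200),
]
mapping = reverse_mapping(history, cutoff=150)
print(mapping.get("old-sha1"))
print(mapping.get("new-sha1"))
```

With a mapping like this, a "git show <old sha1>" could be resolved by a
lookup, while commits made after the cutoff would only be known by their
sha256 names.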

It wouldn't be any slower than the current proposal, since some subset
of it would be picked up and implemented in C directly via some fast
path, similar to the proposal that e.g. some encoding filters be
implemented as built-ins.

But by having it be more extensible it'll be easy to e.g. pass options,
or implement custom transformations.

We're still far away from reviewing patches to implement this, but in
anticipation of that I'd like to see what people think about
future-proofing this objectFilter syntax.


