# SHA-256 transition (brian)

- (brian) Functional version of "state four" implementation with only SHA-256 in the repository
- Interop work (to use SHA-1 and SHA-256) is mostly stalled, brian is mostly not working on it at the moment
- Current implementation is partially functional, though failing a lot of tests. Can write SHA-256 objects into the repo and, per the transition plan, will write a loose mapping between SHA-1 and SHA-256, along with index v3 with the hashes for both
  - When you index a pack, computes both hashes and stores them in the loose object store or pack
  - Tricky part is when you're indexing a pack, you don't always get all blobs before all trees, before all commits, etc.
  - In order to rewrite a commit from SHA-256 -> SHA-1, you need all reachable objects beforehand in order to compute the hash. Try to look up in a temporary lookup table ahead of time, and lazily hash the object we're going to get and come back to it later.
  - "Rewind the pack" to compute the proper objects, which works
- For submodules (currently unwritten), going to send both hashes over the wire, but unfortunately no way to validate those in real time. If your submodules are checked out, they are rewritten automatically.
- brian working on it slowly as they get to it, hopes that their employer will devote more time to it
- Wants to also work on libgit2 at the same time, since it doesn't yet understand SHA-256, though they hope that somebody else will work on it, since they are tired of writing SEGVs :-).
- (demetr): what if you have a remote that speaks only SHA-1?
  - Goal is to have that information come over the pipe, and rewrite into SHA-256 upon entering the new objects into the repository
- (demetr): can you then push a converted-into-SHA-256 repository back to a SHA-1 repo?
  - Goal is to be able to do that, unless you have a SHA-1 collision, in which case it won't work.
- No major hosting platform yet supports only SHA-256 repositories, though maybe Gitolite and CGit do
- (Peff): so, in the worst case, index-pack takes twice as long?
  - brian: depends on how many are blob objects, since those only take a single pass
  - Will try to rewrite objects in as few passes as possible
  - May need multiple passes in order to visit objects in topological order
  - Actually: worst case is N where N is the maximum tree depth
  - (Stolee): what you really need is reverse-topo order on the object graph
  - brian: yes, would be nice if the server sent them in that order. But the server doesn't know how to do that.
- (Emily): so for something like shallow/partial-clone, the server needs to be able to do SHA-256 for you to compute it yourself?
  - brian: there will be a capability, since data needs to come over the pipe for submodules, and could be extended for shallow and partial clones as well. Would fit into protocol v2, and will be essential for submodules, so will have to exist regardless.
  - Hopefully server has that information, though how expensive that will be to compute is highly dependent.
- (jrn): submodules have to be updated, do you have an idea of what that protocol change will look like?
  - brian: fuzzy idea, but nothing concrete yet
  - (jrn): this reminds me of the early days of partial clones where we talked about "promised" objects at the edge and associated metadata
- (Toon): so no interop, but is there a way to do a single-step conversion from SHA-1 to SHA-256?
  - brian: yes, you can use fast-export and fast-import. Currently any signature references are broken, but in the future would like to update them (that code exists, but it hasn't been upstreamed)
  - doesn't quite work smoothly with submodules, since you have to rewrite them first, then generate a set of marks, and then export and import
  - verified with git/git, resulting index isn't substantially larger (basically 32 bytes per object, along with slightly larger commit and tree objects)
  - (demetr): Could be significantly larger if you have a zillion commits
  - brian: we'd have other problems before then :-).
- (Elijah): common in commit messages to refer back to earlier commits. Do we want to rewrite those?
  - brian: maybe, depends on future plans if/when we deprecate earlier hash algos
  - (jrn): Don't have a good way to retroactively change commit messages, but we do have git notes. First instinct is to use notes for this kind of historical reference info
  - (Terry): annotated tags?
  - (Elijah): filter-repo does this kind of commit message munging
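
Editor's sketch of the multi-pass point from the index-pack discussion above (objects arrive out of order, and an object can only be rewritten once everything it references has been mapped). This is a toy model, not Git's implementation: `object_id`, `serialize` and `convert_pack` are hypothetical names, and the `ref <id>` body layout is invented for illustration. Only the `<type> <size>\0` header matches how Git actually hashes objects. It also retries deferred objects in repeated passes rather than "rewinding the pack" as brian described.

```python
import hashlib

def object_id(algo, kind, body):
    """Hash '<type> <size>\\0' + body, the header convention Git uses."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).hexdigest()

def serialize(kind, payload, refs):
    # Toy body layout (NOT Git's real tree/commit format):
    # one "ref <hex id>" line per referenced object, then the payload.
    lines = [f"ref {r}".encode() for r in refs]
    return b"\n".join(lines + [payload])

def convert_pack(objects):
    """Map SHA-256 ids to SHA-1 ids for objects given in arbitrary order.

    objects: list of (kind, payload, refs), where refs are SHA-256 hex ids.
    Returns (mapping, passes). An object is deferred to a later pass until
    every object it references already has a SHA-1 id in the mapping.
    """
    mapping, pending, passes = {}, list(objects), 0
    while pending:
        passes += 1
        deferred = []
        for kind, payload, refs in pending:
            if all(r in mapping for r in refs):
                sha256_id = object_id("sha256", kind, serialize(kind, payload, refs))
                sha1_refs = [mapping[r] for r in refs]
                mapping[sha256_id] = object_id("sha1", kind, serialize(kind, payload, sha1_refs))
            else:
                deferred.append((kind, payload, refs))
        if len(deferred) == len(pending):
            raise ValueError("some referenced objects are missing from the pack")
        pending = deferred
    return mapping, passes

# A three-level chain: commit -> tree -> blob.
blob = ("blob", b"hello\n", [])
blob_id = object_id("sha256", "blob", serialize(*blob))
tree = ("tree", b"README", [blob_id])
tree_id = object_id("sha256", "tree", serialize(*tree))
commit = ("commit", b"initial", [tree_id])

# Dependencies first: everything resolves in one pass.
in_order, passes_in_order = convert_pack([blob, tree, commit])
# Dependencies last: one pass per level of depth (three here).
reversed_map, passes_reversed = convert_pack([commit, tree, blob])
```

In this model the number of passes is bounded by the depth of the reference chain, which is the "worst case is N" observation in the notes. Both orderings produce the same mapping, so delivery order affects only the cost.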