[TOPIC 2/8] State of SHA-256 transition


 



# SHA-256 transition (brian)

- (brian) Functional version of "state four" implementation with only
	SHA-256 in the repository
- Interop work (to use sha1 and sha256) is mostly stalled, brian is
	mostly not working on it at the moment
- Current implementation is partially functional, though failing a lot
	of tests.  Can write SHA-256 objects into the repo and, per the
	transition plan, will write a loose mapping between SHA-1 and
	SHA-256, along with index v3 with the hashes for both
- When you index a pack, computes both hashes and stores them in the
	loose object store or pack
- Tricky part is when you're indexing a pack, you don't always get all
	blobs before all trees, before all commits, etc.
- In order to rewrite a commit from SHA-256 -> SHA-1, you need all
	objects reachable from it first in order to compute the hash. Try to
	look them up in a temporary lookup table ahead of time, and lazily
	hash the object we're going to get and come back to it later.
- "Rewind the pack" to compute the proper objects, which works
- For submodules (currently unwritten), going to send both hashes over
	the wire, but unfortunately no way to validate those in real time. If
	your submodules are checked out, rewritten automatically.
- brian working on it slowly as they get to it, hopes that their
	employer will devote more time to it
- Wants to also work on libgit2 at the same time, since it doesn't yet
	understand SHA-256, though they hope that somebody else will work on
	it, since they are tired of writing SEGVs :-).
- (demetr): what if you have a remote that speaks only SHA-1?
	 - Goal is to have that information come over the pipe, and rewrite
		 into SHA-256 upon entering the new objects into the repository
- (demetr): can you then push a converted-into-SHA-256 repository back
	to a SHA-1 repo?
	 - Goal is to be able to do that, unless you have a SHA-1 collision,
		 in which case it won't work.
	 - No major hosting platform yet supports only SHA-256 repositories,
		 though maybe Gitolite and CGit do
- (Peff): so, in the worst case, index-pack takes twice as long?
	 - brian: depends on how many are blob objects, since only takes a
		 single pass
	 - Will try to rewrite objects in as few passes as possible
	 - May need multiple passes in order to visit objects in topological
		 order
	 - Actually: worst case is N where N is the maximum tree depth
- (Stolee): what you really need is reverse-topo order on the object
	graph
	 - brian: yes, would be nice if the server sent them in that order.
		 But the server doesn't know how to do that.
- (Emily): so for something like shallow/partial-clone, the server needs
	to be able to do SHA-256 for you to compute it yourself?
	 - brian: there will be a capability, since data needs to come over
		 the pipe for submodules, and could be extended for shallow and
		 partial clones as well. Would fit into protocol v2, and will be
		 essential for submodules, so will have to exist regardless.
	 - Hopefully server has that information, though how expensive that
		 will be to compute is highly dependent.
- (jrn): submodules have to be updated, do you have an idea of what that
	protocol change will look like?
	 - brian: fuzzy idea, but nothing concrete yet
	 - (jrn): this reminds me of the early days of partial clones where we
		 talked about "promised" objects at the edge and associated metadata
- (Toon): so no interop, but is there a way to do a single step
	conversion from SHA-1 to SHA-256?
	 - brian: yes, you can use fast-export and fast-import. Currently any
		 signatures references are broken, but in the future would like to
		 update them (that code exists, but it hasn't been upstreamed)
	 - doesn't quite work smoothly with submodules, since you have to
		 rewrite them first, then generate a set of marks, and then export
		 and import
	 - verified with git/git, resulting index isn't substantially larger
		 (basically 32 bytes per object, along with slightly larger commit
		 and tree objects)
- (demetr): Could be significantly larger if you have a zillion commits
	 - brian: we'd have other problems before then :-).
- (Elijah): common in commit messages to refer back to earlier commits.
	Do we want to rewrite those?
	 - brian: maybe, depends on future plans if/when we deprecate earlier
		 hash algos
	 - (jrn): Don't have a good way to retroactively change commit
		 messages, but we do have git notes. First instinct is to use notes
		 for this kind of historical reference info
	 - (Terry): annotated tags?
	 - (Elijah): filter-repo does this kind of commit message munging
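
The index-pack discussion above (objects arriving out of order, deferring them, and needing up to N passes) can be sketched roughly as follows. This is a toy model, not the actual implementation: the `Obj` tuple, `render`, and `map_pack` are hypothetical names, and real trees and commits embed object ids differently. The only Git fact relied on is that an object id is the hash of a `<type> <size>\0` header followed by the body.

```python
import hashlib

# Toy object: (type, labels of the objects it refers to, payload).
# Hypothetical simplification; real Git objects embed ids in their own formats.
Obj = tuple[str, list[str], bytes]


def object_id(algo: str, obj_type: str, body: bytes) -> str:
    # Git hashes "<type> <size>\0" followed by the object body.
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.new(algo, header + body).hexdigest()


def render(obj: Obj, ids: dict[str, str]) -> bytes:
    # Build the body for one algorithm by embedding the referenced ids.
    _, refs, payload = obj
    return b"".join(ids[r].encode() + b"\n" for r in refs) + payload


def map_pack(pack: list[tuple[str, Obj]]):
    """Compute (sha1, sha256) ids in pack order, deferring any object whose
    references are not mapped yet. Returns the mapping and the pass count."""
    sha1: dict[str, str] = {}
    sha256: dict[str, str] = {}
    pending = list(pack)
    passes = 0
    while pending:
        passes += 1
        deferred = []
        for label, obj in pending:
            obj_type, refs, _ = obj
            if all(r in sha1 for r in refs):
                sha1[label] = object_id("sha1", obj_type, render(obj, sha1))
                sha256[label] = object_id("sha256", obj_type, render(obj, sha256))
            else:
                deferred.append((label, obj))  # come back to it next pass
        if len(deferred) == len(pending):
            raise ValueError("references to objects missing from the pack")
        pending = deferred
    return {k: (sha1[k], sha256[k]) for k in sha1}, passes


objs = {
    "blob": ("blob", [], b"hello\n"),
    "tree": ("tree", ["blob"], b""),
    "commit": ("commit", ["tree"], b"initial\n"),
}
dependencies_first = [(k, objs[k]) for k in ("blob", "tree", "commit")]
dependents_first = list(reversed(dependencies_first))
print(map_pack(dependencies_first)[1])  # 1 pass
print(map_pack(dependents_first)[1])    # 3 passes, one per level of depth
```

In this model, a pack that lists dependencies before their dependents resolves in a single pass, while the reverse order needs one pass per level of depth, which is the worst case mentioned in the notes.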

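The single-step conversion via fast-export and fast-import mentioned above might look something like the sketch below. It assumes a Git new enough to support `git init --object-format=sha256`; the helper names `conversion_commands` and `convert_to_sha256` are made up for illustration. It does not handle submodules or signatures, which the notes call out as unsolved, and a repository with signed tags may need an explicit `--signed-tags=` choice on the export side.

```python
import subprocess


def conversion_commands(src: str, dst: str) -> dict[str, list[str]]:
    # Hypothetical helper: the three git invocations used by the conversion.
    return {
        "init": ["git", "init", "--object-format=sha256", dst],
        "export": ["git", "-C", src, "fast-export", "--all"],
        "import": ["git", "-C", dst, "fast-import"],
    }


def convert_to_sha256(src: str, dst: str) -> None:
    """Create a SHA-256 repository at dst and stream src's history into it."""
    cmds = conversion_commands(src, dst)
    subprocess.run(cmds["init"], check=True)
    export = subprocess.Popen(cmds["export"], stdout=subprocess.PIPE)
    try:
        # fast-import reads the stream produced by fast-export.
        subprocess.run(cmds["import"], stdin=export.stdout, check=True)
    finally:
        export.stdout.close()
    if export.wait() != 0:
        raise subprocess.CalledProcessError(export.returncode, cmds["export"])
```

Calling `convert_to_sha256("old-repo", "new-repo")` would leave the imported refs in `new-repo`; fast-import does not populate the working tree, so a checkout is still needed afterwards.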

