Jonathan Nieder <jrnieder@xxxxxxxxx> writes:

> +Reading an object's sha1-content
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +The sha1-content of an object can be read by converting all newhash-names
> +its newhash-content references to sha1-names using the translation table.

Sure.

> +Fetch
> +~~~~~
> +Fetching from a SHA-1 based server requires translating between SHA-1
> +and NewHash based representations on the fly.
> +
> +SHA-1s named in the ref advertisement that are present on the client
> +can be translated to NewHash and looked up as local objects using the
> +translation table.
> +
> +Negotiation proceeds as today. Any "have"s generated locally are
> +converted to SHA-1 before being sent to the server, and SHA-1s
> +mentioned by the server are converted to NewHash when looking them up
> +locally.

Any of our alternate object stores is by definition a NewHash
repository--otherwise we'd violate the "no mixing" rule.  It may or
may not have the translation table for its objects.  If it no longer
has the translation table (because it migrated to the NewHash-only
world before we did), then we can still use it as our alternate, but
we cannot use it for the purpose of common ancestor discovery.

> +After negotiation, the server sends a packfile containing the
> +requested objects.

s/objects.$/& These are all SHA-1 contents./

> +We convert the packfile to NewHash format using
> +the following steps:
> +
> +1. index-pack: inflate each object in the packfile and compute its
> +   SHA-1. Objects can contain deltas in OBJ_REF_DELTA format against
> +   objects the client has locally. These objects can be looked up
> +   using the translation table and their sha1-content read as
> +   described above to resolve the deltas.

That procedure would give us the object's SHA-1 contents for
ref-delta objects.  For an ofs-delta object, by definition, its base
object should appear in the same packstream, so we should eventually
be able to get to the SHA-1 contents of the delta base, and from
there we can apply the delta to obtain the SHA-1 contents.  For a
non-delta object, we already have its SHA-1 contents in the
packstream.  So we can get the SHA-1 names and SHA-1 contents of
each and every object in the packstream in this step.
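Just to make sure we are on the same page, here is a toy model, in
Python, of how I read that step.  The dicts stand in for the
translation table and the object store, and every name in it is
invented for illustration; nothing here corresponds to actual git
internals.

    # Toy model of step #1; a sketch, not a proposed implementation.
    sha1_to_newhash = {}   # the translation table ...
    newhash_to_sha1 = {}   # ... and its inverse
    objects = {}           # local store: newhash-name -> newhash-content

    def read_sha1_content(sha1_name):
        """Reading an object's sha1-content: take the newhash-content
        and convert every newhash-name it references to a sha1-name."""
        content = objects[sha1_to_newhash[sha1_name]]
        for newhash, sha1 in newhash_to_sha1.items():
            content = content.replace(newhash, sha1)  # crude stand-in for parsing
        return content

    def apply_delta(base, delta):
        """Stand-in for real delta application."""
        raise NotImplementedError

    def sha1_content_of_entry(entry, resolved_by_offset):
        """Obtain the SHA-1 contents of one pack entry."""
        if entry["type"] == "full":
            return entry["data"]              # already sha1-content
        if entry["type"] == "ref_delta":      # base is a local object
            base = read_sha1_content(entry["base_sha1"])
        else:                                 # ofs_delta: base appeared earlier in this pack
            base = resolved_by_offset[entry["base_offset"]]
        return apply_delta(base, entry["data"])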
Are we actually writing out a .pack/.idx pair that is usable in the
SHA-1 world at this stage?  Or are we going to read from something
we keep in-core in step #3 below?

> +2. topological sort: starting at the "want"s from the negotiation
> +   phase, walk through objects in the pack and emit a list of them,
> +   excluding blobs, in reverse topologically sorted order, with each
> +   object coming later in the list than all objects it references.
> +   (This list only contains objects reachable from the "wants". If the
> +   pack from the server contained additional extraneous objects, then
> +   they will be discarded.)

Presumably this is a list of SHA-1 names, as we do not yet have
enough information to compute NewHash names at this point.  You may
want to spell that out here.

Would it discard the auto-followed tags if we do the "traverse from
wants only"?  Traversing the objects in the packfile to find the
"tips" that are not referenced from any other object in the pack
might be necessary, and it shouldn't be too costly, I'd guess.

> +3. convert to newhash: open a new (newhash) packfile. Read the topologically
> +   sorted list just generated. For each object, inflate its
> +   sha1-content, convert to newhash-content, and write it to the newhash
> +   pack. Record the new sha1<->newhash mapping entry for use in the idx.

Are we doing any deltification here?  If we are computing a
.pack/.idx pair that is usable in the SHA-1 world in step #1, then
reusing blob deltas should be trivial (a good delta-base in the
SHA-1 world is a good delta-base in the NewHash world, too).  For
things that have outgoing references, like trees, it is possible
that such a heuristic may not give us the absolute best delta-base,
but I guess it would still be a good approximation to carry the
delta/base object relationship over from the SHA-1 world to the
NewHash world, assuming that the server did a good job choosing the
bases.

> +4. sort: reorder entries in the new pack to match the order of objects
> +   in the pack the server generated and include blobs. Write a newhash idx
> +   file

OK.

> +5. clean up: remove the SHA-1 based pack file, index, and
> +   topologically sorted list obtained from the server in steps 1
> +   and 2.

Ah, OK, so we do write the SHA-1 pack/idx in the first step.  OK.

> +Push
> +~~~~
> +Push is simpler than fetch because the objects referenced by the
> +pushed objects are already in the translation table. The sha1-content
> +of each object being pushed can be read as described in the "Reading
> +an object's sha1-content" section to generate the pack written by git
> +send-pack.

OK.

> +Signed Commits
> +~~~~~~~~~~~~~~
> +We add a new field "gpgsig-newhash" to the commit object format to allow
> +signing commits without relying on SHA-1. It is similar to the
> +existing "gpgsig" field. Its signed payload is the newhash-content of the
> +commit object with any "gpgsig" and "gpgsig-newhash" fields removed.

Do we prepare for newerhash, too?  IOW, should the signed payload
be the newhash-contents with any field whose name is "gpgsig" or
begins with "gpgsig-" followed by anything removed?

> +This means commits can be signed
> +1. using SHA-1 only, as in existing signed commit objects
> +2. using both SHA-1 and NewHash, by using both gpgsig-newhash and gpgsig
> +   fields.
> +3. using only NewHash, by only using the gpgsig-newhash field.
> +
> +Old versions of "git verify-commit" can verify the gpgsig signature in
> +cases (1) and (2) without modifications and view case (3) as an
> +ordinary unsigned commit.

For old clients to be able to verify (2), the signed payload for
SHA-1 is everything in the SHA-1 contents minus "gpgsig";
"gpgsig-newhash" should not get excluded from the computation.  Am I
correct?  I am primarily finding the asymmetry here a bit
disturbing.

> +Signed Tags
> +~~~~~~~~~~~

This message stops here for now.
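P.S. To make the symmetric variant I am asking about in the "Signed
Commits" part concrete, the payload computation I have in mind is
roughly the following Python sketch (a toy over a commit buffer,
only to spell out the header-matching rule; not a proposed patch):

    def signed_payload(commit_buffer):
        """Commit buffer minus any header whose name is "gpgsig" or
        begins with "gpgsig-", including the indented continuation
        lines of those (multi-line) headers."""
        header, sep, body = commit_buffer.partition("\n\n")
        kept, skipping = [], False
        for line in header.split("\n"):
            if line.startswith(" "):   # continuation of the previous header
                if not skipping:
                    kept.append(line)
                continue
            name = line.split(" ", 1)[0]
            skipping = name == "gpgsig" or name.startswith("gpgsig-")
            if not skipping:
                kept.append(line)
        return "\n".join(kept) + sep + body

With such a rule, a third hash function later would not need yet
another definition of the payload, and the asymmetry would be
confined to the SHA-1 side for backward compatibility.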