On Sat, Feb 25, 2017 at 02:26:56PM -0500, Jeff King wrote:
> On Sat, Feb 25, 2017 at 06:50:50PM +0000, brian m. carlson wrote:
>
> > > As long as the reader can tell from the format of object names
> > > stored in the "new object format" object from what era is being
> > > referred to in some way [*1*], we can name new objects with only new
> > > hash, I would think. "new refers only to new" that stratifies
> > > objects into older and newer may make things simpler, but I am not
> > > convinced yet that it would give our users a smooth enough
> > > transition path (but I am open to be educated and persuaded the
> > > other way).
> >
> > I would simply use multihash[0] for this purpose. New-style objects
> > serialize data in multihash format, so it's immediately obvious what
> > hash we're referring to. That makes future transitions less
> > problematic.
> >
> > [0] https://github.com/multiformats/multihash
>
> I looked at that earlier, because I think it's a reasonable idea for
> future-proofing. The first byte is a "varint", but I couldn't find where
> they defined that format.
>
> The closest I could find is:
>
>   https://github.com/multiformats/unsigned-varint
>
> whose README says:
>
>   This unsigned varint (VARiable INTeger) format is for the use in all
>   the multiformats.
>
>     - We have not yet decided on a format yet. When we do, this readme
>       will be updated.
>
>     - We have time. All multiformats are far from requiring this varint.
>
> which is not exactly confidence inspiring. They also put the length at
> the front of the hash. That's probably convenient if you're parsing an
> unknown set of hashes, but I'm not sure it's helpful inside Git objects.
> And there's an incentive to minimize header data at the front of a hash,
> because every byte is one more byte that every single hash will collide
> over, and people will have to type when passing hashes to "git show",
> etc.
>
> I'd almost rather use something _really_ verbose like
>
>   sha256:1234abcd...
>
> in all of the objects. And then when we get an unadorned hash from the
> user, we guess it's sha256 (or whatever), and fall back to treating it as
> a sha1.
>
> Using a syntactically-obvious name like that also solves one other
> problem: there are sha1 hashes whose first bytes will encode as a "this
> is sha256" multihash, creating some ambiguity.

Indeed, multihash is only really interesting when *all* hashes use it.
And obviously, git can't change the existing sha1s.

Mike
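
To make the two points above concrete, here is a rough Python sketch, not
anything Git actually implements. It assumes a hypothetical "algo:hex" name
syntax and a hypothetical exists(algo, hex) lookup callback standing in for
the object database. The only outside facts it relies on are the multihash
function code for sha2-256 (0x12) and its digest length byte (0x20).

```python
import re

# Multihash header for sha2-256: function code 0x12, digest length 0x20 (32).
MULTIHASH_SHA256_HEADER = bytes([0x12, 0x20])

# Full hex lengths for the two algorithms discussed in the thread.
HEX_LENGTHS = {"sha1": 40, "sha256": 64}


def resolve(name, exists):
    """Resolve a (possibly abbreviated) object name to (algo, hex).

    `exists(algo, hex)` is a hypothetical lookup callback.
    An explicit "algo:hex" name is taken at face value; an unadorned
    name is guessed as sha256 first, then falls back to sha1.
    Returns None when nothing matches.
    """
    if ":" in name:
        algo, _, hexpart = name.partition(":")
        if algo not in HEX_LENGTHS:
            raise ValueError("unknown hash algorithm: " + algo)
        candidates = [algo]
    else:
        hexpart = name
        candidates = ["sha256", "sha1"]

    if not re.fullmatch(r"[0-9a-f]+", hexpart):
        raise ValueError("not a hex object name: " + hexpart)

    for algo in candidates:
        if len(hexpart) <= HEX_LENGTHS[algo] and exists(algo, hexpart):
            return algo, hexpart
    return None


def looks_like_multihash_sha256(raw):
    """True if the leading bytes match the multihash sha2-256 header."""
    return raw[:2] == MULTIHASH_SHA256_HEADER


# A raw 20-byte sha1 that happens to start with 0x12 0x20 is, judging by
# its leading bytes alone, indistinguishable from a sha2-256 multihash.
ambiguous_sha1 = bytes.fromhex("1220" + "00" * 18)
```

With the explicit prefix the algorithm is never guessed, whereas
looks_like_multihash_sha256(ambiguous_sha1) is true for a perfectly
ordinary sha1, which is the ambiguity described above.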