On Fri, Feb 24, 2017 at 04:39:45PM -0800, Linus Torvalds wrote:

> For example, what I would suggest the rules be is something like this:
>
>  - introduce new tag2/commit2/tree2/blob2 object type tags that imply
>    that they were hashed using the new hash
>
>  - an old type obviously can never contain a pointer to a new type (ie
>    you can't have a "tree" object that contains a tree2 object or a blob2
>    object.
>
>  - but also make the rule that a *new* type can never contain a
>    pointer to an old type, with the *very* specific exception that a
>    commit2 can have a parent that is of type "commit".

Yeah, this is exactly what I had in mind. That way everybody in
"newhash" mode has no decisions to make. They follow the same rules and
it's as if sha1 never existed, except when you follow links in
historical objects.

> [in reply...]
> Actually, I take that back. I think it might be easier to keep
> "object->type" as-is, and it would only show the current OBJ_xyz
> fields. Then writing the SHA ends up deciding whether a OBJ_COMMIT
> gets written as "commit" or "commit2".

Yeah, I think there are some data structures with limited bits for the
"type" fields (e.g., the pack format). So sticking with OBJ_COMMIT might
be nice.

For commits and tags, it would be nice to have an "I'm v2" header at the
start so there's no confusion about how they are meant to be
interpreted. Trees are more difficult, as they don't have any such
field. But a valid tree does need to start with a mode, so sticking some
non-numeric flag at the front of the object would work (it breaks
backwards compatibility, but that's kind of the point).

I dunno. Maybe we do not need those markers at all, and could get by
purely on object-length, or annotating the headers in some way (like
"parent sha256:1234abcd").

It might just be nice if we could very easily identify objects as one
type or the other without having to parse them in detail.
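As a rough illustration of the three quoted rules, here is a minimal
sketch of a link check. The enums, the struct, and link_allowed() are
hypothetical names made up for this example and are not anything in
git.git. It only models the old/new crossing rules, not which object
kinds may structurally point at which.

```c
#include <stdbool.h>

/* Hypothetical types for illustration only; not Git's definitions. */
enum hash_era { ERA_SHA1, ERA_NEWHASH };
enum obj_kind { KIND_BLOB, KIND_TREE, KIND_COMMIT, KIND_TAG };

struct obj_id_type {
	enum hash_era era;   /* "commit" vs "commit2", etc. */
	enum obj_kind kind;
};

/*
 * Return true if an object of type "from" may contain a pointer to an
 * object of type "to" under the quoted rules. Only the era-crossing
 * rules are checked here.
 */
static bool link_allowed(struct obj_id_type from, struct obj_id_type to)
{
	/* Same era: these rules add no restriction. */
	if (from.era == to.era)
		return true;

	/* An old type can never point to a new type. */
	if (from.era == ERA_SHA1)
		return false;

	/* New to old: only a commit2 whose parent is a "commit". */
	return from.kind == KIND_COMMIT && to.kind == KIND_COMMIT;
}
```

With this, a commit2 with a "commit" parent passes, while a tree2
pointing at an old blob or a commit2 pointing at an old tree does not.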

> So you will end up with duplicate objects, and that's not good (think
> of what it does to all our full-tree "diff" optimizations, for example
> - you no longer get the "these sub-trees are identical" across a
> format change), but realistically you'll have a very limited time of
> that kind of duplication.

Yeah, cross-flag-day diffs will be more expensive. I think that's
something we have to live with. I was thinking originally that the
sha1->newhash mapping might solve that, but it only works at the blob
level. I.e., you can compare a sha1 and a newhash like:

  if (!hashcmp(sha1_to_newhash(a), b))

without having to look at the contents. But it doesn't work recursively,
because the tree-pointing-to-newhash will have different content.

-Peff
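To make the "doesn't work recursively" point concrete, here is a toy
sketch. Everything in it is an assumption for illustration: toy_sha1()
and toy_newhash() are 32-bit FNV-1a variants standing in for the real
hashes, and toy_tree() is a made-up one-entry tree format, not Git's
tree encoding. A blob has the same bytes in both worlds, but a tree
embeds its child's id, so the old and new trees serialize differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy 32-bit FNV-1a; NOT SHA-1 or any real candidate hash. */
static uint32_t toy_hash(uint32_t h, const unsigned char *p, size_t len)
{
	for (size_t i = 0; i < len; i++)
		h = (h ^ p[i]) * 16777619u;
	return h;
}

/* Two different starting values stand in for the two hash functions. */
static uint32_t toy_sha1(const unsigned char *p, size_t len)
{
	return toy_hash(2166136261u, p, len);
}

static uint32_t toy_newhash(const unsigned char *p, size_t len)
{
	return toy_hash(0x9e3779b9u, p, len);
}

/* Toy one-entry tree: "<name>\0" followed by the child's 4-byte id. */
static size_t toy_tree(unsigned char *out, size_t cap,
		       const char *name, uint32_t child)
{
	size_t n = strlen(name) + 1;

	if (n + sizeof(child) > cap)
		return 0;
	memcpy(out, name, n);
	memcpy(out + n, &child, sizeof(child));
	return n + sizeof(child);
}

/*
 * Build the old-style and new-style tree for the same file and report
 * whether they serialize to identical bytes.
 */
static bool trees_have_same_bytes(const char *name, const char *content)
{
	const unsigned char *blob = (const unsigned char *)content;
	size_t blob_len = strlen(content);
	unsigned char old_tree[256], new_tree[256];
	size_t old_len = toy_tree(old_tree, sizeof(old_tree), name,
				  toy_sha1(blob, blob_len));
	size_t new_len = toy_tree(new_tree, sizeof(new_tree), name,
				  toy_newhash(blob, blob_len));

	return old_len && old_len == new_len &&
	       !memcmp(old_tree, new_tree, old_len);
}
```

Since the two trees differ in content, rehashing the old tree's bytes
with the new function is not a way to arrive at the new tree's id, even
though the same trick is fine for the blob underneath.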