Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> For example, what I would suggest the rules be is something like this:
>
>  - introduce new tag2/commit2/tree2/blob2 object type tags that imply
> that they were hashed using the new hash
>
>  - an old type obviously can never contain a pointer to a new type (ie
> you can't have a "tree" object that contains a tree2 object or a blob2
> object.
>
>  - but also make the rule that a *new* type can never contain a
> pointer to an old type, with the *very* specific exception that a
> commit2 can have a parent that is of type "commit".

OK, I think that is what Peff was suggesting in his message, and I
do not have problem with such a transition plan.  Or the *very*
specific exception could be that a reference to "commit" can use old
name (which would allow binding a submodule before transition to a
new project).

We probably do not need "blob2" object as they do not embed any
pointer to another thing.  A loose blob with old name can be made
available on the filesystem also under new name without much "heavy"
transition, and an in-pack blob can be pointed at with _two_ entries
in the updated pack index file under old and new names, both for the
base (just deflated) representation and also ofs-delta.  A ref-delta
based on another blob with old name may need a bit of special
handling, but the deltification would not be visible at the "struct
object" layer, so probably not such a big deal.

We may also be able to get away without "commit2" and "tag2" as
their pointers can be widened and parse_{commit,tag}_object() should
be able to deal with objects with new names transparently.  "tree2"
may be a bit tricky, though, but offhand it seems to me that nothing
is insurmountable.

> That way everything "converges" towards the new format: the only way
> you can stay on the old format is if you only have old-format objects,
> and once you have a new-format object all your objects are going to be
> new format - except for the history.

Yes.

> So you will end up with duplicate objects, and that's not good (think
> of what it does to all our full-tree "diff" optimizations, for example
> - you no longer get the "these sub-trees are identical" across a
> format change), but realistically you'll have a very limited time of
> that kind of duplication.
>
> I'd furthermore suggest that from a UI standpoint, we'd
>
>  - convert to 64-character hex numbers (32-byte hashes)
>
>  - (as mentioned earlier) default to a 40-character abbreviation
>
>  - make the old 40-character SHA1's just show up within the same
> address space (so they'd also be encoded as 32-byte hashes, just with
> the last 12 bytes zero).

Yes to all of the above.

>  - you'd see in the "object->type" whether it's a new or old-style hash.

I am not sure if this is needed.  We may need to abstract tree_entry
walker a little bit as a preparatory step, but I suspect that the
hash (and more importantly the internal format) can be kept as an
internal knowledge to the object layer (i.e. {commit,tree,tag}.c).

So,... thanks for straightening me out.  I was thinking we would
need mixed mode support for smoother transition, but it now seems to
me that the approach to stratify the history into old and new is
workable.
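
To make the "same address space" idea concrete, here is a minimal
sketch in C of what widening an old name could look like.  None of
this is existing Git code: the constants OLD_RAWSZ/NEW_RAWSZ and the
helpers widen_old_name(), name_to_hex() and abbrev_hex() are
hypothetical names made up for illustration, and the layout (SHA-1
bytes first, then 12 zero bytes) is simply a literal reading of the
suggestion quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OLD_RAWSZ 20               /* raw SHA-1 size in bytes */
#define NEW_RAWSZ 32               /* assumed widened name size in bytes */
#define NEW_HEXSZ (2 * NEW_RAWSZ)  /* 64 hex digits */

/*
 * Hypothetical helper: place a 20-byte SHA-1 name in the 32-byte
 * name space by copying it and zeroing the last 12 bytes.
 */
static void widen_old_name(const unsigned char *old, unsigned char *out)
{
	memcpy(out, old, OLD_RAWSZ);
	memset(out + OLD_RAWSZ, 0, NEW_RAWSZ - OLD_RAWSZ);
}

/* Hex-encode a 32-byte name; "hex" needs room for NEW_HEXSZ + 1 bytes. */
static void name_to_hex(const unsigned char *name, char *hex)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	for (i = 0; i < NEW_RAWSZ; i++) {
		hex[2 * i] = digits[name[i] >> 4];
		hex[2 * i + 1] = digits[name[i] & 0x0f];
	}
	hex[NEW_HEXSZ] = '\0';
}

/* Copy the first "len" hex digits; "out" needs room for len + 1 bytes. */
static void abbrev_hex(const char *hex, size_t len, char *out)
{
	assert(len <= NEW_HEXSZ);
	memcpy(out, hex, len);
	out[len] = '\0';
}
```

Under these assumptions, calling abbrev_hex() with a length of 40 on
a widened old name yields exactly the 40 hex digits of the original
SHA-1, since the 24 trailing digits are all zero.  That is one way
the default 40-character abbreviation would keep old names looking
the same as they do today.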