Jonathan Nieder <jrnieder@xxxxxxxxx> writes:

>>> +6. Skip fetching some submodules of a project into a NewHash
>>> +   repository. (This also depends on NewHash support in Git
>>> +   protocol.)
>>
>> It is unclear what this means.  Around submodule support, one thing
>> I can think of is that a NewHash tree in a superproject would record
>> a gitlink that is a NewHash commit object name in it, therefore it
>> cannot refer to an unconverted SHA-1 submodule repository.  But it
>> is unclear if the above description refers to the same issue, or
>> something else.
>
> It refers to that issue.

We may want to find a way to make that clear, then.

>> It makes me wonder if we want to add the hashname in this object
>> header.  "length" would be different for non-blob objects anyway,
>> and it is not "compat metadata" we want to avoid baking in, yet it
>> would help diagnose a mistake of attempting to use "mixed" objects
>> in a single repository.  Not a big issue, though.
>
> Do you mean that adding the hashname into the computation that
> produces the object name would help in some use case?

What I mean is that for SHA-1 objects we keep the object header as
"<type> <length> NUL".  For objects in the newer world, change the
object header to "<type> <hash> <length> NUL", and include the
hashname in the object name computation.

> For loose objects, it would be nice to name the hash in the file, so
> that "file" can understand what is happening if someone accidentally
> mixes types using "cp".  The only downside is losing the ability to
> copy blobs (which have the same content despite being named using
> different hashes) between repositories after determining their new
> names.  That doesn't seem like a strong downside --- it's pretty
> harmless to include the hash type in loose object files, too.  I think
> I would prefer this to be a "magic number" instead of part of the
> zlib-deflated payload, since this way "file" can discover it more
> easily.
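To be concrete, the two header conventions I have in mind would hash
like this (a rough sketch in Python; "sha256" below is only a
stand-in for NewHash, which this thread has not pinned down):

```python
import hashlib

def object_name_sha1(obj_type: bytes, data: bytes) -> str:
    # Current Git: the header is "<type> <length> NUL", prepended to
    # the content before hashing.
    header = obj_type + b" " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()

def object_name_newhash(obj_type: bytes, data: bytes) -> str:
    # Sketch of the proposal: "<type> <hash> <length> NUL", so that a
    # "mixed" object can be diagnosed from the header itself.  The
    # "sha256" token is a placeholder, not a decision.
    header = obj_type + b" sha256 " + str(len(data)).encode() + b"\0"
    return hashlib.sha256(header + data).hexdigest()
```

The point is only that the hashname participates in the name
computation; the exact spelling of the header is up for grabs.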
Yeah, thanks for doing the pros-and-cons for me ;-)

>> If it is a goal to eventually be able to lose SHA-1 compatibility
>> metadata from the objects, then we might want to remove SHA-1 based
>> signature bits (e.g. PGP trailer in signed tag, gpgsig header in the
>> commit object) from NewHash contents, and instead have them stored
>> in a side "metadata" table, only to be used while converting back.
>> I dunno if that is desirable.
>
> I don't consider that desirable.

Agreed.  Let's not go there.

>> Hmm, as the corresponding packfile stores object data only in
>> NewHash content format, it is somewhat curious that this table that
>> stores CRC32 of the data appears in the "Tables for each object
>> format" section, as they would be identical, no?  Unless I am
>> grossly misreading the spec, the checksum should either go outside
>> the "Tables for each object format" section but still in .idx, or
>> should be eliminated and become part of the packdata stream instead,
>> perhaps?
>
> It's actually only present for the first object format.  Will find a
> better way to describe this.

I see.  One way to do so is to have it upfront, before the "after
this point, these tables repeat for each of the hashes" part of the
file.

>> Oy.  So we can go from a short prefix to the pack location by first
>> finding it via binsearch in the short-name table, realizing that it
>> is the nth object in the object name order, and consulting this
>> table.  When we know the pack-order of an object, there is no direct
>> way to go to its location (short of reversing the
>> name-order-to-pack-order table)?
>
> An earlier version of the design also had a pack-order-to-pack-offset
> table, but we weren't able to think of any cases where that would be
> used without also looking up the object name that can be used to
> verify the integrity of the inflated object.
The primary thing I was interested in knowing was whether you tried
to think of any case where it may be useful and then failed to come
up with one.  I couldn't think of any myself, but I know I am not
imaginative enough, and I wanted to know that you guys didn't find
one, either.
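For the record, here is a toy model of the lookup path we were
talking about (illustrative Python; the names, offsets, and table
layout are all made up, not the actual .idx format):

```python
import bisect

# Hypothetical miniature index: object names sorted lexicographically
# (name order), a table mapping name order -> pack order, and the
# pack offsets stored in pack order.
names = ["1a2b", "3c4d", "9e8f", "beef"]   # sorted: name order
name_to_pack = [2, 0, 3, 1]                # name order -> pack order
offsets_in_pack_order = [12, 345, 678, 901]

def locate(prefix: str) -> int:
    """Abbreviated object name -> pack offset."""
    # Binsearch the sorted name table to learn "this is the nth
    # object in name order"...
    n = bisect.bisect_left(names, prefix)
    assert names[n].startswith(prefix)
    # ...then consult the name-order-to-pack-order table.
    return offsets_in_pack_order[name_to_pack[n]]

# Going the other way (pack order -> name/offset) has no direct
# table; it requires inverting name_to_pack, as the quoted exchange
# notes:
pack_to_name = [0] * len(name_to_pack)
for name_pos, pack_pos in enumerate(name_to_pack):
    pack_to_name[pack_pos] = name_pos
```

The dropped pack-order-to-pack-offset table would have made the
reverse direction a single array lookup; without it, a reader has to
materialize the inversion, which is what the design deemed
acceptable.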