On Mon, Mar 6, 2017 at 4:17 PM, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
> Linus Torvalds wrote:
>> On Fri, Mar 3, 2017 at 5:12 PM, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
>
>>> This document is still in flux but I thought it best to send it out
>>> early to start getting feedback.
>>
>> This actually looks very reasonable if you can implement it cleanly
>> enough.
>
> Thanks for the kind words on what had quite a few flaws still. Here's
> a new draft. I think the next version will be a patch against
> Documentation/technical/.

FWIW, I like this approach.

> Alongside the packfile, a sha3 repository stores a bidirectional
> mapping between sha3 and sha1 object names. The mapping is generated
> locally and can be verified using "git fsck". Object lookups use this
> mapping to allow naming objects using either their sha1 and sha3 names
> interchangeably.

I saw some discussion about using LevelDB for this mapping table. I
think any existing database may be overkill.

For packs, you may be able to simplify by adding only one extra file
(pack-*.msha1) that maps SHA-1 to pack offset, using the idx v2
layout. The CRC32 table from v2 is unnecessary, but you need the
64-bit offset support.

SHA-1 to SHA-3: look up the SHA-1 in .msha1 to get its pack offset,
then reverse the .idx (offset to name) to find the SHA-3 at that
offset.
SHA-3 to SHA-1: look up the SHA-3 in .idx to get its pack offset,
then reverse the .msha1 file to translate that offset to a SHA-1.

For loose objects, the loose object directories should hold only
O(4000) entries in total before auto gc starts strongly encouraging
packing/pruning. With 256 shards, each directory has O(16) loose
objects in it. When writing a SHA-3 loose object, Git could also
append a line "$sha3 $sha1\n" to objects/${first_byte}/sha1 (the
shard named by the first byte of the SHA-3), a file that GC/prune
rewrites to remove entries. With O(16) objects in a directory, these
files should only have O(16) entries in them.

SHA-3 to SHA-1: open objects/${sha3_first_byte}/sha1 and scan until a
match is found.
SHA-1 to SHA-3: brute-force read all 256 files.
Callers performing this mapping may load all 256 files into a table in memory.
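The two pack-side lookups can be modelled in memory to show that each direction costs two binary searches. This is a minimal sketch, not Git's on-disk format: PackNameMap and its methods are hypothetical names, plain sorted lists stand in for the idx v2 tables, and the "reverse" tables are simply each table re-sorted by pack offset.

```python
from bisect import bisect_left


class PackNameMap:
    """Hypothetical in-memory model of a pack's .idx (SHA-3 -> offset)
    and the proposed .msha1 (SHA-1 -> offset). Both are sorted by name,
    as in idx v2; the reverse tables are sorted by offset."""

    def __init__(self, entries):
        # entries: iterable of (sha3, sha1, pack_offset) triples.
        entries = list(entries)
        self.idx = sorted((s3, off) for s3, _, off in entries)
        self.msha1 = sorted((s1, off) for _, s1, off in entries)
        # Reverse indexes: offset-sorted views of each table.
        self.idx_rev = sorted((off, s3) for s3, off in self.idx)
        self.msha1_rev = sorted((off, s1) for s1, off in self.msha1)

    @staticmethod
    def _find(table, key):
        # Binary search a sorted list of (key, value) pairs.
        # (key,) sorts before (key, anything), so bisect_left lands on it.
        i = bisect_left(table, (key,))
        if i < len(table) and table[i][0] == key:
            return table[i][1]
        raise KeyError(key)

    def sha1_to_sha3(self, sha1):
        offset = self._find(self.msha1, sha1)      # .msha1: SHA-1 -> offset
        return self._find(self.idx_rev, offset)    # reverse .idx: offset -> SHA-3

    def sha3_to_sha1(self, sha3):
        offset = self._find(self.idx, sha3)        # .idx: SHA-3 -> offset
        return self._find(self.msha1_rev, offset)  # reverse .msha1: offset -> SHA-1
```

For example, `PackNameMap([("aa33", "bb11", 12)]).sha1_to_sha3("bb11")` returns `"aa33"`. Offsets above 2^32 work here only because Python integers are unbounded; an on-disk .msha1 would need the idx v2 64-bit offset table for them.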
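The loose-object scheme can likewise be sketched with ordinary files. This minimal Python version assumes lowercase hex object names and shards each "$sha3 $sha1" line by the first byte (two hex digits) of the SHA-3 name. The function names (record_loose, sha3_to_sha1, sha1_to_sha3, load_sha1_table, prune_shard) are hypothetical. A real implementation would take a lock and atomically rename the rewritten file during GC instead of overwriting it in place.

```python
from pathlib import Path


def _shard_file(objects_dir, shard):
    # objects/<shard>/sha1, next to the loose objects of that shard.
    return Path(objects_dir) / shard / "sha1"


def record_loose(objects_dir, sha3, sha1):
    """Append "$sha3 $sha1\\n" when a SHA-3 loose object is written."""
    path = _shard_file(objects_dir, sha3[:2])
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "a", encoding="ascii") as f:
        f.write(f"{sha3} {sha1}\n")


def _read_pairs(path):
    # Yield (sha3, sha1) pairs; a missing shard file has no entries.
    try:
        with open(path, encoding="ascii") as f:
            for line in f:
                sha3, sha1 = line.split()
                yield sha3, sha1
    except FileNotFoundError:
        return


def sha3_to_sha1(objects_dir, sha3):
    # Open a single shard and scan its O(16) lines for a match.
    for s3, s1 in _read_pairs(_shard_file(objects_dir, sha3[:2])):
        if s3 == sha3:
            return s1
    return None


def sha1_to_sha3(objects_dir, sha1):
    # The SHA-1 says nothing about the shard, so read all 256 files.
    for byte in range(256):
        for s3, s1 in _read_pairs(_shard_file(objects_dir, f"{byte:02x}")):
            if s1 == sha1:
                return s3
    return None


def load_sha1_table(objects_dir):
    # Callers doing many SHA-1 -> SHA-3 lookups load every shard once.
    table = {}
    for byte in range(256):
        for s3, s1 in _read_pairs(_shard_file(objects_dir, f"{byte:02x}")):
            table[s1] = s3
    return table


def prune_shard(objects_dir, shard, still_loose):
    # GC/prune: rewrite one shard, keeping only SHA-3 names still loose.
    path = _shard_file(objects_dir, shard)
    if not path.exists():
        return
    kept = [f"{s3} {s1}\n" for s3, s1 in _read_pairs(path) if s3 in still_loose]
    path.write_text("".join(kept), encoding="ascii")
```

Because auto gc keeps each shard around O(16) entries, the linear scan in sha3_to_sha1 stays cheap, and the brute-force direction reads at most 256 small files.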