On Sun, Aug 06 2017, Shawn Pearce jotted:

> 5th iteration of the reftable storage format.

I haven't kept up with all of the discussion, sorry if these comments
repeat something that's already mentioned.

> ### Version 1
>
> A repository must set its `$GIT_DIR/config` to configure reftable:
>
>     [core]
>         repositoryformatversion = 1
>     [extensions]
>         reftable = true

David Turner's LMDB proposal specified an extensions.refStorage config
variable instead. I think this is a much better idea, cf. the mistake we
already made with grep.extendedRegexp & grep.patternType.

I.e. we should have 'extensions.refStorage = reftable' instead of
'extensions.reftable = true'.

If we grow another storage backend this'll become messy, and it won't be
obvious to the user that the configuration is mutually exclusive (which
it surely will be), so we'll end up having to special-case it similar to
grep.[extendedRegexp,patternType] (i.e. either make one override the
other, or make specifying >1 an error, which is a hassle with the config
API).

> Performance testing indicates reftable is faster for lookups (51%
> faster, 11.2 usec vs. 5.4 usec), although reftable produces a
> slightly larger file (+ ~3.2%, 28.3M vs 29.2M):
>
> format    |  size  | seek cold | seek hot  |
> ---------:|-------:|----------:|----------:|
> mh-alt    | 28.3 M | 23.4 usec | 11.2 usec |
> reftable  | 29.2 M | 19.9 usec |  5.4 usec |
>
> [mh-alt]: https://public-inbox.org/git/CAMy9T_HCnyc1g8XWOOWhe7nN0aEFyyBskV2aOMb_fe+wGvEJ7A@xxxxxxxxxxxxxx/

Might be worth noting "based on WIP Java implementation". I started
searching for patches for this new format & found via
<CAJo=hJtrdCOF-RxzXfyLx7R-1f2-7pZVO_UOg28J=wUDNdf3yw@xxxxxxxxxxxxxx>
that it's JGit only.

Also, if one wanted to run these tests via JGit using your WIP code,
where does that code live and how would one test it?

> ### LMDB
>
> David Turner proposed [using LMDB][dt-lmdb], as LMDB is lightweight
> (64k of runtime code) and GPL-compatible license.
>
> A downside of LMDB is its reliance on a single C implementation. This
> makes embedding inside JGit (a popular reimplementation of Git)
> difficult, and hoisting onto virtual storage (for JGit DFS) virtually
> impossible.

This rationale as stated reads a bit too much like
https://xkcd.com/927/

I.e. surely the actual problem isn't that there's a single C
implementation of LMDB, since that's one more C implementation than
exists of this new format so far.

Also, isn't this info out of date now that this exists:
https://github.com/lmdbjava/lmdbjava ? That project was implemented
after David's initial LMDB patches on-list, but I don't know if it
implements the subset of the LMDB format needed for his proposed ref
storage.

Rather, the rationale could be something like:

    A downside of LMDB is that it would be too complex to implement the
    subset of its database format needed for this reference storage in
    Java in the nascent lmdbjava project and to keep the two compatible
    going forward while juggling support for two upstream projects whose
    aims may conflict with ours.

Or:

    A downside of LMDB is <above rationale> + even if we did that
    benchmarks <do we have those?> show that it wouldn't be worth it to
    use the LMDB format since it's slower/bigger/whatever.

> A common format that can be supported by all major Git implementations
> (git-core, JGit, libgit2) is strongly preferred.
>
> [dt-lmdb]: https://public-inbox.org/git/1455772670-21142-26-git-send-email-dturner@xxxxxxxxxxxxxxxx/
>
> ## Future
>
> ### Longer hashes
>
> Version will bump (e.g. 2) to indicate `value` uses a different
> object id length other than 20. The length could be stored in an
> expanded file header, or hardcoded as part of the version.
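
To make the extensions.refStorage point above a bit more concrete, here
is a rough sketch of the two config shapes. It is Python rather than
git's C, purely for brevity. The function names, the dict standing in
for the parsed [extensions] section, and the set of backends are all
made up for illustration; none of this is git's actual config API.

```python
# Hypothetical sketch only: these helpers are NOT part of git's config API.
# `extensions` stands in for the parsed [extensions] section, with
# lowercased variable names as keys and raw string values.
KNOWN_BACKENDS = {"files", "reftable", "lmdb"}  # illustrative set only


def backend_from_booleans(extensions):
    """One boolean key per backend, e.g. extensions.reftable = true."""
    enabled = sorted(
        name
        for name in KNOWN_BACKENDS - {"files"}
        if extensions.get(name, "false").lower() == "true"
    )
    # Mutual exclusion is not expressed by the config shape itself,
    # so it has to be special-cased here.
    if len(enabled) > 1:
        raise ValueError("conflicting ref backends: " + ", ".join(enabled))
    return enabled[0] if enabled else "files"


def backend_from_single_key(extensions):
    """One key naming the backend, e.g. extensions.refStorage = reftable."""
    name = extensions.get("refstorage", "files")
    if name not in KNOWN_BACKENDS:
        raise ValueError("unknown ref backend: " + name)
    return name
```

With the single key there is simply no way to spell "two backends at
once", so the conflict check in the first function has no counterpart
in the second.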