Hi! Han-Wen Nienhuys wrote: > This adds the reftable library, and hooks it up as a ref backend. > > Feedback wanted: > > * spots marked with XXX in the Git source code. > > * what is a good test strategy? Could we have a CI flavor where we flip the > default to reftable, and see how it fares? [...] > Han-Wen Nienhuys (5): > refs.h: clarify reflog iteration order > create .git/refs in files-backend.c > refs: document how ref_iterator_advance_fn should handle symrefs > Add reftable library > Reftable support for git-core > > Jonathan Nieder (1): > reftable: file format documentation [...] > 51 files changed, 8547 insertions(+), 32 deletions(-) This series was discussed at the Git Contributor Summit, and I promised to provide a summary here. Sorry for the delay. Contribution format ~~~~~~~~~~~~~~~~~~~ This is a bit of an unusual contribution to Git: - the heart of the series is in a single commit - it is under a different license than most of Git - the code doesn't use the same dialect as and "look like" the rest of Git All three of these are because the code is meant to be a standalone library that can be used by Git, libgit2, and any other project that wants to understand reftables. But they create obstacles to reviewing (one reviewer said he had spent a two-hour block looking at the patch and not made much headway). On the other hand, people mostly agreed that having the ability to share this code with libgit2 is beneficial. So how can we get a substantial review without losing that benefit? 1. Most importantly, people would like a series of multiple patches that tell a story. If review of an early patch in the series reveals significant changes that should happen in reftable upstream, that's fine --- upstream, that can happen in a patch on top, whereas in the series being contributed, the patch would be amended. In the end, there would be two different series of commits: the contributed series, and the upstream history. The upstream history might be messier, and that's fine. The contributed series would be applied to git.git. If I understood correctly, this is more important for the initial contribution than for later changes. Git includes some code (e.g. in compat/) with outside upstreams and gets all-in-one-commit updates. As long as upstream has a functional review process, this is fine --- here, reftable upstream (you :)) wants review on the initial contribution and splitting into a series that tells a story is the cost of making that review feasible to provide. 2. Both Git and libgit2 have abstractions like strbuf. We'd like reftable to make use of similar abstractions where they make the code cleaner. 3. libgit2 has a git_malloc allocator. reftable doesn't necessarily have to use it or make it pluggable --- at worst, we can #define malloc to use it. But it's something to be aware of. Maintenance ~~~~~~~~~~~ There's some interest in the maintenance story: if we run into an issue with the reftable library, do we report and fix it on the git mailing list or does the reftable library have its own upstream forums for that? Portability ~~~~~~~~~~~ There was some confusion about the scope of what the reftable library provides. It provides a reader and a writer and the caller is responsible for connecting those to files. There were some questions about mmap usage here (there is none) and whether the library needs some kind of seeking reader interface for abstracting the OS interface to files. I think the upshot is 4. Some summary documentation would be helpful e.g. in the README.md, to point people to what header file a person should start with to understand the library 5. People are also interested in the file-oriented part of reftable, even though this library doesn't implement it (that's patch 6, which is Git-specific). Testing ~~~~~~~ Git's testsuite has various GIT_TEST_* knobs. A GIT_TEST_REF_BACKEND knob would be helpful, to allow running the full testsuite with reftable. That's a good way to suss out edge cases. The testsuite should *also* include some specific tests designed for reftable. They can demonstrate edge cases and demonstrate some of the benefits - case sensitivity - handling of directory/file conflicts That provides another entry point for people to understand reftable. We also discussed table-driven tests that can be shared between implementations but didn't end up saying anything too concrete about that. Summary ~~~~~~~ Breaking down the library into multiple patches is likely to be a signficant amount of work, but it's a good way to break the review into manageable pieces that get the project engaged. Integration testing is also likely to be helpful. People were very excited about the benefits reftable brings and it feels a little strange to ask so much when you've already written this code, but this seemed like the best way to make it into something the project understands well and to get any kinks ironed out. I'm happy to help in any way I can. Thoughts? Thanks, Jonathan