This patch series introduces soft references (softrefs), a mechanism for
declaring reachability between arbitrary (but existing) git objects.
Softrefs are meant to provide the mechanism for "reverse mapping" that we
determined was needed for tag objects (especially 'notes').

The patch series also teaches git-mktag to create softrefs for all tag
objects.

See the Discussion section in the git-softref manual page (patch #4/7) or
the comments in the header file (patch #1/7) for more details on the design
of softrefs.

I've added some informal performance data at the bottom of this mail [1].

Note that this patch series is incomplete in that the following things have
yet to be implemented:

1. Clone/fetch/push of softrefs

2. Packing of softrefs

3. General integration of softrefs into parts of git where they might be
   useful

4. Finding an appropriate value for MAX_UNSORTED_ENTRIES

There are also some questions connected to the above list of todos:

1. Just how should softrefs affect reachability? Should softrefs be
   used/followed in _all_ reachability computations? If not, which?

2. How should softrefs propagate? I suggest they are pretty much always
   propagated under clone/fetch/push. (Note that the softrefs merge
   algorithm in softrefs.c removes duplicates and softrefs between
   non-existing objects, so pre-filtering of the softrefs to be
   cloned/fetched/pushed may not be necessary.)

3. Where can softrefs be used to improve performance by replacing existing
   techniques?

4. How to best pack softrefs? Keeping them in the same pack as the objects
   they refer to seems to be a good idea, but more thought needs to be put
   into this before we can make an implementation.

5. How to find _all_ (even unreachable) tag objects in the repo for
   'git-softref --rebuild-tags'?

6. Optimization. Pretty much nothing has been done so far. Performance
   seems to be acceptable for now.
   Probably needs more testing to determine bottlenecks.

NOTE: After the 7 patches, I will send an _optional_ patch that changes the
softrefs entries from text format (82 bytes per entry) to binary format
(40 bytes per entry). The patch is optional, because I want the list to
decide if we want the (marginal) speedup and simplified code provided by
the patch, or if we want to keep the read-/maintainability of the text
format. Currently I'm in favour of keeping the text format, but I'm far
from sure.

Finally, here's the shortlog:

(This patch series of course goes on top of the previous "Refactor the tag
object" patch series, although there aren't really that many dependencies
between them):

Johan Herland (7):
      Softrefs: Add softrefs header file with API documentation
      Softrefs: Add implementation of softrefs API
      Softrefs: Add git-softref, a builtin command for adding, listing and administering softrefs
      Softrefs: Add manual page documenting git-softref and softrefs subsystem in general
      Softrefs: Add testcases for basic softrefs behaviour
      Softrefs: Administrivia associated with softrefs subsystem and git-softref builtin
      Teach git-mktag to register softrefs for all tag objects

 .gitignore                    |    1 +
 Documentation/cmd-list.perl   |    7 +-
 Documentation/git-softref.txt |  119 +++++++
 Makefile                      |    6 +-
 builtin-softref.c             |  167 ++++++++++
 builtin.h                     |    1 +
 git.c                         |    1 +
 mktag.c                       |   11 +-
 softrefs.c                    |  712 +++++++++++++++++++++++++++++++++++++++++
 softrefs.h                    |  188 +++++++++++
 t/t3050-softrefs.sh           |  314 ++++++++++++++++++
 11 files changed, 1521 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/git-softref.txt
 create mode 100644 builtin-softref.c
 create mode 100644 softrefs.c
 create mode 100644 softrefs.h
 create mode 100755 t/t3050-softrefs.sh


Have fun!

...Johan


[1] Informal performance measurements

I prepared a Linux kernel repo (holding 57274 commits) with 10 tag objects,
and created softrefs from every commit to every tag object (572740 softrefs
in total).
The resulting softrefs db was 46964680 bytes. The experiment was done on a
32-bit Intel Pentium 4 (3 GHz w/HyperThreading) with 1 GB RAM:

========
Operations on unsorted softrefs:
(572740 (10 per commit) entries in random/unsorted order)
========

Listing all softrefs (sequential reading of unsorted softrefs file)
--------
$ /usr/bin/time git softref --list > /dev/null
0.44user 0.02system 0:00.47elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11786minor)pagefaults 0swaps

Listing HEAD's softrefs (sequential reading of unsorted softrefs file)
--------
$ /usr/bin/time git softref --list HEAD > /dev/null
0.11user 0.01system 0:00.14elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11790minor)pagefaults 0swaps

Sorting softrefs
--------
$ /usr/bin/time git softref --merge-unsorted
2.73user 4.97system 0:07.77elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+15833minor)pagefaults 0swaps

Sorting softrefs into existing sorted file (throwing away duplicates)
--------
$ /usr/bin/time git softref --merge-unsorted
3.49user 5.12system 0:08.64elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+27300minor)pagefaults 0swaps


========
Operations on sorted softrefs:
(572740 (10 per commit) entries in sorted order)
========

Listing all softrefs (sequential reading of sorted softrefs file)
--------
$ /usr/bin/time git softref --list > /dev/null
0.43user 0.02system 0:00.48elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11786minor)pagefaults 0swaps

Listing HEAD's softrefs (256-fanout followed by binary search in sorted
softrefs file)
--------
$ /usr/bin/time git softref --list HEAD > /dev/null
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+334minor)pagefaults 0swaps

Sorting softrefs (no-op)
--------
$ /usr/bin/time git softref --merge-unsorted
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+312minor)pagefaults 0swaps

--
Johan Herland, <johan@xxxxxxxxxxx>
www.herland.net
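
As a footnote to the numbers in [1]: below is a rough Python sketch (not
code from the patches) of how a "256-fanout followed by binary search"
lookup over fixed-size text entries might work, together with the
arithmetic behind the db size. The entry layout (40 hex chars, a space,
40 hex chars, a newline) is my assumption about how the 82 bytes are used,
and the names make_db, build_fanout and lookup are hypothetical; softrefs.c
may well lay out the entries and compute the fanout differently.

```python
import hashlib

# Assumed layout of one text entry: <from-sha1 hex> SP <to-sha1 hex> LF
ENTRY_LEN = 40 + 1 + 40 + 1  # 82 bytes, matching the size quoted in the mail


def make_db(pairs):
    """Build a sorted, de-duplicated blob of entries from (from_hex, to_hex) pairs."""
    entries = sorted({f"{a} {b}\n".encode("ascii") for a, b in pairs})
    return b"".join(entries)


def build_fanout(db):
    """fanout[b] = number of entries whose from-id starts with a byte <= b."""
    counts = [0] * 256
    for off in range(0, len(db), ENTRY_LEN):
        counts[int(db[off:off + 2], 16)] += 1
    fanout, total = [], 0
    for c in counts:
        total += c
        fanout.append(total)
    return fanout


def lookup(db, fanout, from_hex):
    """Return every to-id recorded for from_hex, using fanout + binary search."""
    key = from_hex.encode("ascii")
    first = int(from_hex[:2], 16)
    # The fanout table narrows the search to entries sharing the first byte.
    lo = fanout[first - 1] if first else 0
    hi = fanout[first]
    # Binary search for the leftmost entry whose from-id is >= key.
    while lo < hi:
        mid = (lo + hi) // 2
        if db[mid * ENTRY_LEN:mid * ENTRY_LEN + 40] < key:
            lo = mid + 1
        else:
            hi = mid
    # Collect the run of entries that share the from-id.
    found, n = [], len(db) // ENTRY_LEN
    while lo < n and db[lo * ENTRY_LEN:lo * ENTRY_LEN + 40] == key:
        found.append(db[lo * ENTRY_LEN + 41:lo * ENTRY_LEN + 81].decode("ascii"))
        lo += 1
    return found


def fake_id(name):
    """Stand-in for an object id; just the SHA-1 of an arbitrary string."""
    return hashlib.sha1(name.encode()).hexdigest()


# Small-scale version of the experiment: every "commit" points at every "tag".
commits = [fake_id(f"commit{i}") for i in range(100)]
tags = [fake_id(f"tag{i}") for i in range(10)]
db = make_db((c, t) for c in commits for t in tags)
fanout = build_fanout(db)

print(len(lookup(db, fanout, commits[0])))  # 10 softrefs for one commit
print(572740 * ENTRY_LEN)                   # 46964680, the db size in [1]
```

The size check is the only part that relies purely on figures from this
mail: 572740 entries at 82 bytes each gives exactly 46964680 bytes. With
the 40 bytes per entry of the optional binary patch, the same entries
would take 22909600 bytes.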