[PATCH 0/7] Introduce soft references (softrefs)

This patch series introduces soft references (softrefs), a mechanism for
declaring reachability between arbitrary (but existing) git objects.
Softrefs are meant to provide the mechanism for "reverse mapping" that
we determined was needed for tag objects (especially 'notes'). The patch
series also teaches git-mktag to create softrefs for all tag objects.

See the Discussion section in the git-softref manual page (patch #4/7) or
the comments in the header file (patch #1/7) for more details on the
design of softrefs.

I've added some informal performance data at the bottom of this mail [1].

Note that this patch series is incomplete in that the following things
have yet to be implemented:

1. Clone/fetch/push of softrefs

2. Packing of softrefs

3. General integration of softrefs into parts of git where they might be
   useful

4. Finding an appropriate value for MAX_UNSORTED_ENTRIES


There are also some questions connected to the above list of todos:

1. Just how should softrefs affect reachability? Should softrefs be
   used/followed in _all_ reachability computations? If not, which?

2. How should softrefs propagate? I suggest they are pretty much always
   propagated under clone/fetch/push. (Note that the softrefs merge
   algorithm in softrefs.c removes duplicates and softrefs between
   non-existing objects, so pre-filtering of the softrefs to be
   cloned/fetched/pushed may not be necessary.)

3. Where can softrefs be used to improve performance by replacing existing
   techniques?

4. How best to pack softrefs? Keeping them in the same pack as the objects
   they refer to seems to be a good idea, but more thought needs to be put
   into this before we can make an implementation.

5. How to find _all_ (even unreachable) tag objects in a repo for
   'git-softref --rebuild-tags'?

6. Optimization. Pretty much nothing has been done so far. Performance
   seems to be acceptable for now. More testing is probably needed to
   determine bottlenecks.
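
To illustrate the remark in question 2 about the merge step, here is a
minimal Python sketch of a merge that drops duplicates and softrefs between
non-existing objects. It is not the code in softrefs.c: the function name
merge_softrefs, the object_exists callback and the in-memory list
representation are all assumptions made for illustration only.

```python
# Hypothetical sketch, not the implementation in softrefs.c.
from typing import Callable, Iterable, List, Tuple

Softref = Tuple[str, str]  # (from_sha1, to_sha1), both 40-char hex strings


def merge_softrefs(
    sorted_entries: List[Softref],
    incoming: Iterable[Softref],
    object_exists: Callable[[str], bool],
) -> List[Softref]:
    """Merge incoming softrefs into an already sorted list.

    Duplicates are dropped, and so is any incoming softref whose two
    endpoints are not both known to object_exists (an assumed helper
    standing in for a lookup in the local object database).
    """
    valid = {
        (src, dst)
        for src, dst in incoming
        if object_exists(src) and object_exists(dst)
    }
    # The set union removes duplicates; sorting restores lookup order.
    return sorted(set(sorted_entries) | valid)


# Example: object "d..." is unknown locally, so its softref is filtered out.
known = {"a" * 40, "b" * 40, "c" * 40}
existing = [("a" * 40, "b" * 40)]
incoming = [
    ("a" * 40, "b" * 40),  # duplicate of an existing entry
    ("a" * 40, "c" * 40),  # new and valid
    ("d" * 40, "a" * 40),  # refers to a non-existing object
]
merged = merge_softrefs(existing, incoming, known.__contains__)
```

If the receiving side filters like this, the sender could transfer its
softrefs without pre-filtering, which is the point made in question 2.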


NOTE: After the 7 patches, I will send an _optional_ patch
that changes the softrefs entries from text format (82 bytes per entry)
to binary format (40 bytes per entry). The patch is optional, because
I want the list to decide if we want the (marginal) speedup and
simplified code provided by the patch, or if we want to keep the
read-/maintainability of the text format. Currently I'm in favour of
keeping the text format, but I'm far from sure.
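
To make the two entry sizes concrete, here is a small Python sketch of
both encodings. The exact layouts are assumptions, not taken from the
patches: I assume a text entry is two 40-character hex SHA-1s separated by
a space and terminated by a newline, and a binary entry is the two raw
20-byte SHA-1s back to back. The function names are hypothetical.

```python
# Hypothetical sketch of the two entry encodings; layouts are assumed.


def encode_text(from_sha1: str, to_sha1: str) -> bytes:
    # Assumed layout: 40 hex chars + space + 40 hex chars + newline.
    return f"{from_sha1} {to_sha1}\n".encode("ascii")


def encode_binary(from_sha1: str, to_sha1: str) -> bytes:
    # Assumed layout: two raw 20-byte SHA-1 values, no separators.
    return bytes.fromhex(from_sha1) + bytes.fromhex(to_sha1)


src = "12" * 20  # placeholder 40-char hex SHA-1
dst = "ab" * 20
text_entry = encode_text(src, dst)
binary_entry = encode_binary(src, dst)
text_size = len(text_entry)
binary_size = len(binary_entry)
```

Under these assumptions the sizes come out at 82 and 40 bytes, matching
the figures above. The 46964680-byte database reported in [1] is also
consistent with 572740 entries of 82 bytes each.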


Finally, here's the shortlog (this patch series of course goes on top of
the previous "Refactor the tag object" patch series, although there aren't
really that many dependencies between them):

Johan Herland (7):
      Softrefs: Add softrefs header file with API documentation
      Softrefs: Add implementation of softrefs API
      Softrefs: Add git-softref, a builtin command for adding, listing and administering softrefs
      Softrefs: Add manual page documenting git-softref and softrefs subsystem in general
      Softrefs: Add testcases for basic softrefs behaviour
      Softrefs: Administrivia associated with softrefs subsystem and git-softref builtin
      Teach git-mktag to register softrefs for all tag objects

 .gitignore                    |    1 +
 Documentation/cmd-list.perl   |    7 +-
 Documentation/git-softref.txt |  119 +++++++
 Makefile                      |    6 +-
 builtin-softref.c             |  167 ++++++++++
 builtin.h                     |    1 +
 git.c                         |    1 +
 mktag.c                       |   11 +-
 softrefs.c                    |  712 +++++++++++++++++++++++++++++++++++++++++
 softrefs.h                    |  188 +++++++++++
 t/t3050-softrefs.sh           |  314 ++++++++++++++++++
 11 files changed, 1521 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/git-softref.txt
 create mode 100644 builtin-softref.c
 create mode 100644 softrefs.c
 create mode 100644 softrefs.h
 create mode 100755 t/t3050-softrefs.sh


Have fun!

...Johan


[1] Informal performance measurements

I prepared a linux kernel repo (holding 57274 commits) with 10 tag objects,
and created softrefs from every commit to every tag object (572740 softrefs
in total). The resulting softrefs db was 46964680 bytes. The experiment was
done on a 32-bit Intel Pentium 4 (3 GHz w/HyperThreading) with 1 GB RAM:


========
Operations on unsorted softrefs:
(572740 (10 per commit) entries in random/unsorted order)
========

Listing all softrefs
(sequential reading of unsorted softrefs file)
--------
$ /usr/bin/time git softref --list > /dev/null
0.44user 0.02system 0:00.47elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11786minor)pagefaults 0swaps

Listing HEAD's softrefs
(sequential reading of unsorted softrefs file)
--------
$ /usr/bin/time git softref --list HEAD > /dev/null
0.11user 0.01system 0:00.14elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11790minor)pagefaults 0swaps

Sorting softrefs
--------
$ /usr/bin/time git softref --merge-unsorted
2.73user 4.97system 0:07.77elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+15833minor)pagefaults 0swaps

Sorting softrefs into existing sorted file
(throwing away duplicates)
--------
$ /usr/bin/time git softref --merge-unsorted
3.49user 5.12system 0:08.64elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+27300minor)pagefaults 0swaps


========
Operations on sorted softrefs:
(572740 (10 per commit) entries in sorted order)
========

Listing all softrefs
(sequential reading of sorted softrefs file)
--------
$ /usr/bin/time git softref --list > /dev/null
0.43user 0.02system 0:00.48elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+11786minor)pagefaults 0swaps

Listing HEAD's softrefs
(256-fanout followed by binary search in sorted softrefs file)
--------
$ /usr/bin/time git softref --list HEAD > /dev/null
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+334minor)pagefaults 0swaps

Sorting softrefs
(no-op)
--------
$ /usr/bin/time git softref --merge-unsorted
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+312minor)pagefaults 0swaps
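
For readers unfamiliar with the "256-fanout followed by binary search"
lookup mentioned above, here is a minimal Python sketch of the general
technique. It works on an in-memory list rather than on the softrefs file,
and the cumulative fanout layout (entry b holds the number of entries whose
first byte is <= b) is an assumption modelled on common practice, not
necessarily what softrefs.c does. All names are hypothetical.

```python
# Hypothetical sketch of a 256-entry fanout table plus binary search.
from typing import List, Tuple

Entry = Tuple[bytes, bytes]  # (from_sha1, to_sha1) as raw 20-byte values


def build_fanout(entries: List[Entry]) -> List[int]:
    """fanout[b] = number of entries whose from_sha1 starts with a byte <= b."""
    counts = [0] * 256
    for src, _ in entries:
        counts[src[0]] += 1
    fanout, total = [], 0
    for count in counts:
        total += count
        fanout.append(total)
    return fanout


def lookup(entries: List[Entry], fanout: List[int], src: bytes) -> List[bytes]:
    """Return all to_sha1 values for src; entries must be sorted."""
    first = src[0]
    end = fanout[first]
    lo = fanout[first - 1] if first > 0 else 0
    hi = end
    # Lower-bound binary search, restricted to the bucket for the first byte.
    while lo < hi:
        mid = (lo + hi) // 2
        if entries[mid][0] < src:
            lo = mid + 1
        else:
            hi = mid
    # Collect the run of consecutive entries with a matching from_sha1.
    targets = []
    while lo < end and entries[lo][0] == src:
        targets.append(entries[lo][1])
        lo += 1
    return targets


def sha(byte: int) -> bytes:
    return bytes([byte]) * 20  # placeholder 20-byte SHA-1


entries = sorted([
    (sha(0x10), sha(0xAA)),
    (sha(0x10), sha(0xBB)),
    (sha(0x20), sha(0xCC)),
    (sha(0x00), sha(0xDD)),
])
fanout = build_fanout(entries)
targets = lookup(entries, fanout, sha(0x10))
```

The fanout table narrows the search to the entries sharing the first byte
of the key, so the binary search only has to touch a small part of the
data. That would be consistent with the much lower page fault count for
the sorted 'git softref --list HEAD' run compared to the unsorted one.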


-- 
Johan Herland, <johan@xxxxxxxxxxx>
www.herland.net