[PATCH v7 00/14] Serialized Git Commit Graph

This patch series has only a few changes since v6:

* Fixed whitespace issues using 'git rebase --whitespace=fix'

* The --stdin-packs docs now refer to "pack-indexes" instead of "packs"

* Modified description of --object-dir option to warn that its use is rare

* Replaced '--additive' with '--append'

* In "commit-graph: close under reachability" I greatly simplified
  the check that every reachable commit is included. While running
  tests I noticed that the revision walk machinery could not keep up
  with a very large queue created when combined with the '--append'
  option that added all commits from the existing file as starting
  points for the walk. The new algorithm simply appends missing commits
  to the end of the list; the appended commits are then visited in turn
  to ensure their parents are in the list as well.
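
As an illustration only, the append-and-scan idea can be sketched in a few lines of Python. This is not the C code from the patch: the function name `close_under_reachability` and the `parents` mapping (commit id to list of parent ids) are hypothetical stand-ins for the real object lookups.

```python
def close_under_reachability(commits, parents):
    """Extend `commits` so that every parent of a listed commit is listed.

    `parents` is a hypothetical mapping from commit id to parent ids,
    standing in for reading parent links from the object database.
    """
    result = list(dict.fromkeys(commits))  # drop duplicates, keep order
    seen = set(result)
    i = 0
    # The list grows while it is scanned, so appended commits are visited
    # later in the same loop and their parents are checked in turn.
    while i < len(result):
        for parent in parents.get(result[i], ()):
            if parent not in seen:
                seen.add(parent)
                result.append(parent)
        i += 1
    return result


history = {"D": ["C", "B"], "C": ["A"], "B": ["A"], "A": []}
closed = close_under_reachability(["D"], history)
print(closed)  # ['D', 'C', 'B', 'A']
```

Because each commit is appended at most once, the work is proportional to the number of commits and parent links, and no priority queue is involved.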

I have a few patch series prepared that provide further performance
improvements following this patch.

-- >8 --

This patch contains a way to serialize the commit graph.

The current implementation defines a new file format to store the graph
structure (parent relationships) and basic commit metadata (commit date,
root tree OID) in order to avoid parsing raw commits while performing
basic graph walks. For example, we do not need to parse the full commit
when performing these walks:

* 'git log --topo-order -1000' walks all reachable commits to avoid
  incorrect topological orders, but only needs the commit message for
  the top 1000 commits.

* 'git merge-base <A> <B>' may walk many commits to find the correct
  boundary between the commits reachable from A and those reachable
  from B. No commit messages are needed.

* 'git branch -vv' checks ahead/behind status for all local branches
  compared to their upstream remote branches. This is essentially as
  hard as computing merge bases for each.
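
To make the point about metadata concrete, here is a deliberately naive Python sketch of an ahead/behind count that touches only parent links. The names `reachable` and `ahead_behind` and the dict-based graph are assumptions for illustration; git's real walk is smarter and stops at the boundary instead of visiting all of history.

```python
def reachable(start, parents):
    """Collect every commit reachable from `start` using parent links only."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


def ahead_behind(local, upstream, parents):
    """Return (ahead, behind) counts of `local` relative to `upstream`."""
    from_local = reachable(local, parents)
    from_upstream = reachable(upstream, parents)
    return len(from_local - from_upstream), len(from_upstream - from_local)


# A <- B <- C (local), and B <- D <- E (upstream)
graph = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}
print(ahead_behind("C", "E", graph))  # (1, 2)
```

Nothing in this walk needs a commit message, author, or any other part of the raw commit buffer.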

The current patch speeds up these calculations by adding a check to
parse_commit_gently(): if a graph file exists, it is used to provide
the required metadata to the struct commit.
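
The control flow of that check can be sketched as follows. This is a Python model under stated assumptions, not the patch's C code: the `Commit` class, the `graph` dictionary (object id to a `(parents, date, tree)` tuple), and the `parse_raw` callback are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    oid: str
    parents: list = field(default_factory=list)
    date: int = 0
    tree: str = ""
    parsed: bool = False


def parse_commit(commit, graph, parse_raw):
    """Fill `commit` from the graph when possible, else parse the raw object.

    Returns a label naming the source that was used, for demonstration.
    """
    if commit.parsed:
        return "cached"
    entry = graph.get(commit.oid) if graph is not None else None
    if entry is not None:
        parents, date, tree = entry
        commit.parents, commit.date, commit.tree = list(parents), date, tree
        commit.parsed = True
        return "graph"
    parse_raw(commit)  # fallback: the slow path through the object database
    commit.parsed = True
    return "odb"
```

Commits missing from the graph file (for example, commits created after it was written) simply take the fallback path, so callers do not need to know which source was used.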

The file format has room to store generation numbers, which will be
provided as a patch after this framework is merged. Generation numbers
are referenced by the design document but not implemented in order to
make the current patch focus on the graph construction process. Once
that is stable, it will be easier to add generation numbers and make
graph walks aware of generation numbers one-by-one.
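
For readers unfamiliar with the term, a generation number is commonly defined as one more than the largest generation number among a commit's parents, with parentless commits at generation one. The Python sketch below computes that definition on a small in-memory graph; it is an illustration of the concept, not code from this series, and the name `generation_numbers` and the dict-based graph are assumptions.

```python
def generation_numbers(parents):
    """Map each commit to 1 + max(generation of its parents); roots get 1."""
    gen = {}
    for start in parents:
        stack = [start]
        # Explicit stack instead of recursion, so long histories do not
        # hit Python's recursion limit.
        while stack:
            commit = stack[-1]
            if commit in gen:
                stack.pop()
                continue
            missing = [p for p in parents[commit] if p not in gen]
            if missing:
                stack.extend(missing)
            else:
                gen[commit] = 1 + max((gen[p] for p in parents[commit]), default=0)
                stack.pop()
    return gen


history = {"D": ["C", "B"], "C": ["A"], "B": ["A"], "A": []}
print(generation_numbers(history))
```

The useful property is that a commit can only reach commits with a strictly smaller generation number, which lets a walk stop early.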

By loading commits from the graph instead of parsing commit buffers, we
save a lot of time on long commit walks. Here are some performance
results for a copy of the Linux repository where 'master' has 678,653
reachable commits and is behind 'origin/master' by 59,929 commits.

| Command                          | Before | After  | Rel % |
|----------------------------------|--------|--------|-------|
| log --oneline --topo-order -1000 |  8.31s |  0.94s | -88%  |
| branch -vv                       |  1.02s |  0.14s | -86%  |
| rev-list --all                   |  5.89s |  1.07s | -81%  |
| rev-list --all --objects         | 66.15s | 58.45s | -11%  |
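
The "Rel %" column is the change relative to the "Before" time. A short Python check of that arithmetic is sketched below; the function name `relative_change_percent` is an assumption. The percentages in the table appear to have the fractional part dropped rather than rounded.

```python
def relative_change_percent(before, after):
    """Signed change of `after` relative to `before`, in percent."""
    return (after - before) / before * 100.0


# (before, after) wall-clock seconds copied from the table above
timings = {
    "log --oneline --topo-order -1000": (8.31, 0.94),
    "branch -vv": (1.02, 0.14),
    "rev-list --all": (5.89, 1.07),
    "rev-list --all --objects": (66.15, 58.45),
}

for command, (before, after) in timings.items():
    change = relative_change_percent(before, after)
    # int() truncates toward zero, matching the table's whole-number values
    print(f"{command}: {int(change)}%")
```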

To test this yourself, run the following on your repo:

  git config core.commitGraph true
  git show-ref -s | git commit-graph write --stdin-commits

The second command writes a commit graph file containing every commit
reachable from your refs. Now, all git commands that walk commits will
check your graph before consulting the ODB. You can run your own
performance comparisons by toggling the 'core.commitGraph' setting.
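
One way to automate such a comparison is sketched below in Python. It overrides the setting per invocation with git's `-c <name>=<value>` option instead of editing the repository config. The helper names `git_command`, `time_command`, and `compare` are hypothetical, and a single timing per setting is only a rough measurement (disk caches and system load will affect it).

```python
import subprocess
import time


def git_command(args, commit_graph_enabled):
    """Build a git command line with core.commitGraph forced on or off."""
    value = "true" if commit_graph_enabled else "false"
    return ["git", "-c", f"core.commitGraph={value}", *args]


def time_command(cmd, cwd=None):
    """Run `cmd`, discard its output, and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def compare(args, cwd=None):
    """Time one git subcommand with the commit graph disabled, then enabled."""
    return {
        enabled: time_command(git_command(args, enabled), cwd)
        for enabled in (False, True)
    }


# Example (the repository path is a placeholder):
# compare(["rev-list", "--all"], cwd="/path/to/repo")
```

Running each variant several times and taking the best or median time gives a fairer comparison than a single run.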

[1] https://github.com/derrickstolee/git/pull/2
    A GitHub pull request containing the latest version of this patch.

Derrick Stolee (14):
  csum-file: rename hashclose() to finalize_hashfile()
  csum-file: refactor finalize_hashfile() method
  commit-graph: add format document
  graph: add commit graph design document
  commit-graph: create git-commit-graph builtin
  commit-graph: implement write_commit_graph()
  commit-graph: implement git-commit-graph write
  commit-graph: implement git commit-graph read
  commit-graph: add core.commitGraph setting
  commit-graph: close under reachability
  commit: integrate commit graph with commit parsing
  commit-graph: read only from specific pack-indexes
  commit-graph: build graph from starting commits
  commit-graph: implement "--additive" option

 .gitignore                                    |   1 +
 Documentation/config.txt                      |   4 +
 Documentation/git-commit-graph.txt            |  94 +++
 .../technical/commit-graph-format.txt         |  97 +++
 Documentation/technical/commit-graph.txt      | 163 ++++
 Makefile                                      |   2 +
 alloc.c                                       |   1 +
 builtin.h                                     |   1 +
 builtin/commit-graph.c                        | 171 ++++
 builtin/index-pack.c                          |   2 +-
 builtin/pack-objects.c                        |   6 +-
 bulk-checkin.c                                |   4 +-
 cache.h                                       |   1 +
 command-list.txt                              |   1 +
 commit-graph.c                                | 738 ++++++++++++++++++
 commit-graph.h                                |  46 ++
 commit.c                                      |   3 +
 commit.h                                      |   3 +
 config.c                                      |   5 +
 contrib/completion/git-completion.bash        |   2 +
 csum-file.c                                   |  10 +-
 csum-file.h                                   |   9 +-
 environment.c                                 |   1 +
 fast-import.c                                 |   2 +-
 git.c                                         |   1 +
 pack-bitmap-write.c                           |   2 +-
 pack-write.c                                  |   5 +-
 packfile.c                                    |   4 +-
 packfile.h                                    |   2 +
 t/t5318-commit-graph.sh                       | 224 ++++++
 30 files changed, 1584 insertions(+), 21 deletions(-)
 create mode 100644 Documentation/git-commit-graph.txt
 create mode 100644 Documentation/technical/commit-graph-format.txt
 create mode 100644 Documentation/technical/commit-graph.txt
 create mode 100644 builtin/commit-graph.c
 create mode 100644 commit-graph.c
 create mode 100644 commit-graph.h
 create mode 100755 t/t5318-commit-graph.sh


base-commit: 468165c1d8a442994a825f3684528361727cd8c0
-- 
2.17.0.14.gba1221a8ce



