[PATCH 00/16] Speed up Counting Objects with bitmap data

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello friends and enemies from the lovevely Git Mailing list.

I bring to you a patch series that implement a quite interesting performance
optimization: the removal of the "Counting Objects" phase during `pack-objects`
by using a pre-computed bitmap to find the reachable objects in the packfile.

As you probably know, Shawn Pearce designed this approach a few months ago and
implemented it for JGit, with very exciting results.

This is a not-so-straightforward port of his original design: The general approach
is the same, but unfortunately we were not able to re-use JGit's original on-disk
format for the `.bitmap` files.

There is a full technical spec for the new format (v2) in patch 09, including
benchmarks and rationale for the new design. The gist of it is that JGit's
original format is not `mmap`able (JGit tends to not mmap anything), and that
becomes very costly in practice with `upload-pack`, which spawns a new process
for every upload.

The header and metadata for both formats are however compatible, so it should be
trivial to update JGit to read/write this format too. I intend to do this on the
coming weeks, and I also hope that the v2 implementation will be slightly faster
than the actual, even with the shortcomings of the JVM.

The patch series, although massive, is rather straightforward.

Most of the patches are isolated refactorings that enable access to a few functions
that were previously hidden (re. packfile data). These functions are needed for
reading and writing the bitmap indexes.

Patch 03 is worth noting because it implements a performance optimization for
`pack-objects` which isn't particularly good in normal invocations (~10% speed up)
but that will show great benefits in later patches when it comes to writing the
bitmap indexes.

Patch 10 is the core of the series, implementing the actual loading of bitmap indexes
and optimizing the Counting Objects phase of `pack-objects`. Like with every other
patch that offers performance improvements, sample benchmarks are provided (spoiler:
they are pretty fucking cool).

Patch 11 and 16 are samples of using the new Bitmap traversal API to speed up other
parts of Git (`rev-list --objects` and `rev-list --count`, respectively).

Patch 12, 13 and 15 implement the actual writing of bitmap indexes. Like JGit, patch
12 enables writing a bitmap index as part of the `pack-objects` process (and hence
as part of a normal `gc` run). On top of that, I implemented a new plumbing command
in patch 15 that allows to write bitmap indexes for already-existing packfiles.

I'd love your feedback on the design and implementation of this feature. I deem it
rather stable, as we've been testing it on production on the world's largest Git
host (Git Hub Dot Com The Web Site) with good results, so I'd love it to have it
upstreamed on Core Git.

Strawberry kisses,
vmg

Jeff King (1):
  list-objects: mark tree as unparsed when we free its buffer

Vicent Marti (15):
  sha1_file: refactor into `find_pack_object_pos`
  pack-objects: use a faster hash table
  pack-objects: make `pack_name_hash` global
  revision: allow setting custom limiter function
  sha1_file: export `git_open_noatime`
  compat: add endinanness helpers
  ewah: compressed bitmap implementation
  documentation: add documentation for the bitmap format
  pack-objects: use bitmaps when packing objects
  rev-list: add bitmap mode to speed up lists
  pack-objects: implement bitmap writing
  repack: consider bitmaps when performing repacks
  sha1_file: implement `nth_packed_object_info`
  write-bitmap: implement new git command to write bitmaps
  rev-list: Optimize --count using bitmaps too

 Documentation/technical/bitmap-format.txt |  235 ++++++++
 Makefile                                  |   11 +
 builtin.h                                 |    1 +
 builtin/pack-objects.c                    |  362 +++++++-----
 builtin/pack-objects.h                    |   33 ++
 builtin/rev-list.c                        |   35 +-
 builtin/write-bitmap.c                    |  256 +++++++++
 cache.h                                   |    5 +
 ewah/bitmap.c                             |  229 ++++++++
 ewah/ewah_bitmap.c                        |  703 ++++++++++++++++++++++++
 ewah/ewah_io.c                            |  199 +++++++
 ewah/ewah_rlw.c                           |  124 +++++
 ewah/ewok.h                               |  194 +++++++
 ewah/ewok_rlw.h                           |  114 ++++
 git-compat-util.h                         |   28 +
 git-repack.sh                             |   10 +-
 git.c                                     |    1 +
 khash.h                                   |  329 +++++++++++
 list-objects.c                            |    1 +
 pack-bitmap-write.c                       |  520 ++++++++++++++++++
 pack-bitmap.c                             |  855 +++++++++++++++++++++++++++++
 pack-bitmap.h                             |   64 +++
 pack-write.c                              |    2 +
 revision.c                                |    5 +
 revision.h                                |    2 +
 sha1_file.c                               |   57 +-
 26 files changed, 4212 insertions(+), 163 deletions(-)
 create mode 100644 Documentation/technical/bitmap-format.txt
 create mode 100644 builtin/pack-objects.h
 create mode 100644 builtin/write-bitmap.c
 create mode 100644 ewah/bitmap.c
 create mode 100644 ewah/ewah_bitmap.c
 create mode 100644 ewah/ewah_io.c
 create mode 100644 ewah/ewah_rlw.c
 create mode 100644 ewah/ewok.h
 create mode 100644 ewah/ewok_rlw.h
 create mode 100644 khash.h
 create mode 100644 pack-bitmap-write.c
 create mode 100644 pack-bitmap.c
 create mode 100644 pack-bitmap.h

-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]