Using bitmaps to accelerate fetch and clone

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Google has published a series of patches (see links below) to JGit to
improve fetch and clone performance by adding compressed bitmaps to
the pack-*.idx structure.

Operation                   Index V2               Index VE003
Clone                       37530ms (524.06 MiB)     82ms (524.06 MiB)
Fetch (1 commit back)          75ms                 107ms
Fetch (10 commits back)       456ms (269.51 KiB)    341ms (265.19 KiB)
Fetch (100 commits back)      449ms (269.91 KiB)    337ms (267.28 KiB)
Fetch (1000 commits back)    2229ms ( 14.75 MiB)    189ms ( 14.42 MiB)
Fetch (10000 commits back)   2177ms ( 16.30 MiB)    254ms ( 15.88 MiB)
Fetch (100000 commits back) 14340ms (185.83 MiB)   1655ms (189.39 MiB)

In the table the repository tested was Android's
platform/frameworks/base. The time shown is the time spent in the
"Counting objects" phase of creating a pack for a client using the
git:// protocol. The byte size shown is the size of the pack
transferred to the client, and "commits back" describes how far behind
the client was from the server when it started the fetch. In all test
runs the client was git-core and the server was JGit on the same
machine.

The amount of disk space used by the compressed bitmaps is tunable,
but averages 10-15% of the pack-*.idx file size. So about 8 MiB of
additional space for this repository. A repository owner can reduce
the worst case time used in the 100000 commit back case by using
slightly more disk and positioning more bitmaps more frequently
throughout history. The code doesn't do this by default because the
expectation is that a client is probably not 100k commits behind.
Instead it populates bitmaps at all branch and tag tips, and densely
(every few hundred commits) near the tips, and spaces them out more
the further back in history it goes. We assume the older history is
accessed less often, and doesn't need to waste additional disk space
or precious buffer cache.

The basic gist of the implementation is a bitmap has a 1 bit set for
each object that is reachable from the commit the bitmap is associated
with. An index file may have a unique bitmap for hundreds of commits
in the corresponding pack file. The set of objects to send is
performed by doing a simple computation:

  OR (all want lines) AND NOT OR (all have lines)

There are two key patches in the series that implement the file format
change and logic involved:

* https://git.eclipse.org/r/7939

  Defines the new E003 index format and the bit set
  implementation logic.

* https://git.eclipse.org/r/7940

  Uses E003 indexes when available to make packs, and
  the logic required to make E003 format indexes during GC.

:-)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]