Google has published a series of patches (see links below) to JGit to improve fetch and clone performance by adding compressed bitmaps to the pack-*.idx structure. Operation Index V2 Index VE003 Clone 37530ms (524.06 MiB) 82ms (524.06 MiB) Fetch (1 commit back) 75ms 107ms Fetch (10 commits back) 456ms (269.51 KiB) 341ms (265.19 KiB) Fetch (100 commits back) 449ms (269.91 KiB) 337ms (267.28 KiB) Fetch (1000 commits back) 2229ms ( 14.75 MiB) 189ms ( 14.42 MiB) Fetch (10000 commits back) 2177ms ( 16.30 MiB) 254ms ( 15.88 MiB) Fetch (100000 commits back) 14340ms (185.83 MiB) 1655ms (189.39 MiB) In the table the repository tested was Android's platform/frameworks/base. The time shown is the time spent in the "Counting objects" phase of creating a pack for a client using the git:// protocol. The byte size shown is the size of the pack transferred to the client, and "commits back" describes how far behind the client was from the server when it started the fetch. In all test runs the client was git-core and the server was JGit on the same machine. The amount of disk space used by the compressed bitmaps is tunable, but averages 10-15% of the pack-*.idx file size. So about 8 MiB of additional space for this repository. A repository owner can reduce the worst case time used in the 100000 commit back case by using slightly more disk and positioning more bitmaps more frequently throughout history. The code doesn't do this by default because the expectation is that a client is probably not 100k commits behind. Instead it populates bitmaps at all branch and tag tips, and densely (every few hundred commits) near the tips, and spaces them out more the further back in history it goes. We assume the older history is accessed less often, and doesn't need to waste additional disk space or precious buffer cache. The basic gist of the implementation is a bitmap has a 1 bit set for each object that is reachable from the commit the bitmap is associated with. An index file may have a unique bitmap for hundreds of commits in the corresponding pack file. The set of objects to send is performed by doing a simple computation: OR (all want lines) AND NOT OR (all have lines) There are two key patches in the series that implement the file format change and logic involved: * https://git.eclipse.org/r/7939 Defines the new E003 index format and the bit set implementation logic. * https://git.eclipse.org/r/7940 Uses E003 indexes when available to make packs, and the logic required to make E003 format indexes during GC. :-) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html