On Thu, Sep 27, 2012 at 7:47 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> Google has published a series of patches (see links below) to JGit to

Should discussion of this series happen here, on the JGit mailing
list, or in Gerrit? I just want to make sure I discuss it in the
right place.

> improve fetch and clone performance by adding compressed bitmaps to
> the pack-*.idx structure.
>
> Operation                    Index V2               Index VE003
> Clone                        37530ms (524.06 MiB)     82ms (524.06 MiB)
> Fetch (1 commit back)           75ms                 107ms
> Fetch (10 commits back)        456ms (269.51 KiB)    341ms (265.19 KiB)
> Fetch (100 commits back)       449ms (269.91 KiB)    337ms (267.28 KiB)
> Fetch (1000 commits back)     2229ms ( 14.75 MiB)    189ms ( 14.42 MiB)
> Fetch (10000 commits back)    2177ms ( 16.30 MiB)    254ms ( 15.88 MiB)
> Fetch (100000 commits back)  14340ms (185.83 MiB)   1655ms (189.39 MiB)

Beautiful. And I'm curious: why do 100->1000 and 10000->100000 show
such big leaps in time (V2)?

> The basic gist of the implementation is a bitmap has a 1 bit set for
> each object that is reachable from the commit the bitmap is associated
> with. An index file may have a unique bitmap for hundreds of commits
> in the corresponding pack file. The set of objects to send is
> performed by doing a simple computation:
>
>   OR (all want lines) AND NOT OR (all have lines)
>
> There are two key patches in the series that implement the file format
> change and logic involved:
>
> * https://git.eclipse.org/r/7939
>
>   Defines the new E003 index format and the bit set
>   implementation logic.

I suppose the index format is not set in stone yet? My Java-fu is
rusty and I'm not familiar with JGit, so I may well be reading things
wrong. It seems the bitmap data follows directly after the regular
index content. I'd like to see some sort of extension mechanism like
the one in $GIT_DIR/index, so that we don't have to bump the pack
index version often. What I have in mind is an optional commit cache
to speed up rev-list and merge, which could be stored in the pack
index too.
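
To make sure I read the want/have computation quoted above correctly, here is a minimal sketch of it using java.util.BitSet. This is only an illustration, not code from the series: the class and method names (WantHaveSketch, orAll, objectsToSend, bits) are made up, a plain uncompressed BitSet stands in for the compressed bitmaps the patches use, and I assume each bit position identifies one object in the pack.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class WantHaveSketch {
    // OR together the reachability bitmaps of several commits.
    static BitSet orAll(List<BitSet> bitmaps) {
        BitSet result = new BitSet();
        for (BitSet b : bitmaps) {
            result.or(b);
        }
        return result;
    }

    // Objects to send = OR(all wants) AND NOT OR(all haves).
    static BitSet objectsToSend(List<BitSet> wants, List<BitSet> haves) {
        BitSet send = orAll(wants);
        send.andNot(orAll(haves));
        return send;
    }

    // Hypothetical helper: build a bitmap with the given bit positions set.
    static BitSet bits(int... positions) {
        BitSet b = new BitSet();
        for (int p : positions) {
            b.set(p);
        }
        return b;
    }

    public static void main(String[] args) {
        // The wanted commit reaches objects 0..4; the client already has 0..2.
        BitSet want = bits(0, 1, 2, 3, 4);
        BitSet have = bits(0, 1, 2);
        BitSet send = objectsToSend(Arrays.asList(want), Arrays.asList(have));
        System.out.println(send); // prints {3, 4}
    }
}
```

If that reading is right, the server never has to walk the object graph for commits that already have a bitmap; the cost is a few bitwise operations over the pack's object count.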

In the PackIndexVE003 class:

+	// Read the bitmaps for the Git types
+	SimpleDataInput dataInput = new SimpleDataInput(fd);
+	this.commits = readBitmap(dataInput);
+	this.trees = readBitmap(dataInput);
+	this.blobs = readBitmap(dataInput);
+	this.tags = readBitmap(dataInput);

Am I correct in saying that you have four different on-disk bitmaps,
one for each object type? If so, is that for compression efficiency?

> :-)

Definitely :-). I have shown my interest in this topic before, so I
should probably say that I'm going to work on this in C Git, but
sllloooowwwly. As this benefits the server side greatly, perhaps a
GitHubber ;-) might want to work on this in C Git, for GitHub itself
of course, and, as a side effect, make the rest of us happy?
--
Duy