On Sun, Jan 30, 2011 at 00:05, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Shawn Pearce <spearce@xxxxxxxxxxx> writes:
>
>> Using this for object enumeration shaves almost 1 minute off server
>> packing time; the clone dropped from 3m28s to 2m29s. That is close to
>> what I was getting with the cached pack idea, but the network transfer
>> stayed the small 376 MiB.
>
> I like this result.

I'm really leaning towards putting this cached object list into JGit.
I need to shave that 1 minute off server CPU time. I can afford the
41 MiB of disk (and kernel buffer cache), but I cannot keep paying
1 minute of CPU on every clone request for large repositories.

The object list of what is reachable from commit X isn't ever going to
change, and the path hash function is reasonably stable. With a version
code in the file we can desupport old files if the path hash function
ever changes. (There is a rough sketch of that check at the end of this
mail.) For some of my servers, 10% more disk/kernel memory, plus some
explicit cache management by the server administrator to construct the
file, is cheap compared to 1 minute of CPU.

> The amount of transfer being that small was something I didn't quite
> expect, though. Doesn't it indicate that our pathname based object
> clustering heuristics is not as effective as we hoped?

I'm not sure I follow your question.

I think the problem here is old side branches that got recently merged.
Their _best_ delta base was some old revision, possibly close to where
they branched off. Using a newer version of the file as the delta base
created a much larger delta.

E.g. consider a file where in more recent revisions a function was
completely rewritten. If you have to delta compress against that new
version, but your copy still has the older definition of the function,
you need insert instructions for the entire content of that old
function. But if you can delta compress against the version you
branched from (or one much closer to it in time), your delta is very
small, because that whole function is handled by a single, much
smaller copy instruction. (A toy cost model below the sig makes this
concrete.)

Our clustering heuristics work fine. Our thin-pack selection of
potential delta base candidates does not. We are not very aggressive
about loading the delta base window with potential candidates, which
means we miss some really good compression opportunities.

Ooooh. I think my test was flawed.

I injected the cached pack's tip as the edge for the new stuff to
delta compress against. I should have injected all of the merge bases
between the cached pack's tip and the new stuff. Although the cached
pack's tip is one of the merge bases, it isn't all of them. If we
inject all of the merge bases, we can find the revision that an old
side branch is based on, and possibly get a better delta candidate for
it. IIRC, upload-pack would have walked backwards further and found
the merge base for that side branch, and it would have been part of
the delta base candidates. (A sketch of the merge base computation is
also below.)

I think I need to re-do my cached pack test. Good thing I have the
history of my source code saved in this fancy revision control thingy
called "git". :-)

-- 
Shawn.
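
Rough sketch of the version check I mean above. The header layout
(magic, format version, path hash version) is invented for
illustration; no such file exists in JGit yet:

  import java.io.DataInputStream;
  import java.io.FileInputStream;
  import java.io.IOException;

  class CachedObjectListCheck {
      // Invented header constants; bump PATH_HASH_VERSION whenever the
      // path hash function changes so old files are desupported.
      static final int CACHE_MAGIC = 0x4f424a4c; // "OBJL", made up
      static final int CACHE_VERSION = 1;
      static final int PATH_HASH_VERSION = 1;

      static boolean isUsable(String path) throws IOException {
          DataInputStream in =
              new DataInputStream(new FileInputStream(path));
          try {
              return in.readInt() == CACHE_MAGIC
                  && in.readInt() == CACHE_VERSION
                  && in.readInt() == PATH_HASH_VERSION;
          } finally {
              in.close();
          }
      }
  }

If any field doesn't match, the server just ignores the file, falls
back to a normal object enumeration, and the administrator regenerates
the cache.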
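
Toy cost model for the copy vs. insert point. This is not JGit's
encoder (the real one builds a rolling-hash block index over the
base); it only shows that a good base collapses an unchanged function
into one cheap copy instruction, while a bad base pays byte-by-byte
inserts:

  // Greedy toy: copy the longest run found anywhere in the base,
  // otherwise insert a literal byte. Costs roughly mirror the pack
  // delta encoding: a copy instruction is at most ~7 bytes, inserted
  // data costs about a byte per byte.
  static int deltaCost(byte[] base, byte[] target) {
      int cost = 0;
      int pos = 0;
      while (pos < target.length) {
          int best = 0;
          for (int b = 0; b < base.length; b++) {
              int len = 0;
              while (b + len < base.length && pos + len < target.length
                      && base[b + len] == target[pos + len])
                  len++;
              if (len > best)
                  best = len;
          }
          if (best >= 8) {
              cost += 7;   // one copy instruction covers the whole run
              pos += best;
          } else {
              cost += 1;   // no useful match in base: insert literal
              pos++;
          }
      }
      return cost;
  }

With a rewritten function in between, deltaCost(branchPointBlob,
sideBranchBlob) comes out far smaller than deltaCost(currentTipBlob,
sideBranchBlob).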
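
And the merge base computation for re-doing the test, sketched with
JGit's RevWalk. RevFilter.MERGE_BASE is real; cachedTip and newTips
stand in for however the server tracks the cached pack's tip and the
client's wants:

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;

  import org.eclipse.jgit.lib.ObjectId;
  import org.eclipse.jgit.lib.Repository;
  import org.eclipse.jgit.revwalk.RevCommit;
  import org.eclipse.jgit.revwalk.RevWalk;
  import org.eclipse.jgit.revwalk.filter.RevFilter;

  // All merge bases of the cached pack's tip and the new commits, to
  // be injected as edges (preferred delta bases) for thin packing.
  static List<RevCommit> mergeBases(Repository repo, ObjectId cachedTip,
          List<ObjectId> newTips) throws IOException {
      RevWalk rw = new RevWalk(repo);
      try {
          rw.setRevFilter(RevFilter.MERGE_BASE);
          rw.markStart(rw.parseCommit(cachedTip));
          for (ObjectId id : newTips)
              rw.markStart(rw.parseCommit(id));
          List<RevCommit> bases = new ArrayList<RevCommit>();
          RevCommit c;
          while ((c = rw.next()) != null)
              bases.add(c);
          return bases;
      } finally {
          rw.dispose();
      }
  }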