On Fri, Dec 07, 2007 at 04:47:19PM -0800, Harvey Harrison wrote:
> Some interesting stats from the highly packed gcc repo. The long chain
> lengths very quickly tail off. Over 60% of the objects have a chain
> length of 20 or less. If anyone wants the full list let me know. I
> also have included a few other interesting points, the git default
> depth of 50, my initial guess of 100 and every 10% in the cumulative
> distribution from 60-100%.
>
> This shows the git default of 50 really isn't that bad, and after
> about 100 it really starts to get sparse.

Do you have a way to know which files have the longest chains?

I suspect that the ChangeLog* files are among them. Almost without
exception, they are modified only by prepending text to the previous
version (a fairly small amount compared to the size of the file), so
each diff is simple (a single hunk), and the limit on chain depth is
probably what forces a new full copy to be stored.

Besides that, these files grow quite large, becoming some of the
largest files in the tree, and at least one of them is changed in
every commit. This again leads to many versions of fairly large files.

If this guess is right, most of the size gain from longer chains comes
from storing fewer full copies of the ChangeLog* files. From a
performance point of view, this is rather favourable, since the
differences are simple. This would also explain why the window
parameter has little effect.

Regards,
Gabriel
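One possible way to answer the question of which files have the longest chains, sketched in Python under some assumptions: `git verify-pack -v` on a pack's .idx file prints one line per object, and for deltified objects that line ends with the chain depth and the base object name; `git rev-list --objects --all` prints each blob together with the path at which it was first reached. Joining the two gives the deepest chain seen per path. The helper names (`parse_depths`, `parse_paths`, `deepest_paths`, `fraction_within`) are hypothetical, and a blob reachable under several paths is attributed to only the one path rev-list reports.

```python
from collections import defaultdict

HEX = set("0123456789abcdef")


def parse_depths(verify_pack_lines):
    """Map object name -> delta chain depth, from `git verify-pack -v` output.

    Non-delta objects have 5 fields (depth 0); deltified objects have 7,
    where the 6th field is the depth and the 7th the base object name.
    Summary lines ("non delta: ...", "chain length = ...", "...: ok") are skipped.
    """
    depths = {}
    for line in verify_pack_lines:
        fields = line.split()
        if len(fields) not in (5, 7):
            continue
        name = fields[0]
        # Object names are 40 (SHA-1) or 64 (SHA-256) lowercase hex digits.
        if len(name) not in (40, 64) or not set(name) <= HEX:
            continue
        depths[name] = int(fields[5]) if len(fields) == 7 else 0
    return depths


def parse_paths(rev_list_lines):
    """Map object name -> path from `git rev-list --objects --all` output.

    Commits (and the root tree) have no path and are ignored; a path may
    contain spaces, so only the first space separates name from path.
    """
    paths = {}
    for line in rev_list_lines:
        name, sep, path = line.rstrip("\n").partition(" ")
        if sep and path:
            paths[name] = path
    return paths


def deepest_paths(depths, paths, top=10):
    """Return up to `top` (path, max depth) pairs, deepest chains first."""
    best = defaultdict(int)
    for name, depth in depths.items():
        path = paths.get(name)
        if path is not None:
            best[path] = max(best[path], depth)
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


def fraction_within(depths, limit):
    """Fraction of packed objects whose chain depth is <= limit."""
    if not depths:
        return 0.0
    return sum(1 for d in depths.values() if d <= limit) / len(depths)
```

The input lines could come from, e.g., `subprocess.run(["git", "verify-pack", "-v", idx_path], capture_output=True, text=True).stdout.splitlines()`, with `idx_path` pointing at a file under `.git/objects/pack/`. If the ChangeLog* guess is right, those paths should dominate the top of `deepest_paths`, and `fraction_within(depths, 50)` gives the same kind of cumulative figure as the stats quoted above.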