On 11/08/2011 12:34 PM, Felipe Contreras wrote:
> Has anybody seen these?
> http://draketo.de/proj/hg-vs-git-server/test-results.html#results
>
> Seems like a potential area of improvement.

The fact that git requires periodic garbage collection is indeed
annoying (even in interactive use) and even more annoying in the
scenario discussed by the author of this article.

With respect to the article's claims about the overall efficiency of
Mercurial vs. git, I would like to point out that the author's use of a
test repository with a linear history avoids one of Mercurial's big
design weaknesses. If the repository had had a branching history,
Mercurial's numbers would probably be significantly less flattering.

Mercurial's revlog repository format [1] (at least the last time I
checked) uses a single data file to hold the contents of all versions of
a single file in the working copy. It appends a delta to the end of the
revlog file for each revision, with periodic fulltexts. It is designed
to make it possible to reconstruct any file revision via a single seek
and a single read of at most twice the length of the file's fulltext
(assuming that the index is already known). The avoidance of disk seeks
goes a long way to explaining Mercurial's competitive performance
despite the fact that it is written in Python.

However, the deltas stored in revlog are not relative to a revision's
parent(s), but rather relative to the previous revision in the revlog
file, which is typically the most recent revision committed *to any
branch*. Therefore, revlog is very good at storing a linear series of
commits, but is considerably less efficient at storing a history with
lots of branches that were under development concurrently. The net
result is that the history of a branchy repository can take up much more
space than that of a linear repository.
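
To make the effect concrete, here is a toy model of the two delta
strategies. It is only a sketch and not Mercurial's actual code: the
function names are made up, the "cost" of a delta is simply the number of
lines in the new revision that the base lacks (deletions are treated as
free), and periodic fulltexts are ignored.

```python
def delta_cost(base, target):
    """Crude delta size: lines in target that are absent from base."""
    base_lines = set(base)
    return sum(1 for line in target if line not in base_lines)


def make_history(commits_per_branch, branches):
    """Build revisions in storage order, committing to each branch in turn.

    Each revision is a (parent_index, content) pair; every commit appends
    one line that is unique to its branch.
    """
    root = [f"common line {i}" for i in range(100)]
    revs = [(None, root)]
    tips = {b: 0 for b in branches}  # index of each branch's latest revision
    for n in range(commits_per_branch):
        for b in branches:
            parent = tips[b]
            content = revs[parent][1] + [f"{b} change {n}"]
            revs.append((parent, content))
            tips[b] = len(revs) - 1
    return revs


def total_delta_cost(revs, strategy):
    """Sum delta sizes, basing each delta on the previous stored revision
    ("previous") or on the revision's parent ("parent")."""
    total = 0
    for i in range(1, len(revs)):
        parent, content = revs[i]
        base_index = i - 1 if strategy == "previous" else parent
        total += delta_cost(revs[base_index][1], content)
    return total


linear = make_history(20, ["A"])        # 20 commits on one branch
branchy = make_history(10, ["A", "B"])  # 20 commits, two interleaved branches
for name, revs in (("linear", linear), ("branchy", branchy)):
    print(name,
          total_delta_cost(revs, "previous"),
          total_delta_cost(revs, "parent"))
```

In this model the linear history costs 20 lines under either strategy,
whereas the interleaved history costs 110 lines with deltas against the
previous stored revision and only 20 with deltas against the parent,
because each delta against the other branch has to re-add everything
that the current branch has accumulated so far.
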

There was a GSoC "parentdelta" project to allow deltas to be computed
against parents [2], later replaced by a second "generaldelta" scheme
[3], but AFAICT this is still experimental and they are struggling with
its performance.

There is also a script in contrib that reorders the revisions in a
revlog file to put topological neighbors closer together [4]. This can
shrink the size of the file dramatically. But of course this script is
something like "git gc" in the sense that it would presumably need to be
run periodically, and each run would have to lock the repo for some
time.

All this is not to detract from the fact that Mercurial, by not
requiring garbage collection, has a big advantage over git in certain
scenarios.

Michael

[1] http://mercurial.selenic.com/wiki/FAQ#FAQ.2BAC8-TechnicalDetails.How_does_Mercurial_store_its_data.3F
[2] http://mercurial.selenic.com/wiki/ParentDeltaPlan
[3] http://mercurial.selenic.com/wiki/WhatsNew#Mercurial_1.9_.282011-07-01.29
[4] http://selenic.com/hg/file/54c0517c0fe8/contrib/shrink-revlog.py

--
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/