On Mon, 27 Oct 2008, Arne Babenhauserheide wrote:
> On Sunday, 26 October 2008 19:55:09, Jakub Narebski wrote:
> >
> > I agree, and I think it is at least partially because of Git having
> > cleaner design, even if you have to understand more terms at first.
>
> What do you mean by "cleaner design"?

Clean _underlying_ design.  Git has a very nice underlying model: a graph
(DAG) of commits (revisions), with branches and tags as pointers into
this graph.

> From what I see (and in my definition of "design"), Mercurial is designed as
> VCS with very clear and clean design, which even keeps things like streaming
> disk access in mind.

I have read the description of Mercurial's repository format, and it is
not very clear in my opinion.  File changesets, bound using manifest,
bound using changerev / changelog.

Mercurial relies on transactions and O_TRUNC support, while Git relies
on atomic writes and on updating the data first, then updating the
reference to the data.

I don't quite understand the comment about streaming disk access...

> Also, looking at git, git users still have to garbage collect regularly, which
> shows to me that the design wasn't really cleaner.

Well, they have to do it a lot less than they used to, and there is
"git gc --auto", which can safely be put in crontab.

Explicit garbage collection was a design _decision_, not a sign of
unclean design.  We can argue whether it was a good or a bad decision,
but one should consider the following issues:

 * Rolling back the last commit to correct it, or equivalently amending
   the last commit (for example because we forgot some last minute
   change, or forgot to sign off a commit), or backing out of changes to
   the last commit, relies in Mercurial on transactions (and locking)
   and correct O_TRUNC, while in Git it leaves dangling objects to be
   garbage collected later.

 * Mercurial relies on transaction support.  Git relies on atomic write
   support and on the fact that objects are immutable; those that are
   not needed are garbage collected later.
   Besides, IIRC some ways of implementing transactions in databases
   also lead to garbage collecting.

 * Explicit packing, and having two repository "formats" (loose and
   packed), is partly historical: at the beginning there was only the
   loose format.  The pack format was IIRC invented for network
   transport, and was then used for on-disk storage (the same format!)
   for better I/O patterns[1].

   Having packs as 'rewrite to pack' instead of 'append to pack' allows
   Git to prefer recency order.  This results in faster access, as
   objects from newer commits are earlier in the delta chain, and in a
   reduction in size in the usual case of size growing with time, as
   recency order allows the use of delete deltas.  Also, _choosing_ the
   base object allows reducing size further, especially in the presence
   of nonlinear history.

 * From what I understand, Mercurial by default uses a packed format for
   branches and tags; Git uses a "loose" format for recent branches
   (meaning one file per branch), while packing older references.
   Using the loose format affects performance (and size) only for an
   insane number of references, and only for some operations like
   listing all references, while using a packed format is IMHO a bit
   error prone when updating.

 * Git has reflogs, which are pruned (expired) during garbage collection
   so that they do not grow without bound; AFAIK Mercurial doesn't have
   an equivalent of this feature.  (Reflogs store the _local_ history of
   a branch tip, noting commits, fetches, merges, rewinding the branch,
   switching branches, etc.)

[1] You wrote about "streaming disk access".  Git relies (for reading)
    on a good mmap implementation.

> As an example: If I want some revision in hg, my repository just reads the
> files in the store, jumps to the latest snapshots, adds the changes after
> these and has the data.

If you want to show some revision in Git, meaning the commit message and
the diff in patch format (the result of "git show"), Git just reads the
commit, outputs the commit message, reads the parent, reads the trees
and performs the diff.

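
To make this read path a bit more concrete, here is a minimal Python
sketch of the model.  It is only an illustration, not Git's code or its
on-disk format: the names `store`, `put`, `put_tree`, `put_commit`,
`show` and `reachable` are made up, and objects are hashed over a repr
string simply to get stable IDs.  It shows immutable objects, a branch
as a pointer, "show" as read commit / read parent / read trees / diff,
and an amended commit leaving dangling objects behind.

```python
import hashlib

# Hypothetical in-memory object store: content hash -> (kind, payload).
store = {}

def put(kind, payload):
    """Store an immutable object under the hash of its content."""
    oid = hashlib.sha1(repr((kind, payload)).encode()).hexdigest()
    store[oid] = (kind, payload)
    return oid

def put_tree(files):
    """files maps name -> text; a tree is a sorted tuple of (name, blob id)."""
    return put("tree", tuple(sorted((n, put("blob", t)) for n, t in files.items())))

def put_commit(tree, parents, message):
    return put("commit", (tree, tuple(parents), message))

def read_tree(tree_oid):
    return {name: store[blob][1] for name, blob in store[tree_oid][1]}

def show(commit_oid):
    """Read the commit, read its first parent, read both trees, diff them."""
    tree, parents, message = store[commit_oid][1]
    new = read_tree(tree)
    old = read_tree(store[parents[0]][1][0]) if parents else {}
    changed = sorted(n for n in old.keys() | new.keys() if old.get(n) != new.get(n))
    return message, changed

def reachable(refs):
    """Everything reachable from the branch tips; the rest is dangling."""
    seen, stack = set(), list(refs.values())
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        kind, payload = store[oid]
        if kind == "commit":
            stack.append(payload[0])      # the commit's tree
            stack.extend(payload[1])      # the commit's parents
        elif kind == "tree":
            stack.extend(blob for _, blob in payload)
    return seen

c1 = put_commit(put_tree({"a.txt": "one"}), [], "first")
c2 = put_commit(put_tree({"a.txt": "one", "b.txt": "two"}), [c1], "second")
refs = {"master": c2}                     # a branch is just a pointer

# "Amend": write a new commit with the same parent, then move the pointer.
c2b = put_commit(put_tree({"a.txt": "one", "b.txt": "2"}), [c1], "second, amended")
refs["master"] = c2b

dangling = set(store) - reachable(refs)   # old commit, its tree, its changed blob
print(show(c2))                           # ('second', ['b.txt'])
print(len(dangling))                      # 3
```

Nothing is modified in place here: amending only adds objects and then
updates one pointer, which is why the old ones have to be collected
later.
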
If you want to check out a specific revision, Git just reads the commit,
reads the tree, and writes this tree (via the index) to the working
area.

> In git is has to check all changesets which affect the file.

I don't understand you here... if I understood the above correctly,
then you are wrong about Git.

> If you read the hgbook, you'll find one especially nice comment:
>
> "Unlike many revision control systems, the concepts upon which Mercurial is
> built are simple enough that it’s easy to understand how the software really
> works. Knowing this certainly isn’t necessary, but I find it useful to have a
> “mental model” of what’s going on."
> - http://hgbook.red-bean.com/hgbookch4.html
>
> I really like that, and in my opinion it is a great compliment to hg, for two
> reasons:
>
> 1) Hg is easy to understand

Because it is simple... and less feature rich, cf. multiple local
branches in a single repository.

> 2) You don't have to understand it to use it

You don't have to understand the details of Git's design (pack format,
index, stages, refs,...) to use it either.

>
> And both are indications of a good design, the first of the core, the second
> of the UI.

Well, Git is built around the concept of a DAG of commits, with branches
as references into it.  Without understanding it you can use Git, but it
is hard.  But if you understand it, you can easily understand most
advanced Git features.

I agree that the Mercurial UI is better; as usual in a "Worse is Better"
case... :-)

-- 
Jakub Narebski
Poland