Re: [VOTE] git versus mercurial (for DragonflyBSD)

Jakub Narebski <jnareb@xxxxxxxxx> · Mon, 27 Oct 2008 02:52:22 +0100

On Mon, 27 Oct 2008, Arne Babenhauserheide wrote:
> On Sunday, 26 October 2008 19:55:09, Jakub Narebski wrote:
> >
> > I agree, and I think it is at least partially because Git has a
> > cleaner design, even if you have to understand more terms at first.
> 
> What do you mean by "cleaner design"? 

Clean _underlying_ design. Git has a very nice underlying model: a graph
(DAG) of commits (revisions), with branches and tags as pointers into
this graph.
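
To make the model concrete, here is a toy sketch in Python. It is only
an illustration, not Git's actual object format: the commit ids are
made-up labels (real Git uses a hash of the object's content), and the
names Commit, commits, refs and history are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    parents: tuple = ()  # ids of parent commits (empty for a root commit)


# Object store: commit id -> commit.
commits = {
    "c1": Commit("initial"),
    "c2": Commit("add feature", ("c1",)),
    "c3": Commit("fix bug", ("c1",)),
    "c4": Commit("merge", ("c2", "c3")),  # two parents: a merge commit
}

# Branches and tags are just names pointing at commit ids.
refs = {"master": "c4", "topic": "c2", "v1.0": "c1"}


def history(ref):
    """Return the ids of all commits reachable from the given ref."""
    seen, stack = set(), [refs[ref]]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid].parents)
    return seen
```

Creating a branch in this model is just adding one more entry to refs;
nothing in the graph itself has to change.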

> From what I see (and in my definition of "design"), Mercurial is designed as 
> VCS with very clear and clean design, which even keeps things like streaming 
> disk access in mind. 

I have read the description of Mercurial's repository format, and in my
opinion it is not very clear: file changesets, bound together using the
manifest, bound in turn using the changerev / changelog.

Mercurial relies on transactions and O_TRUNC support, while Git relies
on atomic writes and on updating the data first, then updating the
reference to that data.
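
A minimal sketch of that "write the data, then update the reference"
pattern, in Python. The layout (objects/<id>, refs/<name>) and the
helper names are simplifications of mine, not Git's real on-disk format
(real Git hashes a header plus the content and compresses objects); the
atomicity here comes from writing a temporary file and renaming it.

```python
import hashlib
import os
import tempfile


def write_atomically(path, data):
    """Write data to a temp file in the same directory, then rename it."""
    directory = os.path.dirname(path)
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)  # readers see the old file or the new one


def store_object(repo, data):
    """Store immutable content under the hash of that content."""
    oid = hashlib.sha1(data).hexdigest()
    write_atomically(os.path.join(repo, "objects", oid), data)
    return oid


def update_ref(repo, name, oid):
    """Point a reference at an object that has already been written."""
    write_atomically(os.path.join(repo, "refs", name), oid.encode())


def read_ref(repo, name):
    with open(os.path.join(repo, "refs", name), "rb") as f:
        return f.read().decode()
```

If the process dies between store_object() and update_ref(), the
reference still points at the old data; the new object is simply
unreferenced and can be cleaned up later.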

I don't quite understand the comment about streaming disk access...

> Also, looking at git, git users still have to garbage collect regularly, which 
> shows to me that the design wasn't really cleaner. 

Well, they have to do so a lot less than they used to, and there is
"git gc --auto", which can safely be put in crontab.

Explicit garbage collection was a design _decision_, not a sign of an
unclean design. We can argue whether it was a good or a bad decision,
but one should consider the following issues:

 * Rolling back the last commit to correct it, or equivalently amending
   the last commit (for example because we forgot some last-minute
   change, or forgot to sign off a commit), or backing out of changes
   to the last commit: in Mercurial this relies on transactions (and
   locking) and correct O_TRUNC, while in Git it leaves dangling
   objects to be garbage collected later.

 * Mercurial relies on transaction support. Git relies on atomic write
   support and on the fact that objects are immutable; those that are
   no longer needed are garbage collected later. Besides, IIRC some
   ways of implementing transactions in databases also lead to garbage
   collection.

 * Explicit packing, and having two repository "formats" (loose and
   packed), is partly historical: at the beginning there was only the
   loose format. The pack format was IIRC invented for network
   transport, and was then also used for on-disk storage (the same
   format!) for better I/O patterns[1]. Having packs as 'rewrite to
   pack' instead of 'append to pack' allows preferring recency order.
   This results in faster access, as objects from newer commits are
   earlier in the delta chain, and in reduced size in the usual case of
   size growing with time, as recency order allows the use of delete
   deltas. Also, _choosing_ the base object allows a further reduction
   in size, especially in the presence of nonlinear history.

 * From what I understand, Mercurial by default uses a packed format
   for branches and tags; Git uses a "loose" format for recent branches
   (meaning one file per branch), while packing older references.
   Using the loose format affects performance (and size) only for an
   insane number of references, and only for some operations such as
   listing all references, while using a packed format is IMHO a bit
   error-prone when updating.

 * Git has reflogs, which are pruned (expired) during garbage
   collection so that they do not grow without bound; AFAIK Mercurial
   doesn't have an equivalent of this feature.

   (Reflogs store the _local_ history of a branch tip, noting commits,
   fetches, merges, rewinding a branch, switching branches, etc.)

[1] You wrote about "streaming disk access". Git relies (for reading)
on a good mmap implementation.
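
To illustrate the first point of the list above (amending leaves a
dangling object that is collected later), here is a toy sketch in
Python. It is a deliberately simplified model of mine, not how "git gc"
is implemented: object ids are made-up labels, and it ignores reflogs
and any grace period, both of which real Git takes into account before
pruning.

```python
# Toy object store: object id -> tuple of ids it refers to (its parents).
objects = {"c1": (), "c2": ("c1",)}
refs = {"master": "c2"}


def amend(ref, new_id):
    """Replace the tip commit with a corrected one with the same parents."""
    old = refs[ref]
    objects[new_id] = objects[old]  # new immutable commit, same parents
    refs[ref] = new_id              # old tip is now dangling, not deleted


def reachable():
    """Return the ids of all objects reachable from any ref."""
    seen, stack = set(), list(refs.values())
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(objects[oid])
    return seen


def gc():
    """Delete every object not reachable from a ref; return removed ids."""
    dead = set(objects) - reachable()
    for oid in dead:
        del objects[oid]
    return dead
```

After amend("master", "c2b") the old commit "c2" is still in the store;
it only goes away when gc() runs. No existing data is modified in
place, so nothing needs to be rolled back or truncated.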

> As an example: If I want some revision in hg, my repository just reads the 
> files in the store, jumps to the latest snapshots, adds the changes after 
> these and has the data. 

If you want to show some revision in Git, meaning the commit message
and the diff in patch format (the result of "git show"), Git just reads
the commit, outputs the commit message, reads the parent, reads the
trees, and performs the diff.

If you want to check out a specific revision, Git just reads the
commit, reads the tree, and writes this tree (via the index) to the
working area.
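
The checkout path can be sketched like this in Python. Again this is a
toy model of mine under stated assumptions: the store is a plain dict
with made-up ids, the names store, write_tree and checkout are
hypothetical, and the index step is skipped entirely.

```python
import os

# Toy object store; ids are made-up labels, not real object names.
store = {
    "commit1": {"type": "commit", "tree": "tree1", "message": "initial"},
    "tree1": {"type": "tree", "entries": {"README": "blob1", "src": "tree2"}},
    "tree2": {"type": "tree", "entries": {"main.c": "blob2"}},
    "blob1": {"type": "blob", "data": b"hello\n"},
    "blob2": {"type": "blob", "data": b"int main(void) { return 0; }\n"},
}


def write_tree(tree_id, dest):
    """Recursively write the contents of a tree object under dest."""
    os.makedirs(dest, exist_ok=True)
    for name, oid in store[tree_id]["entries"].items():
        path = os.path.join(dest, name)
        if store[oid]["type"] == "tree":
            write_tree(oid, path)  # subdirectory
        else:
            with open(path, "wb") as f:
                f.write(store[oid]["data"])  # file contents


def checkout(commit_id, dest):
    """Read the commit, read its tree, and write that tree to dest."""
    write_tree(store[commit_id]["tree"], dest)
```

Note that only the one commit and the objects its tree refers to are
read; the history of each file is not consulted at all.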

> In git it has to check all changesets which affect the file. 

I don't understand you here... if I understand the above correctly,
then you are wrong about Git.

> If you read the hgbook, you'll find one especially nice comment: 
> 
> "Unlike many revision control systems, the concepts upon which Mercurial is 
> built are simple enough that it’s easy to understand how the software really 
> works. Knowing this certainly isn’t necessary, but I find it useful to have a 
> “mental model” of what’s going on."
> - http://hgbook.red-bean.com/hgbookch4.html
> 
> I really like that, and in my opinion it is a great compliment to hg, for two 
> reasons: 
> 
> 1) Hg is easy to understand

Because it is simple... and less feature-rich, cf. multiple local
branches in a single repository.

> 2) You don't have to understand it to use it

You don't have to understand the details of Git's design (pack format,
index, stages, refs, ...) to use it either.

> 
> And both are indications of a good design, the first of the core, the second 
> of the UI. 

Well, Git is built around the concept of a DAG of commits, with
branches as references into it. Without understanding it you can use
Git, but it is hard. But if you understand it, you can easily
understand most of the advanced Git features.

I agree that Mercurial's UI is better; as usual in a "Worse is Better"
case... :-)
-- 
Jakub Narebski
Poland