Re: question about: Facebook makes Mercurial faster than Git

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Mon, 10 Mar 2014 15:48:20 +0100

On 03/10/2014 01:10 PM, Johan Herland wrote:
> It should be possible to teach Git to do similar things, and IINM
> there are (and have previously been) several attempts to do similar
> things in Git, e.g.:
> 
>  - http://thread.gmane.org/gmane.comp.version-control.git/240339
> 
>  - http://thread.gmane.org/gmane.comp.version-control.git/217817
> 
> I haven't looked closely at these attempts (it is not my scratch to
> itch), and I don't know if/how they would work on top of Watchman, but
> in principle I don't see why Git shouldn't be able to leverage
> Watchman the same way Mercurial does.

This touches on the most important thing that we should take to heart
from this episode:

Of course Facebook could have modified either Git or Mercurial to do
what they want.  Why did they pick Mercurial?  The article seems to
claim that they were initially biased towards Git, but they chose
Mercurial because its code base is easier to modify.  This is a claim
that I can easily believe.

The two projects are almost exactly the same age.  The number of commits
in the two projects is similar.  Mercurial has had fewer contributors
active at any given time over its project lifetime.

But let's see how much code is in the main part of Mercurial vs. Git:

    $ find mercurial hgext \( -name '*.c' -o -name '*.py' \) -print |
          xargs cat | wc -l
    46164

    $ cat *.c *.h *.sh *.perl builtin/*.c | wc -l
    188530

These are just crude estimates (wc -l also counts blank lines and
comments), and I hope I got the right directories for Mercurial.  But,
by these numbers, Git has about 4 times as much code as Mercurial
(188530 / 46164 is roughly 4.1).  That alone will go a long way toward
making Git harder to
modify.  I don't think that Git has anywhere near 4 times the features
of Mercurial.  Probably most of the difference can be explained by the
choice of implementation languages; 94% of the code in these hg
directories is Python, whereas 88% of Git's core code is C.

How can we make Git easier to hack (short of switching languages)?  Here
are my suggestions:

* Better function docstrings -- don't make developers have to read the
whole call stack to find out what a function does, or who owns the
memory that is passed around.

* More modularity -- more coherent and abstract APIs between different
parts of the system, and less pawing around in your neighbor's data
structures.

* Higher-level abstractions -- make more use of APIs like strbuf and
string_list as opposed to handling every malloc() and realloc() by hand.
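
To make the first and third suggestions concrete, here is a minimal
sketch of what I mean.  It is NOT Git's strbuf API: sketch_buf, its
functions, and join_ref_name() are hypothetical names made up for this
example, and the growth policy (doubling from 16 bytes) is an arbitrary
assumption.  The point is only that each comment states who owns the
memory, and that the caller never touches malloc() or realloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical growable string buffer, loosely in the spirit of strbuf.
 * Not Git's API; every name here is invented for illustration.
 *
 * Invariant: buf is either NULL (len == alloc == 0) or a NUL-terminated
 * heap block of alloc bytes that is owned by this struct.
 */
struct sketch_buf {
	char *buf;
	size_t len;
	size_t alloc;
};

#define SKETCH_BUF_INIT { NULL, 0, 0 }

/*
 * Append the NUL-terminated string s to sb.  s is only borrowed: the
 * caller keeps ownership of it.  Aborts if memory cannot be allocated.
 */
static void sketch_buf_addstr(struct sketch_buf *sb, const char *s)
{
	size_t n;

	assert(sb && s);
	n = strlen(s);
	if (sb->len + n + 1 > sb->alloc) {
		size_t new_alloc = sb->alloc ? sb->alloc * 2 : 16;
		char *p;

		while (new_alloc < sb->len + n + 1)
			new_alloc *= 2;
		p = realloc(sb->buf, new_alloc);
		if (!p)
			abort();
		sb->buf = p;
		sb->alloc = new_alloc;
	}
	memcpy(sb->buf + sb->len, s, n + 1);	/* copies the NUL too */
	sb->len += n;
}

/*
 * Hand the accumulated string over to the caller, who must free() it.
 * Never returns NULL.  sb is reset to the empty state and can be reused.
 */
static char *sketch_buf_detach(struct sketch_buf *sb)
{
	char *ret;

	assert(sb);
	if (!sb->buf)
		sketch_buf_addstr(sb, "");	/* guarantee a real string */
	ret = sb->buf;
	sb->buf = NULL;
	sb->len = sb->alloc = 0;
	return ret;
}

/* Free the memory owned by sb and reset it to the empty state. */
static void sketch_buf_release(struct sketch_buf *sb)
{
	assert(sb);
	free(sb->buf);
	sb->buf = NULL;
	sb->len = sb->alloc = 0;
}

/*
 * Return "<prefix>/<name>" as a newly allocated string.  Both arguments
 * are borrowed; the caller owns the result and must free() it.
 */
static char *join_ref_name(const char *prefix, const char *name)
{
	struct sketch_buf sb = SKETCH_BUF_INIT;

	sketch_buf_addstr(&sb, prefix);
	sketch_buf_addstr(&sb, "/");
	sketch_buf_addstr(&sb, name);
	return sketch_buf_detach(&sb);
}
```

A caller would write something like
char *ref = join_ref_name("refs/heads", "master"); and free(ref) when
done.  The doubling costs a few extra bytes and cycles compared with a
hand-computed malloc(), which is exactly the kind of trade I am arguing
for.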

I personally wish that we as a project would be more willing to spend a
few extra CPU microseconds to make our code more robust and easier to
read and modify.

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/