Re: newbie questions about git design and features (some wrt hg)

"Mike Coleman" <tutufan@xxxxxxxxx> · Tue, 30 Jan 2007 21:38:10 -0600

Thanks for all of your replies--this information is very helpful.
Though both hg and git look good, I will probably try git first,
partly because it seems the most interesting.  It feels like fertile
ground for experiments, and I suspect someone will think of some
surprising application for it.  (Also, I had the privilege of working
with Junio in a past life, and I consider his involvement a good
portent.)

This Mercurial list post by Ted Ts'o was also useful:

   http://www.selenic.com/pipermail/mercurial/2007-January/012039.html

Regarding a Python (or other interpreted language) implementation, the
most obvious practical benefit would be an easy win32 port.  Not that
I'd ever choose to develop there, but a port would remove the lack of
Windows support as an objection in some organizational settings (such
as mine).  Someone mentioned a Java port--that'd cover that base quite
well.

As for performance, my thinking was this: hg is apparently implemented
almost entirely in Python and (again apparently) has generally
acceptable performance, which suggests that much of the problem might
be I/O-bound enough that language efficiency does not matter so much.

Aside: The program for which I'm considering trying git does mass spec
protein identification and has (in the general case) exponential
runtime, all of it CPU.  Run times on a 500-node cluster start at two
hours and go up rapidly.  You might think at first that this wouldn't
be a good candidate for Python, but so far this looks to be incorrect.
The simple reason: asymptotically, all of the run time happens in
about four functions.  Given that, and friendly constants, what was
about 15K (*) lines of C++ has turned into somewhat less than 1K lines
of C++ and 1K lines of Python--it's difficult to gauge because so many
new features have been added.  Somewhat ironically, the worst
performance issue seems to be C++'s obscure (to me) object
construction costs--I may end up just switching the C++ part to C.
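
The post does not say how the "about four functions" figure was
measured.  One common way to check such a claim in Python is the
standard-library profiler.  The sketch below is only an illustration:
score_candidate, parse_options and main are hypothetical stand-ins, not
code from the program described above.

```python
import cProfile
import io
import pstats


def score_candidate(n):
    # Hypothetical stand-in for a CPU-heavy inner function.
    return sum(i * i for i in range(n))


def parse_options():
    # Hypothetical stand-in for cheap "glue" code.
    return {"candidates": 200, "work": 2000}


def main():
    opts = parse_options()
    return [score_candidate(opts["work"]) for _ in range(opts["candidates"])]


# Profile one run of main().
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Collect the five entries with the largest cumulative time into a string.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

If nearly all of the cumulative time lands in a handful of entries,
those are the candidates to keep in C or C++; everything else is
plausibly fine in Python.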

There are many axes of design to be considered, of course, but the
moral I took away is this: rather than asking "Does this program have
to be really fast?", one should ask "How many lines of this program
could run 20x slower (than C) without significantly affecting overall
performance?"  If the answer is 80%, moving those lines to a
higher-level language might be worth thinking about.  Skepticism is
always in order, of course.
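
The arithmetic behind that question can be sketched as follows.  This
is a back-of-the-envelope model, not a measurement: it assumes the run
time splits cleanly into a "hot" part that stays in C and a "cold" part
that becomes 20x slower, and the fractions used are made-up inputs.
The function name overall_slowdown is hypothetical.

```python
def overall_slowdown(cold_time_fraction, slowdown=20.0):
    """Relative total run time after the cold code gets `slowdown` times slower.

    cold_time_fraction is the share of the ORIGINAL run time spent in the
    code that moves to the slower language (an assumed input).
    """
    if not 0.0 <= cold_time_fraction <= 1.0:
        raise ValueError("cold_time_fraction must be between 0 and 1")
    hot_time_fraction = 1.0 - cold_time_fraction
    # Hot code is unchanged; cold code costs `slowdown` times as much.
    return hot_time_fraction + cold_time_fraction * slowdown


for frac in (0.001, 0.01, 0.05):
    total = overall_slowdown(frac)
    print(f"{frac:.1%} of run time in cold code -> {total:.2f}x total")
```

Under this model, what matters is the share of run time, not the share
of lines: 80% of the lines can move if they account for around 1% of
the time (about 1.19x overall), but not if they account for 5% (about
1.95x).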

Mike

(*) via David Wheeler's sloccount