Re: Git commit generation numbers

Ramkumar Ramachandra <artagnon@xxxxxxxxx> · Tue, 6 Sep 2011 15:32:03 +0530

Hi,

First, let me start out by saying that I'm a fairly new contributor to
Git, and I'm far less experienced than the other people on this
thread.  I've read through all the discussions time and again, and
thought about the problem for some time now - I can't say I understand
it as fully as many of you do, but I think I may have a slightly
different perspective to offer.

In what way is Git fundamentally different from Subversion?  It's the
simplicity of the data model.  From the simplest building block, a
key-value store, we have been able to compose and build things on top
of it.  The reason we built centralized version control systems
earlier is because it was *easier* to address the composition
problems.  We dumped all the related repositories and their problems
onto one central server.  With so much information in one place,
things are tightly coupled and problems are easier to solve.  Still
not convinced?  What's the weakest component in Git today?
Undoubtedly submodules.  Of course, a large part of the reason is that
many people don't use submodules, and hence they don't improve -- but
it's actually a circular problem.  People don't use submodules because
they're so featureless and hard to develop.  Why is that so hard?  Back to
the fundamental problem of composition from simple building blocks.
In submodules, we have to take entire DAGs and build a composite DAG.
The key pieces of information are deep inside Git's fundamentals:
Gitlinks.  Other projects like Gitslave try to attack the problem
on a more superficial level, but they all hit a barrier when they
discover that they can't compose big blocks of data: you need simple
building blocks to compose.

It's the same story with C (and now, Haskell).  Why does everyone like
C so much?  Because it only provides fundamental building blocks and
gives people the freedom to compose the way they like.  It doesn't
provide big "template blocks" like Java, because they tend to be
restrictive in the long run.  Sure, Java is easier to start out with,
but people soon realize that big blocks can't compose.

More than arguing about backward compatibility, and about how commits
created by older versions of Git won't have generation numbers, I think
this is what we should be focusing on.  Sure, it'll additionally make
sense to put in a cache to speed things up now, but we need to think
about what Git will be 10-15 years from now.  The fundamental pieces of
information required for composition must be present in the
fundamental building blocks.

The real question we should be asking is: "Should Git have had commit
generation numbers in 2005?".  If the answer is "yes", we should put
them in now before it becomes even harder, bending over backwards for
backward compatibility if necessary.  Otherwise, we'll regret this
decision 10-15 years later, when we're faced with deeper issues.  If
you want a concrete example, think about how you'd compose DAGs
together (again, the submodules problem): where is the information
required to prune each DAG and compose?
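
To make "the information required to prune" concrete, here is a rough
sketch of what a generation number buys you.  It assumes the usual
definition from this thread (a commit's generation is one more than the
largest generation among its parents, with root commits at 1).  The
dict-based DAG and the names generation_numbers and cannot_be_ancestor
are made up for illustration; this is not Git's code.

```python
from typing import Dict, List


def generation_numbers(parents: Dict[str, List[str]]) -> Dict[str, int]:
    """Map each commit to 1 + the max generation of its parents (roots get 1).

    `parents` is a hypothetical in-memory DAG: commit id -> list of parent ids.
    The walk is iterative so deep histories don't hit the recursion limit.
    """
    gen: Dict[str, int] = {}
    for start in parents:
        stack = [start]
        while stack:
            commit = stack[-1]
            if commit in gen:
                stack.pop()
                continue
            pending = [p for p in parents[commit] if p not in gen]
            if pending:
                # Resolve the parents first, then come back to this commit.
                stack.extend(pending)
            else:
                gen[commit] = 1 + max((gen[p] for p in parents[commit]), default=0)
                stack.pop()
    return gen


def cannot_be_ancestor(gen: Dict[str, int], a: str, b: str) -> bool:
    """True when `a` is provably not a proper ancestor of `b`.

    A proper ancestor always has a strictly smaller generation, so the
    walk from `b` can be pruned without visiting any more commits.
    """
    return gen[a] >= gen[b]


# A small diamond-shaped history: A is the root, D merges B and C.
history = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
gen = generation_numbers(history)
print(gen["A"], gen["D"], cannot_be_ancestor(gen, "D", "B"))
```

The point is that the second function needs no graph walk at all once
the number is stored with each commit, which is what a cache can only
approximate after the fact.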

I wish I could implement this myself, but I'm afraid I don't have the
engineering skill yet.  I'll be happy to contribute whatever little I
can, and participate in the review process.

Thanks.

-- Ram