Re: [RFC/PATCH 0/7] Rework git core for native submodules

Ramkumar Ramachandra <artagnon@xxxxxxxxx> · Mon, 8 Apr 2013 21:38:12 +0530

Junio C Hamano wrote:
> Would it be possible that (at least some part of, or possibly all
> of) your ideas had some merit, but with all your hostility against
> the current system and the work that went behind it, you did not
> communicate well enough to make others understand you?

Agreed.  My annoyance with the current system did go a little
overboard, and I've been having a splitting headache for the last few
days.

> What I found very hard to read in this thread was that your messages
> all went like this:
>
>  1. In the current system, I have to be at the top level of a
>     submodule to work in it (or some other problems).
>
>  2. I will fix it in a more "elegant" way.
>
>  3. I have to have a new object at the submodule path, not the
>     current "submodule is a commit bound at the submodule path, and
>     information about the submodule is in .gitmodules".
>
> There was very little concrete explanation on how #3 leads to #2,
> i.e. the overall design of your new system and how it will work,
> other than you would read what we currently write in .gitmodules
> from a new kind of object.

I had no way of expressing what I wanted to do except by writing code
when I started off this thread, but am in much better shape now.  Let
me try to explain my fundamental assumptions and code in a concise way
now.

1. Having a toplevel .gitmodules means that any git-core command like
add/ rm/ mv will be burdened with looking for the .gitmodules at the
toplevel of the worktree and editing it appropriately along with
whatever it was built to do (ie. writing to the index and committing
it).  This is highly unnatural.  Putting the information in link
objects means that we get a more natural UI + warts like
cd-to-toplevel disappear with no extra code.

2. If we want to make git-submodule a part of git-core (which I think
everyone agrees with), we will need to make the information in
.gitmodules available more easily to the rest of git-core.  One way to
do it without breaking anything is to unpack the root tree, look for
an entry with the path .gitmodules and handle it different from other
blobs: ie. parse it into structured data that the rest of git-core can
consume.  However, I think it is very gross as the blob is not
inherently special in any way: it's just incidentally stored at a
specific tree path.  The alternative is to have an inherently special
kind of blob (ie. link object).  In the git-core code, I can simply
match for a link object and operate on it accordingly.  As opposed to
matching a blob object, and its tree path.  Moreover, this means that
the user can simply git edit-link <link> from anywhere in the worktree
instead of having to refer to the appropriate section in the toplevel
.gitmodules.

3. Currently diffing/ merging one huge .gitmodules file is a mess, as
it doesn't have to conform to a strict format.  This means that I can
get conflicts between these two:

    url = gh:artagnon/clayoven
    url =gh:artagnon/clayoven

Moreover, since the fields are not ordered, a simple reordering of the
fields will cause a merge conflict.  The correct way to fix this is to
split up .gitmodules into many logical files, have a git
edit-gitmodules which reduces user input to a strict format, and then
write custom diff/merge drivers.  My proposal involves having a git
edit-link, and teaching git-core to diff/merge appropriately.  The
information is already in logical bits.

4. The only seeming disadvantage of not having a file accessible via
the filesystem is that it doesn't behave like a full blob.  But it
does; the code to unstage a link object (emulation) is actually very
simple: I'm currently writing it.

5. Having a first-class link object comes with functional advantages.
It means that I can have a ref pointing to link objects and easily
initialize a nested repository without having to initialize the
containing repository (ie. essentially replacing repo). We can have
true floating submodules, which is really nice in my opinion: you can
fix a library at v3.1 and switch it to v3.2 at some point in the
future without using ugly SHA-1 hexes anywhere.

6. While it is possible to work top-down from the current system, that
approach is clearly taking too long and is too painful.  This explains
why submodules haven't come a long way in the last five years.  With
my approach, I'm trying to make life simpler for everyone: it will
suddenly become much easier to hack on submodules, and it can improve
more rapidly over the next five years.  I'm not thinking about
short-term fixes precisely for this reason: the long-term goal is
worth a little bit of short-term inconvenience.

7. I estimate that replacing the current submodule system without
feature regressions will not take a lot of effort and can be done with
minimal breakages.  It's not a lot of code or anything very complex.
We just have to follow along the lines of how git-core handles blobs,
and write a little bit of code to make links behave like blobs (I'm
halfway done with this already).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html