Re: Avery Pennarun's git-subtree?

On Wed, Jul 21, 2010 at 6:46 PM, Jens Lehmann <Jens.Lehmann@xxxxxx> wrote:
> On 21.07.2010 23:09, Avery Pennarun wrote:
>> What we *really* want is a way to have git actually recurse through
>> commit objects when doing *any* operation, as if they were tree
>> objects.
>
> This would not be useful for every work flow (or to put it in other
> words: this is not what I *really* want ;-). And as you pointed
> out, that only works when you have a single repo you are working
> against (like you do in your subtree model).

But you see, the utter failure of the way git-submodule works is that
it required a change to the git repository format, but that repository
format change resulted in absolutely *zero* improvement.

The tree object of the parent points at 'commit xxxx'.  But everything
in git has been *specially modified* to *just ignore* that 'commit
xxxx'.  It would have given exactly the same functionality - and much
less confusingly - if .gitmodules would just include the desired
commitid of the child project.  You could still have the same 'git
submodule' command with the same syntax and semantics.  And it
wouldn't have bastardized the git repo format.

It would have been just as good to dump something into your
Makefile to go 'git clone' the subprojects from somewhere before
building.  Seriously, it would be one or two lines of code; all of
git-submodule replaces about one or two lines of code in your
Makefile.  And you know what?  If I just used that one or two lines of
code, I'd have all sorts of flexibility in where the subprojects get
cloned from, which I currently don't have, and which is the insanity
that drove me to write git-subtree in the first place.
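To make the "one or two lines" point concrete, here is a minimal sketch in Python. It assumes a hypothetical manifest (not an existing git file format) that records a path, a clone URL and a commit id for each subproject. The names Subproject and clone_commands are illustrative only. The function builds the git command lines rather than running them:

```python
from typing import List, NamedTuple


class Subproject(NamedTuple):
    """One entry of a hypothetical manifest (not a real git format)."""
    path: str    # where the subproject should live in the work tree
    url: str     # where this particular build clones it from
    commit: str  # the commit id the superproject wants


def clone_commands(manifest: List[Subproject]) -> List[List[str]]:
    """Return argv lists that fetch each subproject and pin it to its
    recorded commit. Nothing is executed here."""
    commands: List[List[str]] = []
    for sub in manifest:
        # Clone from whatever URL this build was configured with.
        commands.append(["git", "clone", sub.url, sub.path])
        # Check out the recorded commit (this leaves a detached HEAD).
        commands.append(["git", "-C", sub.path, "checkout", sub.commit])
    return commands


if __name__ == "__main__":
    manifest = [Subproject("lib/foo", "https://example.com/foo.git", "1a2b3c4")]
    for argv in clone_commands(manifest):
        print(" ".join(argv))
```

Each argv list could be passed to subprocess.run(argv, check=True) from a Makefile target. Because the URL is only an input to the build step, cloning the subprojects from somewhere else is a local choice rather than something recorded in the repository.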

HOWEVER

I'm not saying we can change that now.  I'm not suggesting that this
feature can be safely removed or changed at all.  Furthermore, I
totally agree that having large subprojects *not* be in your repo is
sometimes a good idea.  I just think it was actually a bad idea to
intrusively add support to git to implement this when it could have
been done without modifying git at all.

I also believe that the vast majority of people who use git-submodule
would rather have it work differently.  (Again, this is not to
subtract functionality.  The existing functionality is useful
sometimes.)

> But unless I got something wrong (which might very well be the
> case, as I never have used subtree myself), all changes to the
> subtree will only show up in that single repo, unless you actively
> push them somewhere else. And that, it seems to me, is as easy to
> forget as you can right now forget to push a submodule's commit you
> already recorded and pushed in the superproject. So am I wrong in
> assuming that subtree is more focused on a single repo containing
> all commits which /might/ then be shared, while submodules are
> about /always/ sharing code via their own repo?

Yes, this is absolutely intentional.  It also matches exactly with
everything else in the git repo philosophy!

I make my own clone.  I mess with it, I fiddle with it, I make 17
clones on my local machine, I throw away what I don't like, I pull,
I merge, I rebase, and then *eventually* I submit *some* of my patches
upstream.  git-subtree lets you do all those things.  git-submodule
stomps on you repeatedly if you try.

To wit:

- cloning a local superproject on my local machine to another copy:
every call to 'git submodule update' re-downloads submodule repos from
the remote machine, because the submodule URL is hardcoded to point
at a remote machine.  Better still, if I've modified any of my
subprojects without pushing changes upstream, the clone will fail,
because the new copy of the superproject will have no access to my
subproject's patches.  (If .gitmodules supplies a relative path, it's
even worse, because my 'origin' in the new copy is now pointing to a
local folder, not a remote one, and none of the submodules exist
there.)

- branching a local superproject on my local machine: fails to branch
the submodule automatically and makes it super easy to lose patches
altogether (since by default, they're committed to a detached HEAD).

- pulling/merging: always causes a conflict if local and remote have
modified the same submodule.

- rebasing: always causes a conflict if local and remote have modified
the same submodule.  Also requires you to rebase submodules separately
from the superproject.  (Yes, this happens often in real life.)

- submitting upstream: requires me to have a separate repo that's a
copy of the upstream repo, and to manage at least one subrepo branch
for every superproject branch, just to track my submissions.  With
git-subtree, no extra repos are necessary.

It's very clear that git-submodule's current behaviour totally
mismatches the entire git philosophy.  That's why it's so impossible
to make the git-submodule command usable.

Another mental exercise: try to think of any other part of git where
it would be considered remotely acceptable to put the absolute or
relative URL of one repo inside another repo.  git URLs are an
implementation detail of clone/fetch/push/pull.  The *content* that
git manages should not have to deal with that stuff.  With
git-submodule, it has to.  With git-subtree, it doesn't.

>> There is no good solution to the submodule problem if each submodule
>> has to go in its own repo.  I've been thinking about this for years
>> now, and watching lots of discussions about it on the git mailing
>> list, and I just can't see any other option.  All the submodules have
>> to get pushed to and fetched from the same repo by default.  Anything
>> else is insane.
>
> I have to object here. Your insanity is someone else's work flow ;-)

Sorry.  I was being a little hyperbolic.  Some people might want to
use multiple repos for certain things - but I believe those people are
much rarer than the kind who want to do it my way.  And
furthermore, even those people would probably actually like it better
if *most* of their subprojects - the smallish ones - could be all in
one repo.

Even if you like multiple repos, I'm sure you don't like being
*forced* to manually fork multiple repos just to fork a single
superproject.  I'm sure you don't like updating .gitmodules to change
the absolute URL of a submodule, and then getting merge conflicts when
someone else had to do the same thing.  There's no way you like that.
If you like that, then you really are insane. :)

> And I am the first to admit that there are some severe
> usability warts still to be fixed for submodules (I put up a - not
> necessarily complete - list at
> http://wiki.github.com/jlehmann/git-submod-enhancements/ ). And
> myself and others are actively working on them (the next bigger
> thing after a new config option about when to consider a submodule
> modified are recursive checkouts, so that "git submodule update"
> will hopefully be almost obsolete in the near future).

I don't believe you can fix git-submodule by fixing surface warts.
It's fundamentally broken.  Since we're stuck with supporting the
current behaviour until the end of time, fixing the surface warts might
be necessary and even mildly helpful.  It will also be soul-sucking,
since no matter how hard you try, people will still hate the result.

Have fun,

Avery