Re: Avery Pennarun's git-subtree?

On Wed, Jul 21, 2010 at 21:09, Avery Pennarun <apenwarr@xxxxxxxxx> wrote:
> On Wed, Jul 21, 2010 at 4:36 PM, Ævar Arnfjörð Bjarmason
> <avarab@xxxxxxxxx> wrote:
>> On Wed, Jul 21, 2010 at 19:56, Avery Pennarun <apenwarr@xxxxxxxxx> wrote:
>>> No amount of bugfixing in git submodule can fix this workflow, because
>>> it's not a result of bugs.  (The bugs, particularly the
>>> disconnected-by-default HEADs on submodule checkouts, do make it a bit
>>> worse :( )  It would require a fundamental redesign to make this work
>>> nicely with submodules.
>> [...]
>> I think most of those can be fixed, actually. The only requirement
>> that the git plumbing imposes on git-submodules is that a "commit"
>> entry exist in your tree, the rest is just (ugly plumbing).
>
> Sure.  But this commit object (and the objects it points to) are never
> automatically pushed, fetched, or fsck'd.  They're second class
> citizens.  As it turns out, this was a major design mistake in
> implementing the submodule commit objects.
>
> All the behaviour people *currently* get from submodules could have
> been obtained without using a new 'commit' object type at all.  Just
> add a commitid to the horrible junk (including repo URLs, argh) that
> already needs to get pasted into .gitmodules, and have git-commit at
> the top level update .gitmodules automatically (as it currently
> updates the 'commit' tree entries).  Problem solved (at least, solved
> to exactly the extent that it is today).

Yeah, that does sound better than the current mess.
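
To make the idea concrete, here is a rough sketch of what such a
.gitmodules entry might look like. Note that "commit" is NOT a key git
understands today; it is a hypothetical name for the proposal above, and
the submodule name "libfoo", its URL, and the object id are made up too.
The only thing relied on is that "git config -f" will read and write
arbitrary keys in a file:

```shell
set -eu
dir=$(mktemp -d)
cd "$dir"

# Ordinary submodule settings (path and url are real .gitmodules keys;
# the values are invented for this sketch).
git config -f .gitmodules submodule.libfoo.path libfoo
git config -f .gitmodules submodule.libfoo.url git://example.invalid/libfoo.git

# HYPOTHETICAL key: pin the submodule's commit id in .gitmodules
# instead of in a 'commit' tree entry. Git itself ignores this key.
git config -f .gitmodules submodule.libfoo.commit 1234567890123456789012345678901234567890

# Read the pinned id back, as a top-level tool would have to.
pinned=$(git config -f .gitmodules --get submodule.libfoo.commit)
cat .gitmodules
```

A top-level "git commit" would then have to rewrite that value whenever
the submodule moved, which is the automatic update described above.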

> What we *really* want is a way to have git actually recurse through
> commit objects when doing *any* operation, as if they were tree
> objects.  If we had that, submodules could be beautiful (because you'd
> push them to the same repo, etc and users would see none of the
> complexity).  But this doesn't exist.  And for backward compatibility
> at this point, we'd probably need to introduce an entirely new kind of
> tree entry to support such a thing.
>
>> Thus, we could:
>>
>>   * Hack git-submodule (or its replacement) to import the tree
>>     that contains that "commit" into one central .git
>
> This part is relatively easy, I think - at least in concept, although
> I bet there would be widespread implementation tweaks - and would
> clean up a lot of the mess.  However it would require a change to the
> .git/index file format to remember when a subdir is a commit and not a
> "normal" tree so that it doesn't silently commit the next thing as a
> tree instead.
>
>>   * Fix git status / git commit so that you could commit into
>>     submodules, i.e.:
>>
>>     for each submodule in this-commit:
>>         chdir $submodule && commit
>>     done && cd $root && commit -m"bumping submodules"
>
> After making the earlier change to get rid of the extra .git subdirs,
> this next requirement would actually be considerably more work,
> because 'git commit' would need to know how to update a subcommit
> without changing HEAD.  You certainly couldn't just code it up as a
> recursive "git commit" as you imply (and as you could do right now).
>
>>   * Make git-push push the submodule contents and the
>>     superprojects. You'd just need to have commit access to the url
>>     listed in .gitmodules.
>
> This is really a *killer* problem, and you're making it sound easy.
> Let's imagine that my app has 25 different submodules - not
> unreasonable at all in a world with dozens of ever-changing ruby gems
> and suchlike.
>
> Now, if I want to branch my project, I might have to branch 25
> projects just so I can push my changes?  It's totally awful.  And the
> awfulness is multiplied many times over if .gitmodules has hard-coded
> repo paths, because then I have to update the repo path in my branch
> but not the other branch, and merging will have conflicts.  You might
> think that my .git/config could just override .gitmodules, but then
> some guy trying to fetch my branch will fail to fetch the submodules
> from my branch and get errors and have no idea what's going on.
>
> And you might think that using relative repo paths in .gitmodules
> would work, but that's only if I branched all 25 submodules in the
> *first* place.  In real life, most subprojects point at the original
> project's home repo by default (because nobody thinks they'll be
> patching 25 subprojects when they start, and they're probably right),
> but then you have to individually change the URLs when you decide you
> need to patch them, and life gets complicated and ugly, especially
> when the next guy goes to fork your project and now needs to fork some
> subprojects but not others.
>
> There is no good solution to the submodule problem if each submodule
> has to go in its own repo.  I've been thinking about this for years
> now, and watching lots of discussions about it on the git mailing
> list, and I just can't see any other option.  All the submodules have
> to get pushed to and fetched from the same repo by default.  Anything
> else is insane.

Yeah, bundling the submodules in the upstream repo so only one person
ever has to worry about gathering them up and pushing them to the
central repo sounds better for most uses than the current submodule
implementation.

OTOH, I have some submodules that I track on GitHub that would really
inflate the size of the repo that's tracking them. So there are
definitely use cases for keeping the tree in a separate remote repo as
well, especially for large submodules like game art, which some people
have reported using submodules for.

> One option might be to store the submodule commit refs as refs in your
> superproject.  That wouldn't actually be so bad, except for the
> aforementioned problem that fetch/push/clone/etc don't actually trace
> through commit objects when deciding what objects to send you, so
> fetching the ref of the superproject wouldn't autofetch the subproject
> refs.  Also, you could accidentally delete one of the subproject refs
> and lose tons of history without ever realizing it.  That's error
> prone and confusing... and clutters up your repo refs list with
> administrative stuff you didn't actually want in the first place.
>
>> What's missing from that (which would be nice) is the ability to check
>> out a subdirectory from another repository. That could (I think) be
>> done by just adding a normal "tree" entry, and then specifying that
>> that tree can be found in git://... instead of the main tree.
>
> Actually that's already easy with submodules (and git-subtree makes it
> easy too, though slightly different).  Just fetch the commit from the
> other repo, and do:
>
>   git checkout FETCH_HEAD -- subdirname
>
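
For reference, a minimal sketch of that flow, assuming a throwaway pair
of local repositories. The directory names "other" and "mine", the file
names, and the committer identity are all made up for illustration;
only the fetch and the checkout line come from the message above:

```shell
set -eu
work=$(mktemp -d)

# Stand-in for "the other repo", with one subdirectory we care about.
git init -q "$work/other"
mkdir "$work/other/subdirname"
echo 'library code' > "$work/other/subdirname/lib.txt"
echo 'unrelated' > "$work/other/README"
git -C "$work/other" add .
git -C "$work/other" -c user.name=Example -c user.email=example@example.com \
    commit -q -m 'initial import'

# Our project: fetch the other repo's commit, then check out only
# the one subdirectory from it.
git init -q "$work/mine"
cd "$work/mine"
git fetch -q "$work/other" HEAD
git checkout FETCH_HEAD -- subdirname

# The subdirectory is now in the index and work tree as plain files.
staged=$(git ls-files)
printf '%s\n' "$staged"
```

Only the files under subdirname end up staged; the rest of the other
repo's tree (README here) is left out.
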
>>> If we can get some kind of consensus in principle that git-subtree is
>>> a good idea to merge into git core, I can prepare some patches and we
>>> can talk about the details.
>>
>> From having looked at it briefly it looks very nice. But it looks to
>> me as if the main differences between git-submodule and git-subtree
>> are in the porcelain, not the plumbing.
>
> No.  The fundamental difference is exactly one: git-subtree uses
> normal 'tree' entries (rather than commits) in its trees, so that all
> the git tools recurse through them like any other tree.  Thus you
> don't need any extra refs, extra .git dirs, etc.  That allows you to
> bypass all the useless behaviour git has around 'commit' entries.
> This is very much a plumbing difference.
>
> The git-submodule porcelain happens to independently be kind of
> annoying and inconvenient, but that would be much easier to fix if it
> weren't for the plumbing-related problems.
>
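
The two kinds of entry can be seen side by side with plumbing commands.
This is only a sketch in a scratch repository: the paths "vendored" and
"linked" and the object id are invented, and the id does not need to
exist locally for a 'commit' entry to be recorded:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Ordinary subdirectory: recorded as a 'tree' entry (mode 040000).
mkdir vendored
echo 'hello' > vendored/file.txt
git add vendored/file.txt

# Submodule-style entry: recorded as a 'commit' entry (mode 160000).
# The id below is arbitrary; nothing behind it is stored in this repo.
fake_commit=1234567890123456789012345678901234567890
git update-index --add --cacheinfo 160000,"$fake_commit",linked

tree=$(git write-tree)

# Top level: one 'commit' entry and one 'tree' entry.
listing=$(git ls-tree "$tree")
printf '%s\n' "$listing"

# Recursive listing descends into the tree but stops at the commit.
recursive=$(git ls-tree -r "$tree")
printf '%s\n' "$recursive"
```

The recursive listing shows vendored/file.txt as a blob, while "linked"
stays an opaque commit id whose contents are never consulted.
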
>> It would be a lot less confusing to users of Git in the long term if
>> we would at least try to unify these two approaches instead of having
>> two mutually incompatible ways of doing essentially the same thing.
>
> True.  But I don't have the time, and implementing the new 'commit'
> entry semantics sounds like a lot of work (as opposed to arguing about
> them, which I guess I'm good at but which seems unproductive).
>
> In productive terms: git-subtree is solving problems for real users
> right now.  It might solve more problems for more users if it were
> integrated into the core and thus made "official."  Nothing precludes
> making submodules better later.

Sure, don't get me wrong. git-subtree looks very useful, and I have no
objection to having it in git.git. Even if it's not optimal for
everything, good working software now shouldn't be held up by some
theoretical pie-in-the-sky system.