Re: [PATCH v4 3/4] submodule: support running in multiple worktree setup

On Tue, Jul 26, 2016 at 10:20 AM, Duy Nguyen <pclouds@xxxxxxxxx> wrote:
> On Tue, Jul 26, 2016 at 1:25 AM, Stefan Beller <sbeller@xxxxxxxxxx> wrote:
>> So what is the design philosophy in worktrees? How much independence does
>> one working tree have?
>
> git-worktree started out as an alternative for git-stash: hmm.. i need
> to make some changes in another branch, okay let's leave this worktree
> (with all its messy stuff) as-is, create another worktree, make those
> changes, then delete the worktree and go back here. There's already
> another way of doing that without git-stash: you clone the repo, fix
> your stuff, push back and delete the new repo.
>
> I know I have not really answered your questions. But I think it gives
> an idea what are the typical use cases for multiple worktrees. How
> much independence would need to be decided case-by-case, I think.

Thanks!


>
>> So here is what I did:
>>  *  s/git submodule init/git submodule update --init/
>>  * added a test_pause to the last test on the last line
>>  * Then:
>>
>> $ find . |grep da5e6058
>> ./addtest/.git/modules/submod/objects/08/da5e6058267d6be703ae058d173ce38ed53066
>> ./addtest/.git/worktrees/super-elsewhere/modules/submod/objects/08/da5e6058267d6be703ae058d173ce38ed53066
>> ./addtest/.git/worktrees/super-elsewhere/modules/submod2/objects/08/da5e6058267d6be703ae058d173ce38ed53066
>> ./.git/objects/08/da5e6058267d6be703ae058d173ce38ed53066
>>
>> The last entry is the "upstream" for the addtest clone, so that is fine.
>> However inside the ./addtest/ (and its worktrees, which currently are
>> embedded in there?) we only want to have one object store for a given
>> submodule?
>
> How to store stuff in .git is the implementation details that the user
> does not care about.

They do unfortunately. :(
Some teams here are trying to migrate from the repo[1] tool to submodules,
and they usually have large code bases. For example, the Android Open Source
Project[2], put into a superproject, has a .git dir size of 17G. The
17G are partitioned as follows:

.../.git$ du --max-depth=1 -h
    44K ./hooks
    32K ./refs
    36K ./logs
    17G ./modules
    4.0K ./branches
    8.0K ./info
    4.7M ./objects
    17G .

i.e. roughly all in submodules.

So our users do care about both what is on disk, as well
as what goes over the wire (network traffic).

My sudden interest in worktrees came up when I learned the
`--reference` flag for submodule operations is broken for
our use case, and instead of fixing the `--reference` flag,
I think the worktree approach is generally saner (i.e. with the
references you may have nasty gc issues IIUC, but in the
worktree world gc knows about all the working trees, detached
heads and branches.)

[1] https://source.android.com/source/developing.html
[2] https://android.googlesource.com/

> As long as we keep the behavior the same (they
> can still "git submodule init" and stuff in the new worktree), sharing
> the same object store makes sense (pros: lower disk consumption, cons:
> none).

So I think the current workflow for submodules
may need some redesign anyway as the submodule
commands were designed with a strict "one working
tree only" assumption.

Submodule URLs are stored in 3 places:
 A) In the .gitmodules file of the superproject
 B) In the option submodule.<name>.url in the superproject's .git/config
 C) In the option remote.origin.url in the submodule

A) is a recommendation from the superproject that makes it easier for
downstream to find and set up the whole thing.
You can ignore that if you want, though generally a caring
upstream provides good URLs here.

C) is where we actually fetch from (and hope it has all
the sha1s that are recorded as gitlinks in the superproject)

B) seems like a hack to enable the workflow as below:

Current workflow for handling submodule URLs:
 1) Clone the superproject
 2) Run git submodule init on desired submodules
 3) Inspect .git/config to see if any submodule URL needs adaptation
 4) Run git submodule update to obtain the submodules from
    the configured place
 5) In case of the superproject adapting the URL
    -> git submodule sync, which overwrites the submodule.<name>.url in the
    superproject's .git/config as well as configuring the
    remote."$remote".url in the submodule
 6) In case the user wants to change the URL
    -> No single command solves it; possible workarounds: edit
    .gitmodules and run git submodule sync, or configure the
    submodule.<name>.url in the superproject's .git/config as well as
    the remote."$remote".url in the submodule separately. Just changing
    the submodule's remote works just as well (until you remove and
    re-clone the submodule)
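
As a rough illustration of steps 1) to 5), the sketch below builds throwaway
repositories in a temporary directory, prints the three URL locations A), B)
and C), and then runs a sync. The names sub, super, clone and sub-moved, the
helper g, and the protocol.file.allow=always override (which newer git
versions need before they clone a submodule from a local path) are assumptions
made for this demo only.

```shell
set -e
tmp=$(cd "$(mktemp -d)" && pwd -P)
cd "$tmp"

# Hypothetical helper: fixed identity, and allow file-based submodule clones.
g() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c protocol.file.allow=always "$@"
}

# Throwaway upstream: a submodule repository and a superproject using it.
g init -q sub
g -C sub commit -q --allow-empty -m "sub: initial"
g init -q super
g -C super submodule --quiet add "$tmp/sub" submod
g -C super commit -q -m "super: add submod"

# 1) clone the superproject, 2) init the desired submodule
g clone -q super clone
g -C clone submodule --quiet init submod

# 3) inspect .git/config (B); init copied the value from .gitmodules (A)
url_a=$(g -C clone config -f .gitmodules submodule.submod.url)
url_b=$(g -C clone config submodule.submod.url)

# 4) obtain the submodule from the configured place; C) is its remote
g -C clone submodule --quiet update submod
url_c=$(g -C clone/submod config remote.origin.url)
printf 'A=%s\nB=%s\nC=%s\n' "$url_a" "$url_b" "$url_c"

# 5) the URL in .gitmodules changes; sync propagates it to B) and C)
g -C clone config -f .gitmodules submodule.submod.url "$tmp/sub-moved"
g -C clone submodule --quiet sync submod
synced_b=$(g -C clone config submodule.submod.url)
synced_c=$(g -C clone/submod config remote.origin.url)
printf 'after sync: B=%s C=%s\n' "$synced_b" "$synced_c"
```

The sync step only rewrites configuration; it does not check that the new URL
exists, which is why the made-up sub-moved path is enough here.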

One could imagine another workflow:
 1) clone the superproject, which creates empty repositories for the
    submodules
 (2) from the prior workflow is gone
 3) instead of inspecting .git/config you can directly manipulate the
    remote.$remote.url configuration in the submodule.
 4) Run git submodule update to obtain the submodules from
    the configured place

The current workflow is set up that way because historically you had
the submodule's .git dir inside the submodule, which would be gone
if you deleted a submodule. So if you later check out an earlier version
that had a submodule, you are missing the objects and, more importantly,
the configuration of where to get them from.

This is now fixed by keeping the actual submodule's git dir inside
the superproject's git dir.


>
>
>> After playing with this series a bit more, I actually like the UI as it is an
>> easy mental model "submodules behave completely independent".
>>
>> However in 3/4 you said:
>>
>> + - `submodule.*` in current state should not be shared because the
>> +   information is tied to a particular version of .gitmodules in a
>> +   working directory.
>>
>> This is already a problem with say different branches/versions.
>> That has been solved by duplicating that information to .git/config
>> as a required step. (I don't like that approach, as it is super confusing
>> IMHO)
>
> Hmm.. I didn't realize this. But then I have never given much thought
> about submodules, probably because I have an alternative solution for
> it (or some of its use cases) anyway :)

What is that?

>
> OK so it's already a problem. But if we keep sharing submodule stuff
> in .git/config, there's a _new_ problem: when you "submodule init" a
> worktree, .git/config is now tailored for the current worktree, when
> you move back to the previous worktree, you need to "submodule init"
> again.

"Moving back" sounds like you use the worktree feature for short lived
things only. (e.g. in the man page you refer to the hot fix your boss wants
you to make urgently).

I thought the worktree feature is more useful for long term "branches",
e.g. I have one worktree of git now that tracks origin/master so I can
use that to "make install" to repair my local broken version of git.

(I could have a "continuous integration" worktree in which I only run
tests. I could have one worktree for Documentation changes only.)

This long lived stuff probably doesn't make sense for a single
repository, but in combination with submodules (which is another way
to approach the "sparse/narrow" desire of a large project), I think
that makes sense, because the "continuous integration" shares a lot
of submodules with my "regular everyday hacking" or the "I need to
test my colleague's work now" worktree.

> So moving to multiple worktrees setup changes how the user uses
> submodule, not good in my opinion.

Because the submodule user API is built on the strong assumption
of "one working tree only", we have to at least slightly adapt.

So instead of cloning a submodule in a worktree we could just
set up a submodule worktree there as well?
(i.e. that saves us network as well as disk)

>
> If you have a grand plan to make submodule work at switching branches
> (without reinit) and if it happens to work the same way when we have
> multiple worktrees, great.

Eh, I am still working on the master plan. ;)
The insights on how worktrees handle stuff help me shape it though. :)

If you switch to a branch (or to any sha1), the submodule currently stays
"as-is" and may be updated using "submodule update", which goes through
the list of existing (checked out) submodules and checks them out to the
sha1 pointed to by the superproject's gitlink.
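
A small sketch of that behavior, assuming default settings (in particular no
submodule.recurse): the superproject is switched to an older commit, the
submodule stays where it was, and "submodule update" then moves it to the
recorded sha1. The repository names, the helper g, the variables c1 and c2,
and the protocol.file.allow=always override are made up for this demo.

```shell
set -e
tmp=$(cd "$(mktemp -d)" && pwd -P)
cd "$tmp"

# Hypothetical helper: fixed identity, and allow file-based submodule clones.
g() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c protocol.file.allow=always "$@"
}

# Submodule upstream with two commits, c1 and c2.
g init -q sub
g -C sub commit -q --allow-empty -m "sub: c1"
g -C sub commit -q --allow-empty -m "sub: c2"
c1=$(g -C sub rev-parse HEAD~1)
c2=$(g -C sub rev-parse HEAD)

# Superproject: the first commit records c2, the second records c1.
g init -q super
g -C super submodule --quiet add "$tmp/sub" submod
g -C super commit -q -m "super: submod at c2"
g -C super/submod checkout -q "$c1"
g -C super add submod
g -C super commit -q -m "super: submod at c1"

# Clone and check out the submodule as recorded at the tip (c1).
g clone -q super clone
g -C clone submodule --quiet update --init submod

# Switch the superproject to the older commit: the gitlink now says c2 ...
g -C clone checkout -q HEAD~1
gitlink=$(g -C clone rev-parse HEAD:submod)
# ... but the submodule checkout stays as-is ...
stale=$(g -C clone/submod rev-parse HEAD)
# ... until "submodule update" checks it out to the recorded sha1.
g -C clone submodule --quiet update submod
updated=$(g -C clone/submod rev-parse HEAD)
printf 'gitlink=%s\nstale=%s\nupdated=%s\n' "$gitlink" "$stale" "$updated"
```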

>
>> I am back to the drawing board for the submodule side of things,
>> but I guess this series could be used once we figure out how to
>> have just one object database for a submodule.
>
> I would leave this out for now. Let's make submodule work with
> multiple worktrees first (and see how the users react to this). Then
> we can try to share object database. Object database and refs are tied
> closely together so you may run into other problems soon.

I see. The normal for submodules is to be in detached HEAD though.

The user can of course checkout branches or things in there, but
the "submodule update" operations do not go to a branch for you.


----
Another (slightly offtopic) observation on the similarity of worktree
and submodules: There is no good way implemented to remove one.

For submodules there is deinit, which removes both the working tree and
the configuration indicating its existence (Note: the git dir still exists
for the submodule). That sounds like what we need to save us
network traffic the next time we need the submodule, although after going
through the code I need to test that a bit more later today to see how
fail-safe it is.
On the submodule side, it often gets confusing what you want to remove
(local checkout of the submodule, or the gitlink or both).
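
What deinit leaves behind can be checked with a sketch like the following.
It is a demo in throwaway repositories, not a statement about how fail-safe
deinit is. The names sub, super and submod, the helper g, and the
protocol.file.allow=always override are assumptions.

```shell
set -e
tmp=$(cd "$(mktemp -d)" && pwd -P)
cd "$tmp"

# Hypothetical helper: fixed identity, and allow file-based submodule clones.
g() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c protocol.file.allow=always "$@"
}

g init -q sub
g -C sub commit -q --allow-empty -m "sub: initial"
g init -q super
g -C super submodule --quiet add "$tmp/sub" submod
g -C super commit -q -m "super: add submod"

url_before=$(g -C super config submodule.submod.url)
g -C super submodule --quiet deinit submod
# "|| true": git config exits non-zero once the key is gone.
url_after=$(g -C super config submodule.submod.url || true)

printf 'config before: %s\nconfig after:  %s\n' "$url_before" "$url_after"
# The checkout is emptied, but the submodule's git dir is kept.
ls -A "$tmp/super/submod"
ls -d "$tmp/super/.git/modules/submod"
```

Because .git/modules/submod survives, a later "submodule update --init" can
reuse the existing git dir instead of cloning again.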

For worktrees there is no "worktree rm" as it would probably promise
a bit more than the man page's suggestion of rm -rf $worktree && git
worktree prune.
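
For reference, the man page's suggestion looks roughly like this in practice.
It is a sketch with made-up names (repo, hotfix, the helper g), run in a
temporary directory.

```shell
set -e
tmp=$(cd "$(mktemp -d)" && pwd -P)
cd "$tmp"

# Hypothetical helper: fixed identity so the demo commit works anywhere.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

g init -q repo
g -C repo commit -q --allow-empty -m "initial"

# Create a second worktree next to the main one (on a new branch "hotfix").
g -C repo worktree add ../hotfix >/dev/null 2>&1
count_before=$(g -C repo worktree list | wc -l | tr -d ' ')

# Removal as suggested: delete the directory, then prune the stale
# administrative entry under .git/worktrees/.
rm -rf "$tmp/hotfix"
g -C repo worktree prune
count_after=$(g -C repo worktree list | wc -l | tr -d ' ')

printf 'worktrees before: %s, after: %s\n' "$count_before" "$count_after"
```

The hotfix branch itself is untouched by this; only the working tree and its
entry under .git/worktrees/ go away. Later git versions added a dedicated
"git worktree remove" command.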

Thanks,
Stefan

> --
> Duy