Re: [RFC-PATCHv2] submodules: add a background story

Sorry for dropping the ball here, I was stressed out a bit.

On Thu, Feb 9, 2017 at 3:32 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>   Do we need
>>
>>   * an opinionated way to check for a specific state of a submodule
>>   * (submodule helper to be plumbing?)
>>   * expose the design mistake of having the (name->path) mapping inside the
>>     working tree, i.e. never remove a name from the submodule config even when
>>     the submodule doesn't exist any more.
>
> I am not sure about the last item.
>
> Are you talking about a case where submodule comes and goes (think:
> "git checkout v1.0" that would make submodules added since that
> version disappear)?  .gitmodules that is checked out would not have
> any entry, but .git/config needs to record the end-user preference
> for the module, so that the user can do "git checkout -" to come
> back, no?

That is perfectly legit and I agree that is good design.

>  IOW .git/config that mentions all the submodule the user
> ever showed interests in is not a design mistake, so you must be
> talking about something else, but I do not know what it is.

I mean that we
(1) have a .gitmodules file tracked in Git that includes the name.
Tracking some information inside the version control to help the
very version control system is not bad in itself. The bad part
is that the name *must not be changed*, and yet
 * we do not tell people about it in the docs
 * we happily make commits that change the name of a submodule
(2) name the submodule by its path by default

See
https://public-inbox.org/git/7e54658a-dcb2-64a7-3c67-0c4fa221b2fb@xxxxxxxxx/

    > Oh, I see. You did not just rename the path, but also the name
    > in the .gitmodules?

    I wasn't even aware that the submodule name was something different from
    the path because the name is by default set to be the path to it.

You could blame this specific instance on the user, but I would rather blame
it on Git, as such questions come up once in a while on the mailing list.

If we were to redesign the .gitmodules file, we might have it as

    [submodule "path"]
        url = git://example.org
        branch = .
        ...

and the "path -> name/UID" mapping would be inside $GIT_DIR.
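
To make the problem with the current layout concrete, here is a minimal
sketch in Python. It relies only on what is stated in this thread: the name
defaults to the path, and the submodule's git directory lives at
$GIT_DIR/modules/<name>. The helper name, the dict layout and the example
paths are hypothetical, and this is not how Git itself implements the lookup.

```python
from pathlib import PurePosixPath


def submodule_git_dir(git_dir, gitmodules, path):
    """Return the git dir of the submodule checked out at `path`, or None.

    `gitmodules` is a hypothetical in-memory form of .gitmodules:
    {name: {"path": ..., "url": ...}}.
    """
    for name, settings in gitmodules.items():
        if settings["path"] == path:
            # The git dir is keyed by *name*, not by path.
            return str(PurePosixPath(git_dir) / "modules" / name)
    return None


# Initially the name equals the path, because that is the default.
original = {"lib/foo": {"path": "lib/foo", "url": "git://example.org"}}

# Moving only the path keeps the name, so the existing git dir is still found.
moved = {"lib/foo": {"path": "vendor/foo", "url": "git://example.org"}}

# Renaming the name as well points at a git dir that was never created.
renamed = {"vendor/foo": {"path": "vendor/foo", "url": "git://example.org"}}
```

With `moved`, the lookup for "vendor/foo" still resolves to the directory
named after "lib/foo"; with `renamed`, it resolves to a different directory,
which is the situation described in the linked report.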

>
> Are they both in section (1)?  I think the former (concepts) belongs
> to section 7 and the latter (file formats) belongs to section 5.

oops. Will fix.

>
>> diff --git a/Documentation/gitsubmodules.txt b/Documentation/gitsubmodules.txt
>> new file mode 100644
>> index 0000000000..3369d55ae9
>> --- /dev/null
>> +++ b/Documentation/gitsubmodules.txt
>> @@ -0,0 +1,194 @@
>> +gitsubmodules(7)
>> +================
>> +
>> +NAME
>> +----
>> +gitsubmodules - information about submodules
>> +
>> +SYNOPSIS
>> +--------
>> +$GIT_DIR/config, .gitmodules
>> +
>> +------------------
>> +git submodule
>> +------------------
>> +
>> +DESCRIPTION
>> +-----------
>> +
>> +A submodule allows you to keep another Git repository in a subdirectory
>> +...
>> +When cloning or pulling a repository containing submodules however,
>> +the submodules will not be checked out by default; You need to instruct
>> +'clone' to recurse into submodules. The 'init' and 'update' subcommands
>
> I think this is not "You need to", but rather "You can, if you want
> to have each and every submodules."

OK. In this man page for submodules I assumed an implicit
"[if you want these submodules to be there, then] you have to/need to ...".

But I'll tone it down, as the text itself does not carry that assumption.

>> +
>> +** When you want to use a (third party) library tied to a specific version.
>> +   Using submodules for a library allows you to have a clean history for
>> +   your own project and only updating the library in the submodule when needed.
>
> I somehow do not see this as decoupling; it is keeping what is
> originally separate separate, isn't it?

OK, I'll reword that to say it keeps separate things separate.

>
>> +** In its current form Git scales up poorly for very large repositories that
>> +   change a lot, as the history grows very large. For that you may want to look
>> +   at shallow clone, sparse checkout or git-lfs.
>> +   However you can also use submodules to e.g. hold large binary assets
>> +   and these repositories are then shallowly cloned such that you do not
>> +   have a large history locally.
>
> In other words, a better way to list these may be
>
>  1. using another project that stands on its own.
>
>  2. artificially split a (logically single) project into multiple
>     repositories and tying them back together.
>
> The access control and performance reasons are subclasses of 2.
> IOW, if Git had per-path ACL and infinite scaling, you wouldn't be
> splitting your project into submodules for 2.  You would still want
> to use somebody else's project by binding it as a subproject, instead
> of merging its history into yours.

Looking at the big picture with a logical view is better indeed.

>
>> +When working with submodules, you can think of them as in a state machine.
>> +So each submodule can be in a different state, the following indicators are used:
>> +
>> +* the existence of the setting of 'submodule.<name>.url' in the
>> +  superprojects configuration
>> +* the existence of the submodules working tree within the
>> +  working tree of the superproject
>> +* the existence of the submodules git directory within the superprojects
>> +  git directory at $GIT_DIR/modules/<name> or within the submodules working
>> +  tree
>> +
>> +      State      URL config        working tree     git dir
>> +      -----------------------------------------------------
>> +      uninitialized    no               no           no
>> +      initialized     yes               no           no
>> +      populated       yes              yes          yes
>> +      depopulated     yes               no          yes
>> +      deinitialized    no               no          yes
>> +      uninteresting    no              yes          yes
>> +
>> +      invalid          no              yes           no
>> +      invalid         yes              yes           no
>
> I do not have strong opinions on these labels; are submodule folks
> happy with the above vocabulary?

Brandon suggested (in)active instead of (un)initialized, which is better as
it decouples the current process from the actual states. Once we reintroduce
[1], then the user would not need to run "init" (whether it is
'git submodule init' or implicit as e.g. 'git submodule update --init')
any more, but the selection of active submodules would be done via config.

[1] https://public-inbox.org/git/20161110203428.30512-35-sbeller@xxxxxxxxxx/
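
Whatever labels we settle on, the quoted table is a plain lookup from the
three indicators to a state. A minimal sketch in Python, with the labels
copied from the v2 table as posted; the names `STATES` and `submodule_state`
are hypothetical and nothing like this exists in Git:

```python
# (URL config, working tree, git dir) -> state label, as in the quoted table.
STATES = {
    (False, False, False): "uninitialized",
    (True,  False, False): "initialized",
    (True,  True,  True):  "populated",
    (True,  False, True):  "depopulated",
    (False, False, True):  "deinitialized",
    (False, True,  True):  "uninteresting",
    (False, True,  False): "invalid",
    (True,  True,  False): "invalid",
}


def submodule_state(url_configured, has_working_tree, has_git_dir):
    """Classify a submodule by the three indicators listed in the patch."""
    key = (bool(url_configured), bool(has_working_tree), bool(has_git_dir))
    return STATES[key]
```

All eight combinations of the three indicators are covered, so the lookup
never fails; both rows with a working tree but no git dir map to "invalid".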

>
> "uninteresting" is not explained in the below?

will fix.

>
>> ...
>> +SEE ALSO
>> +--------
>> +linkgit:git-submodule[1], linkgit:gitmodules[1].
>
> Ditto.


