Re: tracking submodules out of main directory.

Jens Lehmann <Jens.Lehmann@xxxxxx> · Mon, 01 Aug 2011 21:39:51 +0200

On 30.07.2011 23:55, henri GEIST wrote:
> On Saturday 30 July 2011 at 16:16 +0200, Jens Lehmann wrote:
>> On 29.07.2011 11:39, henri GEIST wrote:
>>> On Thursday 28 July 2011 at 18:48 +0200, Jens Lehmann wrote:
>>>> On 28.07.2011 10:57, henri GEIST wrote:
>>>
>>> It is not a matter of disabling any control of git in its own
>>> repository.
>>> It is just a matter of adding inside the git repository a reference
>>> (dependency) to another git repository.
>>
>> ... which you want to have *outside* of the containing repository!
> 
> yes
> 
>> That will then be registered in other git repositories too in your model,
>> which gets rid of the "one file/submodule, one repo" assumption we now have
>> and will introduce ambiguities which are *really* hard to handle.
> 
> I am sorry, I am not a native English speaker. This sentence is too
> complex for me. And google translator is of no help in this case.

Your proposal of letting multiple gitlinks in different repos point to the
same submodule will break the assumption that each file is only handled by
a single git repo. For example when you have a conflict and do a "git
submodule update --recursive" in the superproject, the SHA1 used for "lib"
will depend on the alphabetical order of "project1" and "project2". And
normally after running "git submodule update --recursive" you expect all
submodules of the superproject to be clean. But your change breaks this
expectation, it will still contain unclean submodule entries even though
you just told git it should clean them. What will a "git submodule sync
--recursive" do when "project1" and "project2" use different urls in their
.gitmodules? And so on.

Commands won't always behave like you expect them to and sometimes will
give different results just because different names are used. That's what
I meant by ambiguities, and that's why I don't think gitlinks are the
right method here.
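
To make the order dependence concrete, here is a minimal Python sketch of a
toy model. It is not how git is implemented; it only assumes that a recursive
update visits the projects in alphabetical order and that each visit checks
out the commit that project recorded for the shared path. The project names
match the example above, the commit ids are made up.

```python
# Toy model, not git itself: each project records the commit it wants for a
# shared submodule path. All commit ids here are hypothetical.

def recursive_update(projects):
    """Simulate a recursive update that visits projects in alphabetical order.

    `projects` maps a project name to {shared_path: wanted_commit}.
    Returns the final checkout state and the projects left "dirty".
    """
    checked_out = {}
    for name in sorted(projects):
        for path, commit in projects[name].items():
            # A later project silently overwrites what an earlier one chose.
            checked_out[path] = commit
    dirty = sorted(
        name
        for name, links in projects.items()
        if any(checked_out[path] != commit for path, commit in links.items())
    )
    return checked_out, dirty


projects = {
    "project1": {"lib": "aaaa111"},
    "project2": {"lib": "bbbb222"},
}
state, dirty = recursive_update(projects)

# Renaming one project changes which commit wins, with identical content.
renamed = {"project1": projects["project1"], "a-project2": projects["project2"]}
state_renamed, dirty_renamed = recursive_update(renamed)
```

In this model one of the two projects always ends up with a modified "lib"
right after the update, and which one depends only on the names.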

> But I agree that it is only a small step from there to allowing any
> regular file to be placed outside of the directory.
> I do not see any reasonable workflow (to my eyes) for it, but maybe some
> day someone will come up with a justifiable workflow which needs it. We
> will never know.
> 
> But in this case we need to solve some questions:
>   - Will we extend git status to report untracked files outside of the
>     repository?

I don't think that would work well.

>   - What will git-clean do? It is already dangerous inside the
>     repository, and it will be worse if it can reach outside of it.

Hopefully git clean will learn the --recurse-submodules option in the not
too distant future, then you will have just the same danger for the files
inside a submodule.

>>> Because in this case it is not just a reference that is managed but the
>>> file itself. And this way there is a risk to overwrite some data not
>>> under revision control outside of the repository.
>>
>> You have the same risk when a gitlink points outside, as a submodule is a
>> way of controlling a bunch of files through that reference. And the file
>> would be under version control in the repository where it is registered, no?
> 
> I agree on this point.
> 
> But they are still confined in another git repository, not
> scattered all over the file system.
> And it never corrupts the repository it points to. It just asks it to act
> through its own regular git commands.

The only difference here is that a submodule can contain more than one file,
but you can corrupt those files just as easily as a single file using git
commands.

> In fact you can argue that it can disseminate some complete git
> repository anywhere in the file system.
> And you will be right. (nothing is perfect.)

I'm not concerned about not being perfect (nothing is perfect), but it is
dangerous.

> I can do a second patch to prevent the git submodule command from making
> clones outside of the repository.
> It will require the user to do those clones manually.
> In fact this is already what I do.
> My only use of this is to track dependencies.

But gitlinks are more than simple dependencies, they are followed! "git
submodule", status, diff and fetch already follow them. push is learning
that right now. checkout, reset, merge and friends are being taught that
too (see the enhance_submodule branch in my github repo for the current
state). So a gitlink is more than just a simple reference, it is followed
by a lot of commands and the submodule it points to is manipulated by
those commands. We had a patch for "git archive --recurse-submodules" on
the list, what will that do when used in "project1"?

>>> And in fact it is just what I want: it enables me, if I decide so, to
>>> work on an optional "BigProject" depending on both project 1 & 2.
>>>
>>> Then if lib1 is at version M:
>>>  - a git status in project2 will say nothing
>>>  - a git status in project1 will say
>>>    "modified:   ../lib1 (modified content)"
>>>  - a git status in BigProject will say
>>>    "modified:   ../project1 (modified content)"
>>>
>>> Then I know that I need to update project1 to work with the last version
>>> M of lib1.
>>
>> Maybe no update for project1 is needed, because M only contains a bugfix
>> which doesn't even need a recompilation of project1. But now you need to
>> add a commit to project1 nonetheless with a message like "Updated lib1
>> with a bugfix which is needed by project2" which makes your idea of
>> independent submodules break down.
> 
> In fact I work in the world of "high integrity programming", so it is
> just what I need.
> If there is a bugfix in any library used by the program, it is no longer
> the same program.
> I need the "SHA1" to correspond to the exact and complete source code
> involved in my executable.
> 
> And this way the "SHA1" of the project signs the "SHA1" of the
> libraries.

I cannot believe you want single commits in your "Gimp" repo for every
combination of distributions and library versions where someone said
"this works". This is insane and won't scale at all.

What you do is that each distribution tests their combination of programs
and libraries and says "that works". And that is why the only sane way to
record this "high integrity programming" test result is in the superproject
(= distribution) and not in each of the program repositories.

I also see that it would be cool if a program could record "I do work with
that library version, if you use another you are on your own". But it will
never say "I only work with *this* specific library version", which is what
your proposal is trying to do.

>>>> You are opening a can of worms by having two different repos point to the same
>>>> submodule living in a third repo (which also happens to be their superproject
>>>> and must somehow ignore it). You'll have two SHA1s for a single submodule;
>>>> "git submodule foreach --recursive" will have interesting results too; and so
>>>> on. Not good.
>>>
>>> As I just said before it is my purpose to do it like that.
>>
>> I understood that, but what are you proposing to do to solve all the
>> problems your approach introduces? You can't just hand wave them away.
> 
> There are some solutions:
> 
>   - First, it is one more **feature**; if it does not correspond to your
>     workflow, it does not prevent you from working exactly the way you did
>     until now.
> 
>   - Second, if you want to use the feature but do not want to have the
>     conflict **feature** (for me it is one), just put the independent
>     projects with their libs in different directories
> 
>       -+- foo -+- lib1     (in version N)
>        |       +- project1
>        |
>        +- bar -+- lib1     (in version M)
>                +- project2
> 
>   - Third if you really need to have project 1 & 2 in the same
>     directory foo, that means they are needed by a third BigProject in
>     the same directory foo depending on project 1 & 2.
>     And then you really need git to declare a conflict.

No you don't. You just need git to tell you: this is not the version I
was tested against, repeat the tests to be sure.

>>> Let's take a concrete example.
>>>
>>> 3 different teams work on libtiff, libpng, and libjpeg; they are totally
>>> unrelated.
>>>
>>> One more team is working on the "gimp". And they need those 3 libs in
>>> specific versions, not necessarily their heads.
>>>
>>> One other unrelated team is working on "gqview" and needs the same libs
>>> in other specific versions. (Why should they know what the gimp team
>>> does?)
>>>
>>> Neither the "gimp" nor the "gqview" project will contain directories with
>>> those libs inside. They just depend on them.
>>>
>>> And the last team works on the gnome project, which needs the "gimp" and
>>> "gqview". It will be this team which has to care about having both
>>> "gimp" and "gqview" sharing the same lib versions.
>>> And likewise the gnome project will not contain "gqview" and "gimp" in
>>> its own tree.
>>> It will also depend on them.
>>
>> Cool, that is a real life example resembling what we have at my dayjob. But
>> a "gimp" and "gqview" project will only have dependencies like "use libpng
>> of version 1.2.3 or newer (because we need a feature/bugfix introduced
>> there)" and won't be tied to a special version of that library. This means
>> they need a dependency like "SHA1 or newer" instead of "exactly this SHA1".
> 
> It is useful and simpler to work like this but could introduce some
> bugs.

But that model is awfully successful and is used by all distributions I know,
so I suspect it is not that dangerous (especially when you do your own QA).

> The "gimp" team has tested it with libpng 1.2.3 and maybe knows that it
> did not work with previous versions, but unless they have a crystal
> ball they can never know whether newer versions will break something.
> In fact I doubt that the first version of gimp will work with the latest
> version of libpng.

But in the real world it is exactly like that: gimp will work with all libpng
1.2.3 and newer; only when libpng is updated to 2.0.0 do you have to check that
again. Of course there will be bugs in some combinations. But the advantage of
being able to then only fix libpng and have the bug fixed in Gimp without
having to change it is far greater than the possible problem you are describing
here.

>>> It is just the same with aptitude on debian.
>>> Each package knows its dependencies by itself, does not contain its
>>> dependencies, and does not need a bigger superpackage to tell it what
>>> its own dependencies are.
>>
>> And this is a very good point for the "version x.yy-z *or newer*" argument,
>> they are /never/ tied to the /exact/ x.yy-z version, as that would make the
>> dependencies pretty much unusable. They use a "newer than x.yy-z" scheme.
> 
> It is a different feature from the one I need.
> But it is a good idea.
> 
> Nothing prevents us from making a patch to add a new test to git status to
> see whether the current SHA1 in the libpng repository has the SHA1 of the
> gitlink in the gimp among its ancestors.

To make that feature useful for others (e.g. at my dayjob) this would be
necessary. And we would never want the exact SHA1 match, even though that
information might be what others (like you) want.
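
The "SHA1 or newer" test discussed here boils down to an ancestry check: the
recorded commit must be the checked-out commit or one of its ancestors. A
minimal Python sketch of that check follows. It works on a hand-written
commit graph rather than a real repository, and the commit names are
hypothetical labels, not real libpng commits.

```python
# Toy commit graph: commit id -> list of parent ids. Ids are hypothetical.

def is_ancestor_or_same(parents, required, current):
    """Return True if `required` equals `current` or is reachable from it."""
    stack, seen = [current], set()
    while stack:
        commit = stack.pop()
        if commit == required:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


history = {
    "v1.2.2": [],
    "v1.2.3": ["v1.2.2"],
    "v1.2.4": ["v1.2.3"],
    "fork": ["v1.2.2"],  # branched off before the needed fix went in
}

newer_ok = is_ancestor_or_same(history, "v1.2.3", "v1.2.4")
exact_ok = is_ancestor_or_same(history, "v1.2.3", "v1.2.3")
older_ok = is_ancestor_or_same(history, "v1.2.3", "v1.2.2")
fork_ok = is_ancestor_or_same(history, "v1.2.3", "fork")
```

An exact match is just the special case where both commits are the same, so
both kinds of users could be served by one check plus a strictness setting.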

>>> And still I really want to have every project knowing its own
>>> dependencies by itself and not needing an external superproject to
>>> tell it what it needs.
>>
>> I want to have that too! I'm just convinced using a gitlink to achieve that
>> is wrong in so many ways. I'd rather prefer to express such dependencies in
>> something like a config file, and I believe they should not be as strict as
>> "I need exactly that version" but rather like "this version or newer (and by
>> the way: we of course only tested that specific version ;-)". These
>> dependencies could then be checked and displayed by git status.
> 
> It effectively could be in a config file; that seems good to me as well.

Ok.

> But if git handles this config file.
> Update it on a "git add ../libpng && git commit"

I'm not sure an automatic update at "git commit" would be the right thing to
do, as I think that should only happen after all tests have run successfully,
not at the time you commit it. But anyway, that could be done with a
post-commit hook. Or the test script can do it when it succeeded.
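
The "test script does it" variant could look roughly like the following Python
sketch: run the tests, and only record the dependency version if they pass.
The record file name, its one-line-per-dependency format and the function name
are assumptions made up for this illustration; the test commands are
stand-ins that simply exit with 0 or 1.

```python
import os
import subprocess
import sys
import tempfile


def record_tested_version(test_command, record_path, dependency, version):
    """Run the tests; append 'dependency version' to the record on success."""
    result = subprocess.run(test_command)
    if result.returncode != 0:
        return False  # tests failed, leave the record untouched
    with open(record_path, "a", encoding="utf-8") as handle:
        handle.write(f"{dependency} {version}\n")
    return True


# Stand-ins for a real test suite: one run that passes, one that fails.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

with tempfile.TemporaryDirectory() as tmp:
    record = os.path.join(tmp, "tested-versions")  # hypothetical file name
    passed = record_tested_version(passing, record, "libpng", "v1.2.3")
    failed = record_tested_version(failing, record, "libpng", "v1.2.4")
    with open(record, encoding="utf-8") as handle:
        recorded = handle.read()
```

Only the version whose test run succeeded ends up in the record.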

> And control the matching between the project and libraries on
> "git status"

An extension to "git status" to display the dependencies that aren't met is
a valid goal. What about starting with a script ("git depends"?) and then see
what can go into status?
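
As a starting point, the core of such a script could be sketched in Python
like this. Everything about the file is an assumption: git has no such
dependency file, and the section and key names below are invented for the
example. The "or newer" test is passed in as a function; here a plain version
number comparison stands in for a real ancestry check.

```python
import configparser

# Hypothetical file contents; neither this format nor its keys exist in git.
EXAMPLE = """
[dependency "libpng"]
path = ../libpng
minimum = v1.2.3
"""


def version_tuple(tag):
    """Turn 'v1.2.3' into (1, 2, 3) for comparison."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def newer_or_equal(minimum, version):
    """Stand-in for an ancestry check: compare version numbers."""
    return version_tuple(version) >= version_tuple(minimum)


def unmet_dependencies(config_text, current, satisfies):
    """List dependencies whose checked-out version misses the minimum.

    `current` maps a path to the version checked out there.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    unmet = []
    for section in parser.sections():
        path = parser[section]["path"]
        minimum = parser[section]["minimum"]
        version = current.get(path)
        if version is None or not satisfies(minimum, version):
            unmet.append((path, minimum, version))
    return unmet


all_met = unmet_dependencies(EXAMPLE, {"../libpng": "v1.2.4"}, newer_or_equal)
too_old = unmet_dependencies(EXAMPLE, {"../libpng": "v1.2.2"}, newer_or_equal)
missing = unmet_dependencies(EXAMPLE, {}, newer_or_equal)
```

The list of unmet entries is what a status-like display would print.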

> I can not see the difference with a gitlink.

Then you can just use a config file for that, no? ;-)