Junio C Hamano wrote:
> If it is truly only about "submodule update" then the change
> seems too intrusive, especially the "remotes.default" variable that
> affects how fetch and merge work in situations that do
> not involve submodules.
> If it is not limited to "submodule update" but is an equally valid fix
> for non-submodule situations, the changes to the other parts may
> very well be justifiable, but that would mean your "Yes" is a
> lie and should instead be "No, but these situations are helped
> by these changes because...".
First, I resent the patch series last night; it now uses core.origin to
avoid touching the remotes.* namespace.
The changes *do* fix a nit when on a non-tracking branch. With this,
fetch / merge / pull will now honor what the user said via git clone -o
frotz ("my upstream is nicknamed frotz") and not try to use origin when
origin was never defined.
So, while fixing this minor aggravation wasn't my motivation, I view
this as a nice side-benefit :^).
The driving issues:
1) I deal with too many servers for "origin" to be a useful nickname,
and we have an agreed set of nickname / server pairings across my project.
2) Therefore, we always do git clone -o frotz frotz.foo.bar/path_to_git.
3) Because of 2, "origin" is not defined at the top level, tracking
branches set up via git branch --track point to the correct remote, and
we basically understand branch names as <nickname>/branch. In other
words, we *are* aware of what server we are using.
4) git-submodule update breaks the above:
a) it invokes git clone frotz.foo.bar/path_to_git, thus defining
"origin" as the nickname for frotz.foo.bar.
b) it invokes a bare git-fetch on a detached HEAD, so the upstream *has*
to be origin.
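For reference, the fallback behind 4b): a git fetch with no remote argument uses branch.<name>.remote of the current branch and otherwise falls back to origin, and a detached HEAD has no branch at all. A minimal sketch of that choice as a pure shell function, so it runs without a repository; pick_fetch_remote and its arguments are hypothetical names, and the optional third argument stands in for the core.origin setting proposed in this thread (it defaults to origin, as stock git does):

```shell
#!/bin/sh
# Hypothetical helper: which remote a bare "git fetch" ends up using.
#   $1 = current branch name ("" on a detached HEAD)
#   $2 = branch.<name>.remote for that branch ("" if unset)
#   $3 = fallback remote; stock git always uses "origin" here, and the
#        proposed core.origin setting (an assumption) would supply it
pick_fetch_remote() {
	branch=$1
	configured=$2
	fallback=${3:-origin}
	if [ -n "$branch" ] && [ -n "$configured" ]; then
		# Tracking branch: its configured remote wins.
		printf '%s\n' "$configured"
	else
		# Detached HEAD or non-tracking branch: use the fallback.
		printf '%s\n' "$fallback"
	fi
}

pick_fetch_remote master frotz   # prints "frotz"
pick_fetch_remote "" ""          # detached HEAD: prints "origin"
pick_fetch_remote "" "" frotz    # detached HEAD, core.origin=frotz: prints "frotz"
```

In a real script the inputs would come from git symbolic-ref -q HEAD (with refs/heads/ stripped) and git config branch.<name>.remote; they are passed in here only to keep the sketch self-contained.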
> If your top-level repository needs to access a specific server
> "frotz.foo.bar" for updates, then you would have bootstrapped
> the whole thing with:
> $ git clone git://frotz.foo.bar/toplevel.git
> and in that particular instance of the repository, the source
> repository on frotz.foo.bar would have been known as 'origin',
> right?
Nope, we did it with
$ git clone -o frotz git://frotz.foo.bar/toplevel.git
We *never* define origin; frotz.foo.bar is *always* frotz.
> I would not object if you also gave another nickname
> 'frotz' to the same repository for consistency across
> developers.
Good. We are making (some) progress. :^)
> If that is the case, I am wondering why your subprojects are not
> pointing at the corresponding repository on that same
> 'frotz.foo.bar' machine as 'origin'. I suspect the reason is
> that .gitmodules does not say 'frotz.foo.bar' but names some other
> machine.
Actually,
1) We don't use origin because we want to avoid having to wonder, "Is
frotz.foo.bar named 'origin' or 'frotz' on this client, and thus how do
I get data from frotz?"
2) I submitted the change allowing submodules to be recorded in
.gitmodules with a relative url (e.g., ./path_from_parent_to_submodule)
rather than an absolute one, so we record only the relative path.
3) Thus, git submodule has set up the submodules to point at the parent
project's default remote. However, while in the parent the server is
nicknamed "frotz", in the submodule the same server is nicknamed "origin". Oops.
With my patches, parent and submodule both refer to frotz.foo.bar as frotz.
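A minimal sketch of the relative-url dereferencing described in 2) and 3), written in the spirit of git-submodule's own shell code rather than copied from it; resolve_relative_url is a hypothetical name here, and the parent's URL is passed in rather than read from .git/config. A leading ./ keeps the parent's URL as the base, and each ../ strips one path component from it:

```shell
#!/bin/sh
# Hypothetical helper: resolve a .gitmodules url against the URL of the
# parent's default remote.
#   $1 = parent remote URL (e.g. the URL recorded for "frotz")
#   $2 = url as written in .gitmodules
resolve_relative_url() {
	base=${1%/}
	url=$2
	case "$url" in
	./*|../*) ;;
	*) printf '%s\n' "$url"; return ;;   # absolute url: use as-is
	esac
	while :; do
		case "$url" in
		../*) url=${url#../}; base=${base%/*} ;;  # drop one path component
		./*)  url=${url#./} ;;                     # stay relative to base
		*)    break ;;
		esac
	done
	printf '%s/%s\n' "$base" "$url"
}

resolve_relative_url git://frotz.foo.bar/toplevel.git ../lib.git   # git://frotz.foo.bar/lib.git
resolve_relative_url git://frotz.foo.bar/toplevel.git ./lib        # git://frotz.foo.bar/toplevel.git/lib
```

Only the URL is derived from the parent. The nickname mismatch in 3) comes from the later clone step, which calls the remote origin unless git clone is given -o.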
> And in-tree .gitmodules can name only one URL, as it is project-global
> and shared by everybody. There is no escaping it.
> At least as things were designed, "git submodule init" takes the URL
> recorded in .gitmodules as a hint, but this is for the user to
> override in .git/config in the top-level. Maybe the UI to allow
> this overriding is not easy enough to use, and your submodules
> ended up pointing at the wrong (from the machine's point of view)
> URL as 'origin'. And perhaps that is the root cause of this
> issue?
Again, the relative-url patch was meant to address this, so that a project
that is mirrored to another server remains valid on the new server without
modifying the in-tree .gitmodules. (Yes, I know you *can* modify the
information in a given clone's .git/config, but I'm trying to avoid such
manual per-clone/checkout modifications where it can reasonably be done.)
Basically, I think an important (but not complete) test of the design is
that the following works:
$ git clone -o frotz git://frotz.foo.bar/myproject.git
$ cd myproject
$ git submodule init
$ git submodule update
with frotz (not origin) as the upstream nickname throughout the submodules,
and with the whole project correctly checked out even if it was rehosted
onto a different server. With relative urls and the patch series I sent
last night, this all works, and of course the upstream can still be
"origin" if that is what is desired.
While our overall project exists on many servers, "mirroring" is an
incorrect term. Rather, only certain branches of various parts exist
everywhere, while many other branches are specific to a given server, so
we really name branches using servername/branchname. It is this aspect of
the project that makes us aware of the server in use, and thus makes
"origin" useless as a generic upstream name.
> I am looking at the discussion on the list archive from when we
> discussed the initial design of .gitmodules:
> http://thread.gmane.org/gmane.comp.version-control.git/47466/focus=47502
> http://thread.gmane.org/gmane.comp.version-control.git/47466/focus=47548
> http://thread.gmane.org/gmane.comp.version-control.git/47466/focus=47621
> I do not think we are there yet, and suspect that the current
> "git submodule init" does not give the user a chance to say "the
> URL recorded in the in-tree .gitmodules corresponds to this URL
> in this repository for administrative or network connectivity or
> whatever reasons".
> Maybe that is the real issue that we should be tackling. I
> dunno.
> Although I _think_ being able to use a nickname other than the
> hardcoded 'origin' for fetch/merge is a good change, if my above
> suspicion is correct, that change alone would not make life
> easier for people who _use_ submodules, as the need for them to
> set up extra nicknames (like 'frotz') and configure the
> submodule repositories to use that specific nickname instead of
> 'origin' would not change.
git-submodule right now supports two different layouts (urls relative to
the parent, and absolute urls such that each submodule is on an
independent server). The management approaches for these are going to be
different.
I also suspect there are two basic use cases here: accumulation of a
number of independently managed projects vs. splitting a single major
project into a number of smaller pieces to allow some decoupling, but
still managing the set as a composite whole.
There may be a direct correlation between use case and submodule layout;
I don't know. My project uses relative urls, and I am managing a large
project that has been split into a number of components. So my
suggestions are focused entirely on this design and use case, and I
don't expect that I am addressing the others at all. (As usual, this
requires someone who needs the other model(s) to step up and drive.)
For *my* uses (relative urls, single logical project):
1) There are times when the parent's branch.<name>.remote should be
flowed down to all subprojects for git submodule update; of course, this
would require that the remote be defined in all of them.
2) Thus, there needs to be a way to define a new remote globally for the
project, and have it be correctly interpreted by each submodule (e.g., a
repeat of the relative-url dereferencing now done by submodule init, but
applied later to all submodules to define a new remote). Yes, this could
be accomplished by going into each submodule independently and issuing
appropriate commands, but administration would be much easier given a
top-level command that could recurse and "do the right thing" per
sub-project.
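A dry-run sketch of the top-level command suggested in 2), assuming a hypothetical helper name plan_submodule_remote that takes "path=url" pairs from .gitmodules as input. It re-applies the relative-url rule against the new remote's URL for the parent and prints one git remote add command per submodule, skipping absolute urls since those submodules are not hosted alongside the parent; xyzzy is a made-up nickname:

```shell
#!/bin/sh
# Hypothetical top-level helper (dry run only): print the commands that
# would define remote $1 in every submodule with a relative url.
#   $1    = new remote nickname
#   $2    = that remote's URL for the top-level project
#   $3... = "path=url" pairs, url as written in .gitmodules
plan_submodule_remote() {
	name=$1
	parent_url=${2%/}
	shift 2
	for pair in "$@"; do
		sm_path=${pair%%=*}
		url=${pair#*=}
		case "$url" in
		./*|../*) ;;
		*)
			# Absolute url: the submodule lives on its own server,
			# so a parent-relative remote does not apply.
			printf '# %s: absolute url, skipped\n' "$sm_path"
			continue
			;;
		esac
		# Relative-url rule: ./ keeps the base, each ../ drops
		# one path component from it.
		base=$parent_url
		while :; do
			case "$url" in
			../*) url=${url#../}; base=${base%/*} ;;
			./*)  url=${url#./} ;;
			*)    break ;;
			esac
		done
		printf '(cd %s && git remote add %s %s/%s)\n' \
			"$sm_path" "$name" "$base" "$url"
	done
}

plan_submodule_remote xyzzy git://xyzzy.foo.bar/toplevel.git \
	lib=../lib.git ext=git://elsewhere.example/ext.git
```

Printing the commands instead of running them keeps the sketch testable and lets an administrator review them first; paths containing spaces would need quoting before the output is fed to a shell.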
I *suspect* that origin is a much more useful concept for the alternate
construct (absolute urls, loose alliance of separately managed
projects), but as I said that is not my problem so please ask folks who
have that model to define what works for them.
> For communication purposes, I would agree with Dscho that the
> name 'origin' that names different things for different people
> is wrong and using specific name 'frotz' would solve
> communication issues. But when using the repository and doing
> actual work, wouldn't it be _much_ better if you can
> consistently go to a repository on a random machine and always
> can say 'origin' to mean the other repository this repository
> usually gets new objects from (and sends its new objects to)?
(Actually, I thought I was the one arguing that using origin when it
means different things to different folks is not good. That's the root
of my problems. :^) )
Anyway, I have not found "origin" really useful on my project. We have
to be, and *are*, aware of the server/branchname in use,
not just the branch. Partly this is because different subgroups have
different natural gathering points (we tend to exchange data via ad hoc
"mob" branches on whatever server is most accessible to the particular
group), and partly because some information simply cannot be allowed on
some servers; basically, the more accessible a server is, the less
information that server can have. I believe "origin" is really useful
only when it has just one meaning, or when all its values are effectively
identical (e.g., you have several mirrors for load balancing, etc., but
all are identical modulo mirroring delays).
OTOH, a reasonable change to the semantics of "origin" might be:
1) core.origin names the remote that is the "normal" upstream.
2) The name "origin" is reserved and allowed to mean $core.origin;
e.g., in shell scripts all references to remote "origin" are replaced
with $(git config core.origin). Of course, if core.origin = origin, then
no user-visible change occurs.
In this way, git would not record the same remote's branches in two
ways (as origin/master and as frotz/master), but rather dereference
origin -> frotz and then get frotz/master. Dunno; no matter how you
slice it, having more than one way to refer to the same remote is going
to be confusing, and that's why we don't use origin.
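A minimal sketch of the proposed dereferencing, assuming core.origin as proposed in this thread (it is not a standard git configuration key) and a hypothetical function name deref_origin: the bare name origin and refs under origin/ are rewritten to the configured remote, everything else passes through, and an unset core.origin leaves origin alone:

```shell
#!/bin/sh
# Sketch of the proposed rule: "origin" is a reserved alias for the
# remote named by core.origin (proposed in this thread, not stock git).
#   $1 = remote name or remote-tracking ref, e.g. origin or origin/master
#   $2 = value of core.origin ("" if unset, leaving "origin" as is)
deref_origin() {
	ref=$1
	target=${2:-origin}
	case "$ref" in
	origin)   printf '%s\n' "$target" ;;
	origin/*) printf '%s/%s\n' "$target" "${ref#origin/}" ;;
	*)        printf '%s\n' "$ref" ;;   # frotz/master etc. pass through
	esac
}

deref_origin origin/master frotz   # prints "frotz/master"
deref_origin origin/master ""      # core.origin unset: prints "origin/master"
```

A script would call it as remote=$(deref_origin "$1" "$(git config core.origin)"), so the same remote's branches are only ever recorded under one name, and with core.origin unset nothing visible changes.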
Mark