Nick Townsend <nick.townsend@xxxxxxx> writes:

> On 26 Nov 2013, at 14:18, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
>> Even if the code is run inside a repository with a working tree,
>> when producing a tarball out of an ancient commit that had a
>> submodule not at its current location, --recurse-submodules option
>> should do the right thing, so asking for working tree location of
>> that submodule to find its repository is wrong, I think.  It may
>> happen to find one if the archived revision is close enough to what
>> is currently checked out, but that may not necessarily be the case.
>>
>> At that point when the code discovers an S_ISGITLINK entry, it
>> should have both a pathname to the submodule relative to the
>> toplevel and the commit object name bound to that submodule
>> location.  What it should do, when it does not find the repository
>> at the given path (maybe because there is no working tree, or the
>> submodule directory has moved over time) is roughly:
>>
>>  - Read from .gitmodules at the top-level from the tree it is
>>    creating the tarball out of;
>>
>>  - Find "submodule.$name.path" entry that records that path to the
>>    submodule; and then
>>
>>  - Using that $name, find the stashed-away location of the submodule
>>    repository in $GIT_DIR/modules/$name.
>>
>> or something like that.
>>
>> This is a related tangent, but when used in a repository that people
>> often use as their remote, the repository discovery may have to
>> interact with the relative URL.  People often ship .gitmodules with
>>
>>     [submodule "bar"]
>>         URL = ../bar.git
>>         path = barDir
>>
>> for a top-level project "foo" that can be cloned thusly:
>>
>>     git clone git://site.xz/foo.git
>>
>> and host bar.git to be clonable with
>>
>>     git clone git://site.xz/bar.git barDir/
>>
>> inside the working tree of the foo project.  In such a case, when
>> "archive --recurse-submodules" is running, it would find the
>> repository for the "bar" submodule at "../bar.git", I would think.
>>
>> So this part needs a bit more thought, I am afraid.
>
> I see that there is a lot of potential complexity around setting up a submodule:

No question about it.

> * The .gitmodules file can be dirty (easy to flag, but should we
>   allow archive to proceed?)

As we are discussing "archive", which takes a tree object from the
top-level project that is recorded in the object database, the
information _about_ the submodule in question should come from the
given tree being archived.  There is no reason for the .gitmodules
file that happens to be sitting in the working tree of the top-level
project to be involved in the decision, so its dirtiness should not
matter, I think.

If the tree being archived has a submodule whose name is "kernel" at
path "linux/" (relative to the top-level project), its repository
should be at .git/modules/kernel in the layout recent git-submodule
prepares, and we should find that path-and-name mapping from
.gitmodules recorded in that tree object we are archiving.  The
version that happens to be checked out to the working tree may have
moved the submodule to a new path "linux-3.0/" and "linux-3.0/.git"
may have "gitdir: .git/modules/kernel" in it, but when archiving a
tree that has the submodule at "linux/", it would not help---we
would not know to look at "linux-3.0/.git" to learn that information
anyway because .gitmodules in the working tree would say that the
submodule at path "linux-3.0/" is with name "kernel", and would not
tell us anything about "linux/".

> * Users can mess with settings both prior to git submodule init
>   and before git submodule update.

I think this is irrelevant for exactly the same reason as above.

What makes this trickier, however, is how to deal with an old-style
repository, where the submodule repositories are embedded in the
working tree that happens to be checked out.  In that case, we may
have to read .gitmodules from two places, i.e.

 (1) We are archiving a tree with a submodule at "linux/";

 (2) We read .gitmodules from that tree and learn that the submodule
     has name "kernel";

 (3) There is no ".git/modules/kernel" because the repository uses
     the old layout (if the user never was interested in this
     submodule, .git/modules/kernel may also be missing, and we
     should tell these two cases apart by checking .git/config to
     see if a corresponding entry for the "kernel" submodule exists
     there);

 (4) In a repository that uses the old layout, there must be the
     repository somewhere embedded in the current working tree (this
     inability to remove is why we use the new layout these days).
     We can learn where it is by looking at .gitmodules in the
     working tree---map the name "kernel" we learned earlier, and
     map it to the current path ("linux-3.0/" if you have been
     following this example so far).

And in that fallback context, I would say that reading from a dirty
(or "messed with by the user") .gitmodules is the right thing to
do.  Perhaps the user may be in the process of moving the submodule
in his working tree with

    $ mv linux-3.0 linux-3.2
    $ git config -f .gitmodules submodule.kernel.path linux-3.2

but hasn't committed the change yet.

> For those reasons I deliberately decided not to reproduce the
> above logic all by myself.

As I already hinted, I agree that the "how to find the location of
submodule repository, given a particular tree in the top-level
project the submodule belongs to and the path to the submodule in
question" deserves a separate thread to discuss with area experts.
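
To make the lookup order in steps (1) to (4) above concrete, here is
a rough sketch in Python.  It is only an illustration, not how git
implements (or would implement) this: the function names
submodule_paths() and locate_submodule_repo(), the simplified
.gitmodules parser, the "configured" set standing in for the
submodule entries in .git/config, and the injected exists() callback
are all assumptions made for the example.  Real code would read
.gitmodules from the tree object being archived and use git's own
config parser.

```python
import posixpath
import re
from typing import Callable, Dict, Optional, Set


def submodule_paths(gitmodules_text: str) -> Dict[str, str]:
    """Map submodule name -> path from the text of a .gitmodules file.

    Simplified parser for illustration only; it ignores quoting,
    comments and other corner cases git's config parser handles.
    """
    paths: Dict[str, str] = {}
    name: Optional[str] = None
    for raw in gitmodules_text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r'\[submodule "(.+)"\]', line)
        if header:
            name = header.group(1)
        elif line.startswith("["):
            name = None  # some other section
        elif name is not None:
            key, sep, value = line.partition("=")
            if sep and key.strip().lower() == "path":
                paths[name] = value.strip().rstrip("/")
    return paths


def locate_submodule_repo(
    path: str,                  # submodule path in the archived tree
    tree_gitmodules: str,       # .gitmodules text from that tree
    worktree_gitmodules: str,   # .gitmodules text from the working tree
    configured: Set[str],       # names with an entry in .git/config
    exists: Callable[[str], bool],
    git_dir: str = ".git",
) -> Optional[str]:
    """Return a hypothetical repository location, or None if unknown."""
    wanted = path.rstrip("/")
    # (1)+(2): map the path in the archived tree to the submodule name.
    by_path = {p: n for n, p in submodule_paths(tree_gitmodules).items()}
    name = by_path.get(wanted)
    if name is None:
        return None
    # New layout: repository stashed away under $GIT_DIR/modules/$name.
    stashed = posixpath.join(git_dir, "modules", name)
    if exists(stashed):
        return stashed
    # (3): missing because the user never was interested in it?
    if name not in configured:
        return None
    # (4): old layout; ask the working tree's (possibly dirty)
    # .gitmodules where the submodule currently lives.
    current = submodule_paths(worktree_gitmodules).get(name)
    if current is None:
        return None
    return posixpath.join(current, ".git")
```

With the "kernel" example, a tree recording the submodule at "linux/"
and a working tree that has moved it to "linux-3.0/" would resolve to
".git/modules/kernel" in the new layout, and to "linux-3.0/.git" in
the old layout.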