On 27 Nov 2013, at 11:43, Junio C Hamano <gitster@xxxxxxxxx> wrote:

> Nick Townsend <nick.townsend@xxxxxxx> writes:
>
>> On 26 Nov 2013, at 14:18, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>
>>> Even if the code is run inside a repository with a working tree,
>>> when producing a tarball out of an ancient commit that had a
>>> submodule not at its current location, --recurse-submodules option
>>> should do the right thing, so asking for working tree location of
>>> that submodule to find its repository is wrong, I think. It may
>>> happen to find one if the archived revision is close enough to what
>>> is currently checked out, but that may not necessarily be the case.
>>>
>>> At that point when the code discovers an S_ISGITLINK entry, it
>>> should have both a pathname to the submodule relative to the
>>> toplevel and the commit object name bound to that submodule
>>> location. What it should do, when it does not find the repository
>>> at the given path (maybe because there is no working tree, or the
>>> submodule directory has moved over time) is roughly:
>>>
>>>  - Read from .gitmodules at the top-level from the tree it is
>>>    creating the tarball out of;
>>>
>>>  - Find "submodule.$name.path" entry that records that path to the
>>>    submodule; and then
>>>
>>>  - Using that $name, find the stashed-away location of the submodule
>>>    repository in $GIT_DIR/modules/$name.
>>>
>>> or something like that.
>>>
>>> This is a related tangent, but when used in a repository that people
>>> often use as their remote, the repository discovery may have to
>>> interact with the relative URL. People often ship .gitmodules with
>>>
>>>     [submodule "bar"]
>>>             URL = ../bar.git
>>>             path = barDir
>>>
>>> for a top-level project "foo" that can be cloned thusly:
>>>
>>>     git clone git://site.xz/foo.git
>>>
>>> and host bar.git to be clonable with
>>>
>>>     git clone git://site.xz/bar.git barDir/
>>>
>>> inside the working tree of the foo project. In such a case, when
>>> "archive --recurse-submodules" is running, it would find the
>>> repository for the "bar" submodule at "../bar.git", I would think.
>>>
>>> So this part needs a bit more thought, I am afraid.
>>
>> I see that there is a lot of potential complexity around setting up a submodule:
>
> No question about it.
>
>> * The .gitmodules file can be dirty (easy to flag, but should we
>>   allow archive to proceed?)
>
> As we are discussing "archive", which takes a tree object from the
> top-level project that is recorded in the object database, the
> information _about_ the submodule in question should come from the
> given tree being archived. There is no reason for the .gitmodules
> file that happens to be sitting in the working tree of the top-level
> project to be involved in the decision, so its dirtiness should not
> matter, I think. If the tree being archived has a submodule whose
> name is "kernel" at path "linux/" (relative to the top-level
> project), its repository should be at .git/modules/kernel in the
> layout recent git-submodule prepares, and we should find that
> path-and-name mapping from .gitmodules recorded in that tree object
> we are archiving. The version that happens to be checked out to the
> working tree may have moved the submodule to a new path "linux-3.0/"
> and "linux-3.0/.git" may have "gitdir: .git/modules/kernel" in it,
> but when archiving a tree that has the submodule at "linux/", it
> would not help---we would not know to look at "linux-3.0/.git" to
> learn that information anyway because .gitmodules in the working
> tree would say that the submodule at path "linux-3.0/" is with name
> "kernel", and would not tell us anything about "linux/".
>
>> * Users can mess with settings both prior to git submodule init
>>   and before git submodule update.
>
> I think this is irrelevant for exactly the same reason as above.
>
> What makes this trickier, however, is how to deal with an old-style
> repository, where the submodule repositories are embedded in the
> working tree that happens to be checked out. In that case, we may
> have to read .gitmodules from two places, i.e.
>
>  (1) We are archiving a tree with a submodule at "linux/";
>
>  (2) We read .gitmodules from that tree and learn that the submodule
>      has name "kernel";
>
>  (3) There is no ".git/modules/kernel" because the repository uses
>      the old layout (if the user never was interested in this
>      submodule, .git/modules/kernel may also be missing, and we
>      should tell these two cases apart by checking .git/config to
>      see if a corresponding entry for the "kernel" submodule exists
>      there);
>
>  (4) In a repository that uses the old layout, there must be the
>      repository somewhere embedded in the current working tree (this
>      inability to remove is why we use the new layout these days).
>      We can learn where it is by looking at .gitmodules in the
>      working tree---map the name "kernel" we learned earlier, and
>      map it to the current path ("linux-3.0/" if you have been
>      following this example so far).
>
> And in that fallback context, I would say that reading from a dirty
> (or "messed with by the user") .gitmodules is the right thing to
> do. Perhaps the user may be in the process of moving the submodule
> in his working tree with
>
>     $ mv linux-3.0 linux-3.2
>     $ git config -f .gitmodules submodule.kernel.path linux-3.2
>
> but hasn't committed the change yet.
>
>> For those reasons I deliberately decided not to reproduce the
>> above logic all by myself.
>
> As I already hinted, I agree that the "how to find the location of
> submodule repository, given a particular tree in the top-level
> project the submodule belongs to and the path to the submodule in
> question" deserves a separate thread to discuss with area experts.

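To make the lookup order in the quoted notes concrete, here is a rough Python sketch of it. It is only an illustration under assumptions, not how git-archive does or should implement this: the names `parse_gitmodules`, `name_for_path` and `locate_submodule_repo` and the `exists` callback are hypothetical, the .gitmodules text of the archived tree is assumed to have been read already (for example with `git show <tree-ish>:.gitmodules`), the parser handles only plain `key = value` lines, and the .git/config check from step (3) is left out.

```python
import os
import re
from typing import Callable, Dict, Optional


def parse_gitmodules(text: str) -> Dict[str, Dict[str, str]]:
    """Parse simple .gitmodules text into {submodule name: {key: value}}.

    Only plain `key = value` lines are handled; quoting, escapes and
    line continuations allowed by git's config syntax are ignored.
    """
    modules: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        header = re.match(r'^\[submodule\s+"(.+)"\]$', line)
        if header:
            current = modules.setdefault(header.group(1), {})
        elif line.startswith("["):
            current = None  # some other section; ignore its keys
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            # Key names are case-insensitive in git config ("URL" == "url").
            current[key.strip().lower()] = value.strip()
    return modules


def name_for_path(modules: Dict[str, Dict[str, str]], path: str) -> Optional[str]:
    """Return the submodule name whose recorded path matches `path`."""
    wanted = path.rstrip("/")
    for name, settings in modules.items():
        if settings.get("path", "").rstrip("/") == wanted:
            return name
    return None


def locate_submodule_repo(
    tree_gitmodules: str,
    worktree_gitmodules: str,
    path: str,
    git_dir: str = ".git",
    exists: Callable[[str], bool] = os.path.isdir,
) -> Optional[str]:
    """Guess where the repository for the submodule at `path` lives."""
    # Steps (1)-(2): map the archived path to a name using the
    # .gitmodules recorded in the tree being archived.
    name = name_for_path(parse_gitmodules(tree_gitmodules), path)
    if name is None:
        return None
    # New layout: the repository is stashed in $GIT_DIR/modules/$name.
    stashed = f"{git_dir}/modules/{name}"
    if exists(stashed):
        return stashed
    # Step (4), old layout: map the name to its current path using the
    # working tree's .gitmodules (which may be dirty) and look for an
    # embedded repository there.
    current = parse_gitmodules(worktree_gitmodules).get(name, {}).get("path")
    if current:
        current = current.rstrip("/")
        if exists(f"{current}/.git"):
            return current
    return None
```

With the "kernel" example above, a tree that records the submodule at "linux/" resolves to .git/modules/kernel when that directory exists, and otherwise falls back to whatever path the working tree's .gitmodules currently gives for "kernel". The relative-URL case ("../bar.git") is not covered.
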
As per my email to Heiko on this thread, I’m happy to start such a
discussion - I’ll use your notes as a starting point. I’m much more
comfortable using a wiki for this - is that common here, or should I
start a new mail thread with RFC in the title or similar?

I did complete the work on my version of git-archive (for internal
use) and added some regression tests for the current behaviour. Also,
the add_submodule_odb patch should IMHO be incorporated anyway. I’ll
resubmit those two for consideration in a new thread.

Kind Regards
Nick Townsend