Begin forwarded message:

> From: Nick Townsend <nick.townsend@xxxxxxx>
> Subject: Re: [PATCH] submodule recursion in git-archive
> Date: 2 December 2013 16:00:50 GMT-8
> To: Junio C Hamano <gitster@xxxxxxxxx>
> Cc: René Scharfe <l.s.r@xxxxxx>, Jens Lehmann <Jens.Lehmann@xxxxxx>, git@xxxxxxxxxxxxxxx, Jeff King <peff@xxxxxxxx>
>
> On 27 Nov 2013, at 11:43, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
>> Nick Townsend <nick.townsend@xxxxxxx> writes:
>>
>>> On 26 Nov 2013, at 14:18, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>>
>>>> Even if the code is run inside a repository with a working tree,
>>>> when producing a tarball out of an ancient commit that had a
>>>> submodule not at its current location, --recurse-submodules option
>>>> should do the right thing, so asking for working tree location of
>>>> that submodule to find its repository is wrong, I think. It may
>>>> happen to find one if the archived revision is close enough to what
>>>> is currently checked out, but that may not necessarily be the case.
>>>>
>>>> At that point when the code discovers an S_ISGITLINK entry, it
>>>> should have both a pathname to the submodule relative to the
>>>> toplevel and the commit object name bound to that submodule
>>>> location. What it should do, when it does not find the repository
>>>> at the given path (maybe because there is no working tree, or the
>>>> submodule directory has moved over time) is roughly:
>>>>
>>>> - Read from .gitmodules at the top-level from the tree it is
>>>>   creating the tarball out of;
>>>>
>>>> - Find "submodule.$name.path" entry that records that path to the
>>>>   submodule; and then
>>>>
>>>> - Using that $name, find the stashed-away location of the submodule
>>>>   repository in $GIT_DIR/modules/$name.
>>>>
>>>> or something like that.
>>>>
>>>> This is a related tangent, but when used in a repository that people
>>>> often use as their remote, the repository discovery may have to
>>>> interact with the relative URL.
>>>> People often ship .gitmodules with
>>>>
>>>>     [submodule "bar"]
>>>>         URL = ../bar.git
>>>>         path = barDir
>>>>
>>>> for a top-level project "foo" that can be cloned thusly:
>>>>
>>>>     git clone git://site.xz/foo.git
>>>>
>>>> and host bar.git to be clonable with
>>>>
>>>>     git clone git://site.xz/bar.git barDir/
>>>>
>>>> inside the working tree of the foo project. In such a case, when
>>>> "archive --recurse-submodules" is running, it would find the
>>>> repository for the "bar" submodule at "../bar.git", I would think.
>>>>
>>>> So this part needs a bit more thought, I am afraid.
>>>
>>> I see that there is a lot of potential complexity around setting up a submodule:
>>
>> No question about it.
>>
>>> * The .gitmodules file can be dirty (easy to flag, but should we
>>> allow archive to proceed?)
>>
>> As we are discussing "archive", which takes a tree object from the
>> top-level project that is recorded in the object database, the
>> information _about_ the submodule in question should come from the
>> given tree being archived. There is no reason for the .gitmodules
>> file that happens to be sitting in the working tree of the top-level
>> project to be involved in the decision, so its dirtiness should not
>> matter, I think. If the tree being archived has a submodule whose
>> name is "kernel" at path "linux/" (relative to the top-level
>> project), its repository should be at .git/modules/kernel in the
>> layout recent git-submodule prepares, and we should find that
>> path-and-name mapping from .gitmodules recorded in that tree object
>> we are archiving.
>> The version that happens to be checked out to the
>> working tree may have moved the submodule to a new path "linux-3.0/"
>> and "linux-3.0/.git" may have "gitdir: .git/modules/kernel" in it,
>> but when archiving a tree that has the submodule at "linux/", it
>> would not help---we would not know to look at "linux-3.0/.git" to
>> learn that information anyway because .gitmodules in the working
>> tree would say that the submodule at path "linux-3.0/" is with name
>> "kernel", and would not tell us anything about "linux/".
>>
>>> * Users can mess with settings both prior to git submodule init
>>> and before git submodule update.
>>
>> I think this is irrelevant for exactly the same reason as above.
>>
>> What makes this trickier, however, is how to deal with an old-style
>> repository, where the submodule repositories are embedded in the
>> working tree that happens to be checked out. In that case, we may
>> have to read .gitmodules from two places, i.e.
>>
>> (1) We are archiving a tree with a submodule at "linux/";
>>
>> (2) We read .gitmodules from that tree and learn that the submodule
>>     has name "kernel";
>>
>> (3) There is no ".git/modules/kernel" because the repository uses
>>     the old layout (if the user never was interested in this
>>     submodule, .git/modules/kernel may also be missing, and we
>>     should tell these two cases apart by checking .git/config to
>>     see if a corresponding entry for the "kernel" submodule exists
>>     there);
>>
>> (4) In a repository that uses the old layout, there must be the
>>     repository somewhere embedded in the current working tree (this
>>     inability to remove is why we use the new layout these days).
>>     We can learn where it is by looking at .gitmodules in the
>>     working tree---map the name "kernel" we learned earlier, and
>>     map it to the current path ("linux-3.0/" if you have been
>>     following this example so far).
>>
>> And in that fallback context, I would say that reading from a dirty
>> (or "messed with by the user") .gitmodules is the right thing to
>> do. Perhaps the user may be in the process of moving the submodule
>> in his working tree with
>>
>>     $ mv linux-3.0 linux-3.2
>>     $ git config -f .gitmodules submodule.kernel.path linux-3.2
>>
>> but hasn't committed the change yet.
>>
>>> For those reasons I deliberately decided not to reproduce the
>>> above logic all by myself.
>>
>> As I already hinted, I agree that the "how to find the location of
>> submodule repository, given a particular tree in the top-level
>> project the submodule belongs to and the path to the submodule in
>> question" deserves a separate thread to discuss with area experts.
>
> As per my email to Heiko on this thread, I’m happy to start such
> a discussion - I’ll use your notes as a starting point. I’m much more comfortable
> using a wiki for this - is this common or should I start a new mail thread
> with RFC in the title or similar?
>
> I did complete my work on my version of git-archive (for internal use) and added some regression tests
> for current behaviour. Also the add_submodule_odb patch should IMHO be incorporated
> anyway. I’ll resubmit those two for consideration in a new thread.
>
> Kind Regards
> Nick Townsend
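
As a possible starting point for that discussion, the lookup order in steps (1) to (4) above could be sketched roughly as below. This is only an illustration in Python, not code from git or from any patch in this thread. The function names, the `initialized_names` set (standing in for "an entry exists in .git/config") and the `exists` callback are all hypothetical, and the .gitmodules parser only handles the simple `[submodule "name"]` / `path = ...` form shown in the examples.

```python
import posixpath
import re

# Matches section headers such as: [submodule "kernel"]
_SECTION = re.compile(r'^\[submodule\s+"(?P<name>[^"]+)"\]$')


def parse_submodule_paths(gitmodules_text):
    """Return {path: name} from the text of a simple .gitmodules file."""
    by_path = {}
    name = None
    for raw in gitmodules_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        if line.startswith("["):
            match = _SECTION.match(line)
            name = match.group("name") if match else None
            continue
        if name is None or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.lower() == "path":  # config keys are case-insensitive
            by_path[value.rstrip("/")] = name
    return by_path


def locate_submodule_repo(path, tree_gitmodules, worktree_gitmodules,
                          git_dir, worktree, initialized_names,
                          exists=posixpath.isdir):
    """Hypothetical helper: find the repository for the submodule at `path`.

    tree_gitmodules     -- .gitmodules text from the tree being archived
    worktree_gitmodules -- .gitmodules text from the working tree (may be dirty)
    initialized_names   -- submodule names that have an entry in .git/config
    """
    # (1)/(2): map the archived path to a name using the archived tree.
    name = parse_submodule_paths(tree_gitmodules).get(path.rstrip("/"))
    if name is None:
        return None
    # New layout: the repository is stashed away under $GIT_DIR/modules/$name.
    candidate = posixpath.join(git_dir, "modules", name)
    if exists(candidate):
        return candidate
    # (3): nothing stashed away and no config entry means the user never
    # asked for this submodule, so there is nothing to fall back to.
    if name not in initialized_names:
        return None
    # (4): old layout; map the name to its *current* path via the
    # working-tree .gitmodules and look for the embedded repository.
    current = {n: p for p, n in
               parse_submodule_paths(worktree_gitmodules).items()}
    current_path = current.get(name)
    if current_path is None:
        return None
    candidate = posixpath.join(worktree, current_path, ".git")
    return candidate if exists(candidate) else None
```

With the "kernel" example from above, archiving a tree that has the submodule at "linux/" while the working tree has it at "linux-3.0/" would consult .git/modules/kernel first and only then linux-3.0/.git. The relative-URL case ("../bar.git") is deliberately left out of this sketch, since that part still needs more thought.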