Nikhil Benesch <nikhil.benesch@xxxxxxxxx> writes: > This commit is a revival of Lars Hjemli's 2009 patch to provide an > option to include submodules in the output of `git archive`. > > The `--recurse-submodules` option (named consistently with fetch, clone, > and ls-files) will recursively traverse submodules in the repository and > consider their contents for inclusion in the output archive, subject to > any pathspec filters. Like other commands that have learned > `--recurse-submodules`, submodules that have not been checked out will > not be traversed. > > Signed-off-by: Nikhil Benesch <nikhil.benesch@xxxxxxxxx> > --- I'll let others comment on the proposed log message. > diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt > index cfa1e4ebe..f223f9e05 100644 > --- a/Documentation/git-archive.txt > +++ b/Documentation/git-archive.txt > @@ -11,8 +11,8 @@ SYNOPSIS > [verse] > 'git archive' [--format=<fmt>] [--list] [--prefix=<prefix>/] [<extra>] > [-o <file> | --output=<file>] [--worktree-attributes] > - [--remote=<repo> [--exec=<git-upload-archive>]] <tree-ish> > - [<path>...] > + [--recurse-submodules] [--remote=<repo> [--exec=<git-upload-archive>]] > + <tree-ish> [<path>...] With the reflowing of lines it was not immediately obvious, but the only change is to addd [--recurse-submodules] in front of the existing [--remote=<repo>] option. It would have probably made more sense to add new things at the end, just because these options can be given in any order. > +--recurse-submodules:: > + Recursively include the contents of any checked-out submodules in > + the archive. OK. A question to the submodule folks: Is "checked-out" the right terminology in this context? I am wondering if we have reached more official set of words to express submodule states discussed in [*1*]. > - if (S_ISDIR(mode) || S_ISGITLINK(mode)) { > - if (args->verbose) > - fprintf(stderr, "%.*s\n", (int)path.len, path.buf); > - err = write_entry(args, sha1, path.buf, path.len, mode); > - if (err) > - return err; > - return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0); > - } > - > if (args->verbose) > fprintf(stderr, "%.*s\n", (int)path.len, path.buf); > - return write_entry(args, sha1, path.buf, path.len, mode); > + err = write_entry(args, sha1, path.buf, path.len, mode); > + if (err) > + return err; > + if (S_ISDIR(mode) || (S_ISGITLINK(mode) && args->recurse_submodules && > + !add_submodule_odb(path_without_prefix))) > + return READ_TREE_RECURSIVE; > + return 0; So, we used to (and without the option we still) stop recursing when we see a gitlink after asking write_entry() to write the path itself out, but with the option, we keep going. How does pathspec limiting work with this codepath? If you have a superproject with a regular file "sub/README" that belongs to the superproject itself, and a submodule at "sub/module/", what makes sure that this command $ git archive --recurse-submodules HEAD -- 'sub/R*' 'sub/module/lib/' looks only in 'lib/' part of the submodule? > diff --git a/t/t5005-archive-submodules.sh b/t/t5005-archive-submodules.sh > new file mode 100755 > index 000000000..747e38627 > --- /dev/null > +++ b/t/t5005-archive-submodules.sh > @@ -0,0 +1,112 @@ > +#!/bin/sh > + > +test_description='git archive can include submodule content' > + > +. ./test-lib.sh > + > +add_file() > +{ Style. A shell function begins like so (see Documentation/CodingGuidelines): add_file () { > + git add $1 && Put dq around "$1" to make it easier to reuse, even if you know you only happen to use pathnames without any $IFS whitespace characters. I think all of these comments apply to the other shell function you added to this file.. > +test_expect_success 'by default, submodules are not included' ' > + echo "File 1" >1 && > + add_file 1 && > + add_submodule 2 3 && > + add_submodule 4 5 && > + cat <<EOF >expected && > +1 > +2/ > +4/ > +EOF Use <<-EOF to allow you to indent your here-doc data. Also when your here-doc data does not need any $variable_interpolation, quote the end marker to help reduce reviewer's mental burden, i.e. cat <<-\EOF >expected && 1 2/ 4/ EOF The same comment applies to other tests (I won't even repeat this). > + git archive HEAD >normal.tar && > + tar -tf normal.tar >actual && Lose "-" from "-tf"; that's the canonical and the most portable way to spell the option to "tar". > +test_expect_success 'packed submodules are supported' ' > + msg=$(cd 2 && git repack -ad && git count-objects) && > + test "$msg" = "0 objects, 0 kilobytes" && > + git archive --recurse-submodules HEAD >packed.tar && > + tar -tf packed.tar >actual && > + test_cmp expected actual > +' I am not sure how packing would be expected to break anything, but an extra test may not hurt. I notice that you only test the case where the submodule's ".git" is embedded in its working tree (as opposed to ".git" being a "gitdir:" file that points at the real location of the repository, somewhere inside ".git/modules/" of the superproject). You probably want to make sure that also works, too. I also notice that you do not use any pathspec in your "git archive" invocation. You'd want to make sure that works as expected. Thanks. [Reference] *1* <20170209020855.23486-1-sbeller@xxxxxxxxxx>