Re: [PATCH] submodule recursion in git-archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 26.11.2013 01:04, schrieb Nick Townsend:
> My first git patch - so shout out if I’ve got the etiquette wrong! Or
> of course if I’ve missed something.

Thanks for the patches!  Please send only one per message (the second
one as a reply to the first one, or both as replies to a cover letter),
though -- that makes commenting on them much easier.

Side note: Documentation/SubmittingPatches doesn't mention that (yet),
AFAICS.

> Subject: [PATCH 1/2] submodule: add_submodule_odb() usability
> 
> Although add_submodule_odb() is documented as being
> externally usable, it is declared static and also
> has incorrect documentation.
> 
> This commit fixes those and makes no changes to
> existing code using them. All tests still pass.

Sign-off missing (see Documentation/SubmittingPatches).

> ---
>  Documentation/technical/api-ref-iteration.txt | 4 ++--
>  submodule.c                                   | 2 +-
>  submodule.h                                   | 1 +
>  3 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/technical/api-ref-iteration.txt b/Documentation/technical/api-ref-iteration.txt
> index aa1c50f..cbee624 100644
> --- a/Documentation/technical/api-ref-iteration.txt
> +++ b/Documentation/technical/api-ref-iteration.txt
> @@ -50,10 +50,10 @@ submodules object database. You can do this by a code-snippet like
>  this:
>  
>  	const char *path = "path/to/submodule"
> -	if (!add_submodule_odb(path))
> +	if (add_submodule_odb(path))
>  		die("Error submodule '%s' not populated.", path);
>  
> -`add_submodule_odb()` will return an non-zero value on success. If you
> +`add_submodule_odb()` will return a zero value on success. If you

"return zero on success" instead?

>  do not do this you will get an error for each ref that it does not point
>  to a valid object.
>  
> diff --git a/submodule.c b/submodule.c
> index 1905d75..1ea46be 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -143,7 +143,7 @@ void stage_updated_gitmodules(void)
>  		die(_("staging updated .gitmodules failed"));
>  }
>  
> -static int add_submodule_odb(const char *path)
> +int add_submodule_odb(const char *path)
>  {
>  	struct strbuf objects_directory = STRBUF_INIT;
>  	struct alternate_object_database *alt_odb;
> diff --git a/submodule.h b/submodule.h
> index 7beec48..3e3cdca 100644
> --- a/submodule.h
> +++ b/submodule.h
> @@ -41,5 +41,6 @@ int find_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_nam
>  		struct string_list *needs_pushing);
>  int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name);
>  void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir);
> +int add_submodule_odb(const char *path);
>  
>  #endif

> Subject: [PATCH 2/2] archive: allow submodule recursion on git-archive
> 
> When using git-archive to produce a dump of a
> repository, the existing code does not recurse
> into a submodule when it encounters it in the tree
> traversal. These changes add a command line flag
> that permits this.
> 
> Note that the submodules must be updated in the
> repository, otherwise this cannot take place.
> 
> The feature is disabled for remote repositories as
> the git_work_tree fails. This is a possible future
> enhancement.

Hmm, curious.  Why does it fail?  I guess that happens with bare
repositories, only, right?  (Which are the most likely kind of remote
repos to encounter, of course.)

> Two additional fields are added to archiver_args:
>   * recurse  - a boolean indicator
>   * treepath - the path part of the tree-ish
>                eg. the 'www' in HEAD:www
> 
> The latter is used within the archive writer to
> determin the correct path for the submodule .git
> file.
> 
> Signed-off-by: Nick Townsend <nick.townsend@xxxxxxx>
> ---
>  Documentation/git-archive.txt |  9 +++++++++
>  archive.c                     | 38 ++++++++++++++++++++++++++++++++++++--
>  archive.h                     |  2 ++
>  3 files changed, 47 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
> index b97aaab..b4df735 100644
> --- a/Documentation/git-archive.txt
> +++ b/Documentation/git-archive.txt
> @@ -11,6 +11,7 @@ SYNOPSIS
>  [verse]
>  'git archive' [--format=<fmt>] [--list] [--prefix=<prefix>/] [<extra>]
>  	      [-o <file> | --output=<file>] [--worktree-attributes]
> +	      [--recursive|--recurse-submodules]

I'd expect git archive --recurse to add subdirectories and their
contents, which it does right now, and --no-recurse to only archive the
specified objects, which is not implemented.  IAW: I wouldn't normally
associate an option with that name with submodules.  Would
--recurse-submodules alone suffice?

Side note: With only one of the options defined you could shorten them
on the command line to e.g. --rec; with both you'd need to type at least
--recursi or --recurse to disambiguate -- even though they ultimately do
the same.

>  	      [--remote=<repo> [--exec=<git-upload-archive>]] <tree-ish>
>  	      [<path>...]
>  
> @@ -51,6 +52,14 @@ OPTIONS
>  --prefix=<prefix>/::
>  	Prepend <prefix>/ to each filename in the archive.
>  
> +--recursive::
> +--recurse-submodules::
> +	Archive entries in submodules. Errors occur if the submodules
> +	have not been initialized and updated.
> +	Run `git submodule update --init --recursive` immediately after
> +	the clone is finished to avoid this.
> +	This option is not available with remote repositories.
> +
>  -o <file>::
>  --output=<file>::
>  	Write the archive to <file> instead of stdout.
> diff --git a/archive.c b/archive.c
> index 346f3b2..f6313c9 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -5,6 +5,7 @@
>  #include "archive.h"
>  #include "parse-options.h"
>  #include "unpack-trees.h"
> +#include "submodule.h"
>  
>  static char const * const archive_usage[] = {
>  	N_("git archive [options] <tree-ish> [<path>...]"),
> @@ -131,13 +132,32 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
>  		args->convert = ATTR_TRUE(check[1].value);
>  	}
>  
> +	if (S_ISGITLINK(mode) && args->recurse) {
> +		const char *work_tree = get_git_work_tree();
> +		if (!work_tree) {
> +			  die("Can't go recursive when no work dir");
> +		}

Style nit: No curly braces around single-line statements, please.

Perhaps mention submodules in the error message?

It would be nicer to get_git_work_tree right after the parameters have
been parsed and before any archive contents have been written and error
out early.

> +		static struct strbuf dotgit = STRBUF_INIT;

We avoid declarations after statements because older compilers don't
support it.

You release the memory at the end of this block; that means there's no
advantage in making this strbuf static.  Allocating and freeing the
memory for the path of each submodule shouldn't cause any performance
issues, so please just drop static from the declaration.

> +		strbuf_reset(&dotgit);

... and then you don't need to reset anymore.

> +		strbuf_grow(&dotgit, PATH_MAX);

I'd drop that as well; the number of submodules should be low enough
that the possibly avoided reallocations by giving this hint shouldn't be
noticeable.

> +		strbuf_addstr(&dotgit, work_tree);
> +		strbuf_addch(&dotgit, '/');
> +		if (args->treepath) {
> +			  strbuf_addstr(&dotgit, args->treepath);
> +			  strbuf_addch(&dotgit, '/');
> +		}
> +		strbuf_add(&dotgit, path_without_prefix,strlen(path_without_prefix)-1);
> +		if (add_submodule_odb(dotgit.buf))
> +			  die("Can't add submodule: %s", dotgit.buf);

Hmm, I wonder if we can traverse the tree and load all submodule object
databases before traversing it again to actually write file contents.
That would spare the user from getting half of an archive together with
that error message.

> +		strbuf_release(&dotgit);
> +	}
>  	if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
>  		if (args->verbose)
>  			fprintf(stderr, "%.*s\n", (int)path.len, path.buf);
>  		err = write_entry(args, sha1, path.buf, path.len, mode);
>  		if (err)
>  			return err;
> -		return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
> +		return (S_ISGITLINK(mode) && !args->recurse) ? 0: READ_TREE_RECURSIVE;

This line is longer than 80 characters, which we tend to avoid.  How
about this?

		if (ISGITLINK(mode) && !args->recurse)
			return 0;
		return READ_TREE_RECURSIVE;

>  	}
>  
>  	if (args->verbose)
> @@ -256,10 +276,16 @@ static void parse_treeish_arg(const char **argv,
>  	const struct commit *commit;
>  	unsigned char sha1[20];
>  
> +	const char *colon = strchr(name, ':');
> +
> +	/* Store the path on the ref for later (required for --recursive) */
> +	char *treepath = NULL;
> +	if (colon) {
> +		treepath = strdup(colon+1);
> +	}

Style: No curly braces, space around operators, use wrapper functions
for simple memory error handling:

	if (colon)
		treepath = xstrdup(colon + 1);

>  	/* Remotes are only allowed to fetch actual refs */
>  	if (remote) {
>  		char *ref = NULL;
> -		const char *colon = strchr(name, ':');
>  		int refnamelen = colon ? colon - name : strlen(name);
>  
>  		if (!dwim_ref(name, refnamelen, sha1, &ref))
> @@ -296,9 +322,11 @@ static void parse_treeish_arg(const char **argv,
>  		tree = parse_tree_indirect(tree_sha1);
>  	}
>  	ar_args->tree = tree;
> +	ar_args->treepath = treepath;
>  	ar_args->commit_sha1 = commit_sha1;
>  	ar_args->commit = commit;
>  	ar_args->time = archive_time;
> +
>  }

Please don't add empty lines before the end of a block.

>  #define OPT__COMPR(s, v, h, p) \
> @@ -318,6 +346,7 @@ static int parse_archive_args(int argc, const char **argv,
>  	const char *exec = NULL;
>  	const char *output = NULL;
>  	int compression_level = -1;
> +	int recurse = 0;
>  	int verbose = 0;
>  	int i;
>  	int list = 0;
> @@ -331,6 +360,8 @@ static int parse_archive_args(int argc, const char **argv,
>  			N_("write the archive to this file")),
>  		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
>  			N_("read .gitattributes in working directory")),
> +		OPT_BOOL(0, "recursive", &recurse, N_("include submodules in archive")),
> +		OPT_BOOL(0, "recurse-submodules", &recurse, N_("include submodules in archive")),
>  		OPT__VERBOSE(&verbose, N_("report archived files on stderr")),
>  		OPT__COMPR('0', &compression_level, N_("store only"), 0),
>  		OPT__COMPR('1', &compression_level, N_("compress faster"), 1),
> @@ -355,6 +386,8 @@ static int parse_archive_args(int argc, const char **argv,
>  
>  	argc = parse_options(argc, argv, NULL, opts, archive_usage, 0);
>  
> +	if (is_remote && recurse)
> +		die("Cannot include submodules with option --remote");
>  	if (remote)
>  		die("Unexpected option --remote");
>  	if (exec)
> @@ -393,6 +426,7 @@ static int parse_archive_args(int argc, const char **argv,
>  					format, compression_level);
>  		}
>  	}
> +	args->recurse = recurse;
>  	args->verbose = verbose;
>  	args->base = base;
>  	args->baselen = strlen(base);
> diff --git a/archive.h b/archive.h
> index 4a791e1..577238d 100644
> --- a/archive.h
> +++ b/archive.h
> @@ -7,10 +7,12 @@ struct archiver_args {
>  	const char *base;
>  	size_t baselen;
>  	struct tree *tree;
> +	const char *treepath;
>  	const unsigned char *commit_sha1;
>  	const struct commit *commit;
>  	time_t time;
>  	struct pathspec pathspec;
> +	unsigned int recurse : 1;

Name it recurse_submodules?

>  	unsigned int verbose : 1;
>  	unsigned int worktree_attributes : 1;
>  	unsigned int convert : 1;
> -- 1.8.3.4 (Apple Git-47) 

A test script (t/t5005-archive-submodules.sh?) would be nice which
exercises the new option.

René

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]