Re: [PATCH] submodule recursion in git-archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 26 Nov 2013, at 07:17, René Scharfe <l.s.r@xxxxxx> wrote:

> Am 26.11.2013 01:04, schrieb Nick Townsend:
>> My first git patch - so shout out if I’ve got the etiquette wrong! Or
>> of course if I’ve missed something.
> 
> Thanks for the patches!  Please send only one per message (the second
> one as a reply to the first one, or both as replies to a cover letter),
> though -- that makes commenting on them much easier.
> 
> Side note: Documentation/SubmittingPatches doesn't mention that (yet),
> AFAICS.
> 
>> Subject: [PATCH 1/2] submodule: add_submodule_odb() usability
>> 
>> Although add_submodule_odb() is documented as being
>> externally usable, it is declared static and also
>> has incorrect documentation.
>> 
>> This commit fixes those and makes no changes to
>> existing code using them. All tests still pass.
> 
> Sign-off missing (see Documentation/SubmittingPatches).
> 
>> ---
>> Documentation/technical/api-ref-iteration.txt | 4 ++--
>> submodule.c                                   | 2 +-
>> submodule.h                                   | 1 +
>> 3 files changed, 4 insertions(+), 3 deletions(-)
>> 
>> diff --git a/Documentation/technical/api-ref-iteration.txt b/Documentation/technical/api-ref-iteration.txt
>> index aa1c50f..cbee624 100644
>> --- a/Documentation/technical/api-ref-iteration.txt
>> +++ b/Documentation/technical/api-ref-iteration.txt
>> @@ -50,10 +50,10 @@ submodules object database. You can do this by a code-snippet like
>> this:
>> 
>> 	const char *path = "path/to/submodule"
>> -	if (!add_submodule_odb(path))
>> +	if (add_submodule_odb(path))
>> 		die("Error submodule '%s' not populated.", path);
>> 
>> -`add_submodule_odb()` will return an non-zero value on success. If you
>> +`add_submodule_odb()` will return a zero value on success. If you
> 
> "return zero on success" instead?

I like the brevity of your suggestion. Again, I just used what was there…

> 
>> do not do this you will get an error for each ref that it does not point
>> to a valid object.
>> 
>> diff --git a/submodule.c b/submodule.c
>> index 1905d75..1ea46be 100644
>> --- a/submodule.c
>> +++ b/submodule.c
>> @@ -143,7 +143,7 @@ void stage_updated_gitmodules(void)
>> 		die(_("staging updated .gitmodules failed"));
>> }
>> 
>> -static int add_submodule_odb(const char *path)
>> +int add_submodule_odb(const char *path)
>> {
>> 	struct strbuf objects_directory = STRBUF_INIT;
>> 	struct alternate_object_database *alt_odb;
>> diff --git a/submodule.h b/submodule.h
>> index 7beec48..3e3cdca 100644
>> --- a/submodule.h
>> +++ b/submodule.h
>> @@ -41,5 +41,6 @@ int find_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_nam
>> 		struct string_list *needs_pushing);
>> int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name);
>> void connect_work_tree_and_git_dir(const char *work_tree, const char *git_dir);
>> +int add_submodule_odb(const char *path);
>> 
>> #endif
> 
>> Subject: [PATCH 2/2] archive: allow submodule recursion on git-archive
>> 
>> When using git-archive to produce a dump of a
>> repository, the existing code does not recurse
>> into a submodule when it encounters it in the tree
>> traversal. These changes add a command line flag
>> that permits this.
>> 
>> Note that the submodules must be updated in the
>> repository, otherwise this cannot take place.
>> 
>> The feature is disabled for remote repositories as
>> the git_work_tree fails. This is a possible future
>> enhancement.
> 
> Hmm, curious.  Why does it fail?  I guess that happens with bare
> repositories, only, right?  (Which are the most likely kind of remote
> repos to encounter, of course.)

I’m not sure why it failed - I didn’t think it should - but it did.
See discussion in other email.

> 
>> Two additional fields are added to archiver_args:
>>  * recurse  - a boolean indicator
>>  * treepath - the path part of the tree-ish
>>               eg. the 'www' in HEAD:www
>> 
>> The latter is used within the archive writer to
>> determin the correct path for the submodule .git
>> file.
>> 
>> Signed-off-by: Nick Townsend <nick.townsend@xxxxxxx>
>> ---
>> Documentation/git-archive.txt |  9 +++++++++
>> archive.c                     | 38 ++++++++++++++++++++++++++++++++++++--
>> archive.h                     |  2 ++
>> 3 files changed, 47 insertions(+), 2 deletions(-)
>> 
>> diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
>> index b97aaab..b4df735 100644
>> --- a/Documentation/git-archive.txt
>> +++ b/Documentation/git-archive.txt
>> @@ -11,6 +11,7 @@ SYNOPSIS
>> [verse]
>> 'git archive' [--format=<fmt>] [--list] [--prefix=<prefix>/] [<extra>]
>> 	      [-o <file> | --output=<file>] [--worktree-attributes]
>> +	      [--recursive|--recurse-submodules]
> 
> I'd expect git archive --recurse to add subdirectories and their
> contents, which it does right now, and --no-recurse to only archive the
> specified objects, which is not implemented.  IAW: I wouldn't normally
> associate an option with that name with submodules.  Would
> --recurse-submodules alone suffice?
> 
> Side note: With only one of the options defined you could shorten them
> on the command line to e.g. --rec; with both you'd need to type at least
> --recursi or --recurse to disambiguate -- even though they ultimately do
> the same.

See other comments - I agree - recurse-submodules it is!

> 
>> 	      [--remote=<repo> [--exec=<git-upload-archive>]] <tree-ish>
>> 	      [<path>...]
>> 
>> @@ -51,6 +52,14 @@ OPTIONS
>> --prefix=<prefix>/::
>> 	Prepend <prefix>/ to each filename in the archive.
>> 
>> +--recursive::
>> +--recurse-submodules::
>> +	Archive entries in submodules. Errors occur if the submodules
>> +	have not been initialized and updated.
>> +	Run `git submodule update --init --recursive` immediately after
>> +	the clone is finished to avoid this.
>> +	This option is not available with remote repositories.
>> +
>> -o <file>::
>> --output=<file>::
>> 	Write the archive to <file> instead of stdout.
>> diff --git a/archive.c b/archive.c
>> index 346f3b2..f6313c9 100644
>> --- a/archive.c
>> +++ b/archive.c
>> @@ -5,6 +5,7 @@
>> #include "archive.h"
>> #include "parse-options.h"
>> #include "unpack-trees.h"
>> +#include "submodule.h"
>> 
>> static char const * const archive_usage[] = {
>> 	N_("git archive [options] <tree-ish> [<path>...]"),
>> @@ -131,13 +132,32 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
>> 		args->convert = ATTR_TRUE(check[1].value);
>> 	}
>> 
>> +	if (S_ISGITLINK(mode) && args->recurse) {
>> +		const char *work_tree = get_git_work_tree();
>> +		if (!work_tree) {
>> +			  die("Can't go recursive when no work dir");
>> +		}
> 
> Style nit: No curly braces around single-line statements, please.
> 
> Perhaps mention submodules in the error message?
> 
> It would be nicer to get_git_work_tree right after the parameters have
> been parsed and before any archive contents have been written and error
> out early.
> 
>> +		static struct strbuf dotgit = STRBUF_INIT;
> 
> We avoid declarations after statements because older compilers don't
> support it.
> 
> You release the memory at the end of this block; that means there's no
> advantage in making this strbuf static.  Allocating and freeing the
> memory for the path of each submodule shouldn't cause any performance
> issues, so please just drop static from the declaration.
> 
>> +		strbuf_reset(&dotgit);
> 
> ... and then you don't need to reset anymore.
> 
>> +		strbuf_grow(&dotgit, PATH_MAX);
> 
> I'd drop that as well; the number of submodules should be low enough
> that the possibly avoided reallocations by giving this hint shouldn't be
> noticeable.

I just cut and pasted this code from another location - didn’t really think too hard
and I wasn’t familiar with the strbuf routines in the codebase. Thanks for the input.

> 
>> +		strbuf_addstr(&dotgit, work_tree);
>> +		strbuf_addch(&dotgit, '/');
>> +		if (args->treepath) {
>> +			  strbuf_addstr(&dotgit, args->treepath);
>> +			  strbuf_addch(&dotgit, '/');
>> +		}
>> +		strbuf_add(&dotgit, path_without_prefix,strlen(path_without_prefix)-1);
>> +		if (add_submodule_odb(dotgit.buf))
>> +			  die("Can't add submodule: %s", dotgit.buf);
> 
> Hmm, I wonder if we can traverse the tree and load all submodule object
> databases before traversing it again to actually write file contents.
> That would spare the user from getting half of an archive together with
> that error message.

See comments in another email about where and how to parse submodules. If we do the
parse .gitmodules approach then we could do something like this. The as-you-go
method seemed easier and more accurate - after all you may not traverse into a
submodule at all - so any time spent adding it’s objects would be wasted.

In the same conversation I agree that it’s best to warn when a submodule can’t be
located, and continue with just a directory entry in its place (as currently)

> 
>> +		strbuf_release(&dotgit);
>> +	}
>> 	if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
>> 		if (args->verbose)
>> 			fprintf(stderr, "%.*s\n", (int)path.len, path.buf);
>> 		err = write_entry(args, sha1, path.buf, path.len, mode);
>> 		if (err)
>> 			return err;
>> -		return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
>> +		return (S_ISGITLINK(mode) && !args->recurse) ? 0: READ_TREE_RECURSIVE;
> 
> This line is longer than 80 characters, which we tend to avoid.  How
> about this?
> 
> 		if (ISGITLINK(mode) && !args->recurse)
> 			return 0;
> 		return READ_TREE_RECURSIVE;
> 
>> 	}

Agreed. I don’t usually enjoy multiple returns (especially so close to the end!)
but if the style guidelines are OK with that then so am I.

>> 
>> 	if (args->verbose)
>> @@ -256,10 +276,16 @@ static void parse_treeish_arg(const char **argv,
>> 	const struct commit *commit;
>> 	unsigned char sha1[20];
>> 
>> +	const char *colon = strchr(name, ':');
>> +
>> +	/* Store the path on the ref for later (required for --recursive) */
>> +	char *treepath = NULL;
>> +	if (colon) {
>> +		treepath = strdup(colon+1);
>> +	}
> 
> Style: No curly braces, space around operators, use wrapper functions
> for simple memory error handling:
> 
> 	if (colon)
> 		treepath = xstrdup(colon + 1);

Is that the standard practice? To use the wrapper functions?  

> 
>> 	/* Remotes are only allowed to fetch actual refs */
>> 	if (remote) {
>> 		char *ref = NULL;
>> -		const char *colon = strchr(name, ':');
>> 		int refnamelen = colon ? colon - name : strlen(name);
>> 
>> 		if (!dwim_ref(name, refnamelen, sha1, &ref))
>> @@ -296,9 +322,11 @@ static void parse_treeish_arg(const char **argv,
>> 		tree = parse_tree_indirect(tree_sha1);
>> 	}
>> 	ar_args->tree = tree;
>> +	ar_args->treepath = treepath;
>> 	ar_args->commit_sha1 = commit_sha1;
>> 	ar_args->commit = commit;
>> 	ar_args->time = archive_time;
>> +
>> }
> 
> Please don't add empty lines before the end of a block.
> 
>> #define OPT__COMPR(s, v, h, p) \
>> @@ -318,6 +346,7 @@ static int parse_archive_args(int argc, const char **argv,
>> 	const char *exec = NULL;
>> 	const char *output = NULL;
>> 	int compression_level = -1;
>> +	int recurse = 0;
>> 	int verbose = 0;
>> 	int i;
>> 	int list = 0;
>> @@ -331,6 +360,8 @@ static int parse_archive_args(int argc, const char **argv,
>> 			N_("write the archive to this file")),
>> 		OPT_BOOL(0, "worktree-attributes", &worktree_attributes,
>> 			N_("read .gitattributes in working directory")),
>> +		OPT_BOOL(0, "recursive", &recurse, N_("include submodules in archive")),
>> +		OPT_BOOL(0, "recurse-submodules", &recurse, N_("include submodules in archive")),
>> 		OPT__VERBOSE(&verbose, N_("report archived files on stderr")),
>> 		OPT__COMPR('0', &compression_level, N_("store only"), 0),
>> 		OPT__COMPR('1', &compression_level, N_("compress faster"), 1),
>> @@ -355,6 +386,8 @@ static int parse_archive_args(int argc, const char **argv,
>> 
>> 	argc = parse_options(argc, argv, NULL, opts, archive_usage, 0);
>> 
>> +	if (is_remote && recurse)
>> +		die("Cannot include submodules with option --remote");
>> 	if (remote)
>> 		die("Unexpected option --remote");
>> 	if (exec)
>> @@ -393,6 +426,7 @@ static int parse_archive_args(int argc, const char **argv,
>> 					format, compression_level);
>> 		}
>> 	}
>> +	args->recurse = recurse;
>> 	args->verbose = verbose;
>> 	args->base = base;
>> 	args->baselen = strlen(base);
>> diff --git a/archive.h b/archive.h
>> index 4a791e1..577238d 100644
>> --- a/archive.h
>> +++ b/archive.h
>> @@ -7,10 +7,12 @@ struct archiver_args {
>> 	const char *base;
>> 	size_t baselen;
>> 	struct tree *tree;
>> +	const char *treepath;
>> 	const unsigned char *commit_sha1;
>> 	const struct commit *commit;
>> 	time_t time;
>> 	struct pathspec pathspec;
>> +	unsigned int recurse : 1;
> 
> Name it recurse_submodules?
> 
>> 	unsigned int verbose : 1;
>> 	unsigned int worktree_attributes : 1;
>> 	unsigned int convert : 1;
>> -- 1.8.3.4 (Apple Git-47) 
> 
> A test script (t/t5005-archive-submodules.sh?) would be nice which
> exercises the new option.

I take all the style points noted, and will create a test script - I didn’t want to invest too much
in this project until I’d received a response!

Thanks for your time in reviewing - these things are a great help.

Cheers
Nick

> 
> René




--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]