Re: [PATCH 2/2] pretty: add merge and exclude options to %(describe)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 28.02.21 um 16:41 schrieb Ævar Arnfjörð Bjarmason:
>
> On Sun, Feb 28 2021, René Scharfe. wrote:
>
>> Am 17.02.21 um 19:31 schrieb Jeff King:
>>> On Sun, Feb 14, 2021 at 11:10:57AM +0100, René Scharfe. wrote:
>>>
>>>> Allow restricting the tags used by the placeholder %(describe) with the
>>>> options match and exclude.  E.g. the following command describes the
>>>> current commit using official version tags, without those for release
>>>> candidates:
>>>>
>>>>    $ git log -1 --format='%(describe:match=v[0-9]*,exclude=*rc*)'
>>>
>>> An interesting side effect of this series is that it allows remote users
>>> asking for archives to fill in this data, too (by using export-subst
>>> placeholders). That includes servers allowing "git archive --remote",
>>> but also services like GitHub that will run git-archive on behalf of
>>> clients.
>>>
>>> I wonder what avenues for mischief this provides. Certainly using extra
>>> CPU to run git-describe.
>>
>> A repository can contain millions of files, each file can contain
>> millions of $Format:...$ sequences and each of them can contain millions
>> of %(describe) placeholders.  Each of them could have different match or
>> exclude args to prevent caching.  Allowing a single request to cause
>> trillions of calls of git describe sounds excessive.  Let's limit this.
>>
>> -- >8 --
>> Subject: [PATCH] archive: expand only a single %(describe) per archive
>>
>> Every %(describe) placeholder in $Format:...$ strings in files with the
>> attribute export-subst is expanded by calling git describe.  This can
>> potentially result in a lot of such calls per archive.  That's OK for
>> local repositories under control of the user of git archive, but could
>> be a problem for hosted repositories.
>>
>> Expand only a single %(describe) placeholder per archive for now to
>> avoid denial-of-service attacks.  We can make this limit configurable
>> later if needed, but let's start out simple.
>>
>> Reported-by: Jeff King <peff@xxxxxxxx>
>> Signed-off-by: René Scharfe <l.s.r@xxxxxx>
>> ---
>>  Documentation/gitattributes.txt |  3 ++-
>>  archive.c                       | 16 ++++++++++------
>>  archive.h                       |  2 ++
>>  pretty.c                        |  8 ++++++++
>>  pretty.h                        |  5 +++++
>>  t/t5001-archive-attr.sh         | 14 ++++++++++++++
>>  6 files changed, 41 insertions(+), 7 deletions(-)
>>
>> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
>> index e84e104f93..0a60472bb5 100644
>> --- a/Documentation/gitattributes.txt
>> +++ b/Documentation/gitattributes.txt
>> @@ -1174,7 +1174,8 @@ tag then no replacement will be done.  The placeholders are the same
>>  as those for the option `--pretty=format:` of linkgit:git-log[1],
>>  except that they need to be wrapped like this: `$Format:PLACEHOLDERS$`
>>  in the file.  E.g. the string `$Format:%H$` will be replaced by the
>> -commit hash.
>> +commit hash.  However, only one `%(describe)` placeholder is expanded
>> +per archive to avoid denial-of-service attacks.
>>
>>
>>  Packing objects
>> diff --git a/archive.c b/archive.c
>> index 5919d9e505..2dd2236ae0 100644
>> --- a/archive.c
>> +++ b/archive.c
>> @@ -37,13 +37,10 @@ void init_archivers(void)
>>
>>  static void format_subst(const struct commit *commit,
>>  			 const char *src, size_t len,
>> -			 struct strbuf *buf)
>> +			 struct strbuf *buf, struct pretty_print_context *ctx)
>>  {
>>  	char *to_free = NULL;
>>  	struct strbuf fmt = STRBUF_INIT;
>> -	struct pretty_print_context ctx = {0};
>> -	ctx.date_mode.type = DATE_NORMAL;
>> -	ctx.abbrev = DEFAULT_ABBREV;
>>
>>  	if (src == buf->buf)
>>  		to_free = strbuf_detach(buf, NULL);
>> @@ -61,7 +58,7 @@ static void format_subst(const struct commit *commit,
>>  		strbuf_add(&fmt, b + 8, c - b - 8);
>>
>>  		strbuf_add(buf, src, b - src);
>> -		format_commit_message(commit, fmt.buf, buf, &ctx);
>> +		format_commit_message(commit, fmt.buf, buf, ctx);
>>  		len -= c + 1 - src;
>>  		src  = c + 1;
>>  	}
>> @@ -94,7 +91,7 @@ static void *object_file_to_archive(const struct archiver_args *args,
>>  		strbuf_attach(&buf, buffer, *sizep, *sizep + 1);
>>  		convert_to_working_tree(args->repo->index, path, buf.buf, buf.len, &buf, &meta);
>>  		if (commit)
>> -			format_subst(commit, buf.buf, buf.len, &buf);
>> +			format_subst(commit, buf.buf, buf.len, &buf, args->pretty_ctx);
>>  		buffer = strbuf_detach(&buf, &size);
>>  		*sizep = size;
>>  	}
>> @@ -633,12 +630,19 @@ int write_archive(int argc, const char **argv, const char *prefix,
>>  		  const char *name_hint, int remote)
>>  {
>>  	const struct archiver *ar = NULL;
>> +	struct pretty_print_describe_status describe_status = {0};
>> +	struct pretty_print_context ctx = {0};
>>  	struct archiver_args args;
>>  	int rc;
>>
>>  	git_config_get_bool("uploadarchive.allowunreachable", &remote_allow_unreachable);
>>  	git_config(git_default_config, NULL);
>>
>> +	describe_status.max_invocations = 1;
>> +	ctx.date_mode.type = DATE_NORMAL;
>> +	ctx.abbrev = DEFAULT_ABBREV;
>> +	ctx.describe_status = &describe_status;
>> +	args.pretty_ctx = &ctx;
>>  	args.repo = repo;
>>  	args.prefix = prefix;
>>  	string_list_init(&args.extra_files, 1);
>> diff --git a/archive.h b/archive.h
>> index 33551b7ee1..49fab71aaf 100644
>> --- a/archive.h
>> +++ b/archive.h
>> @@ -5,6 +5,7 @@
>>  #include "pathspec.h"
>>
>>  struct repository;
>> +struct pretty_print_context;
>>
>>  struct archiver_args {
>>  	struct repository *repo;
>> @@ -22,6 +23,7 @@ struct archiver_args {
>>  	unsigned int convert : 1;
>>  	int compression_level;
>>  	struct string_list extra_files;
>> +	struct pretty_print_context *pretty_ctx;
>>  };
>>
>>  /* main api */
>> diff --git a/pretty.c b/pretty.c
>> index c612d2ac9b..032e89cd4e 100644
>> --- a/pretty.c
>> +++ b/pretty.c
>> @@ -1247,6 +1247,14 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
>>  		struct child_process cmd = CHILD_PROCESS_INIT;
>>  		struct strbuf out = STRBUF_INIT;
>>  		struct strbuf err = STRBUF_INIT;
>> +		struct pretty_print_describe_status *describe_status;
>> +
>> +		describe_status = c->pretty_ctx->describe_status;
>> +		if (describe_status) {
>> +			if (!describe_status->max_invocations)
>> +				return 0;
>> +			describe_status->max_invocations--;
>> +		}
>>
>>  		cmd.git_cmd = 1;
>>  		strvec_push(&cmd.args, "describe");
>> diff --git a/pretty.h b/pretty.h
>> index 7ce6c0b437..27b15947ff 100644
>> --- a/pretty.h
>> +++ b/pretty.h
>> @@ -23,6 +23,10 @@ enum cmit_fmt {
>>  	CMIT_FMT_UNSPECIFIED
>>  };
>>
>> +struct pretty_print_describe_status {
>> +	unsigned int max_invocations;
>> +};
>> +
>>  struct pretty_print_context {
>>  	/*
>>  	 * Callers should tweak these to change the behavior of pp_* functions.
>> @@ -44,6 +48,7 @@ struct pretty_print_context {
>>  	int color;
>>  	struct ident_split *from_ident;
>>  	unsigned encode_email_headers:1;
>> +	struct pretty_print_describe_status *describe_status;
>>
>>  	/*
>>  	 * Fields below here are manipulated internally by pp_* functions and
>> diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh
>> index e9aa97117a..1c9ce3956b 100755
>> --- a/t/t5001-archive-attr.sh
>> +++ b/t/t5001-archive-attr.sh
>> @@ -128,4 +128,18 @@ test_expect_success 'export-subst' '
>>  	test_cmp substfile2 archive/substfile2
>>  '
>>
>> +test_expect_success 'export-subst expands %(describe) once' '
>> +	echo "\$Format:%(describe)\$" >substfile3 &&
>> +	echo "\$Format:%(describe)\$" >>substfile3 &&
>> +	echo "\$Format:%(describe)${LF}%(describe)\$" >substfile4 &&
>> +	git add substfile[34] &&
>> +	git commit -m export-subst-describe &&
>> +	git tag -m export-subst-describe export-subst-describe &&
>> +	git archive HEAD >archive-describe.tar &&
>> +	extract_tar_to_dir archive-describe &&
>> +	desc=$(git describe) &&
>> +	grep -F "$desc" archive-describe/substfile[34] >substituted &&
>> +	test_line_count = 1 substituted
>> +'
>> +
>>  test_done
>
> This whole thing seems rather backwards as a solution to the "we run it
> N times" problem I mentioned in <87k0r7a4sr.fsf@xxxxxxxxxxxxxxxxxxx>.

In the referenced message you pointed out that using %(describe) more
than once incurs the cost of a call to "git describe" each time and
asked for a way to avoid that.

This patch here prevents the second and later calls for "git archive".
It's not intended to speed up expanding repeated placeholders, but
rather to reject maliciously crafted placeholders even if they are not
the same.

So does that mean you have the requirement to use %(describe) more
than once in the same archive, and this DoS protection would be too
strict for you?  In that case we'd need a way to increase the limit,
e.g. a configuration variable or command line option.  Otherwise I
don't see the connection between your message and this patch.

Side note: Keeping the "git describe" running and feeding it commits
one by one through a pipe (with a new --stdin option, I suppose) as
mentioned in <66720982-04d6-3096-9ea2-ea5bc3fcd121@xxxxxx> would
speed up expanding repeated placeholders as well.

> Instead of taking the trouble of putting a limit in the
> pretty_print_context so we don't call it N times for the same commit,
> why not just put the strbuf with the result in that same struct?
>
> Then you can have it millions of times, and it won't be any more
> expensive than the other existing %(format) specifiers (actually cheaper
> than most).

Each %(describe) placeholder can have a unique set of match or exclude
arguments.  Caching them all would increase the strength of a DoS
attack.  Caching a few of them would be OK, but is ineffective in
reducing the strength of the attack.

> Or is there some edge case I'm missing here where "git archive" can
> either be fed N commits and we share the context, or we share the
> context across formatting different revisions in some cases?

git archive currently works only on a single commit.

René




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux