On Sun, Feb 28 2021, René Scharfe. wrote: > Am 17.02.21 um 19:31 schrieb Jeff King: >> On Sun, Feb 14, 2021 at 11:10:57AM +0100, René Scharfe. wrote: >> >>> Allow restricting the tags used by the placeholder %(describe) with the >>> options match and exclude. E.g. the following command describes the >>> current commit using official version tags, without those for release >>> candidates: >>> >>> $ git log -1 --format='%(describe:match=v[0-9]*,exclude=*rc*)' >> >> An interesting side effect of this series is that it allows remote users >> asking for archives to fill in this data, too (by using export-subst >> placeholders). That includes servers allowing "git archive --remote", >> but also services like GitHub that will run git-archive on behalf of >> clients. >> >> I wonder what avenues for mischief this provides. Certainly using extra >> CPU to run git-describe. > > A repository can contain millions of files, each file can contain > millions of $Format:...$ sequences and each of them can contain millions > of %(describe) placeholders. Each of them could have different match or > exclude args to prevent caching. Allowing a single request to cause > trillions of calls of git describe sounds excessive. Let's limit this. > > -- >8 -- > Subject: [PATCH] archive: expand only a single %(describe) per archive > > Every %(describe) placeholder in $Format:...$ strings in files with the > attribute export-subst is expanded by calling git describe. This can > potentially result in a lot of such calls per archive. That's OK for > local repositories under control of the user of git archive, but could > be a problem for hosted repositories. > > Expand only a single %(describe) placeholder per archive for now to > avoid denial-of-service attacks. We can make this limit configurable > later if needed, but let's start out simple. > > Reported-by: Jeff King <peff@xxxxxxxx> > Signed-off-by: René Scharfe <l.s.r@xxxxxx> > --- > Documentation/gitattributes.txt | 3 ++- > archive.c | 16 ++++++++++------ > archive.h | 2 ++ > pretty.c | 8 ++++++++ > pretty.h | 5 +++++ > t/t5001-archive-attr.sh | 14 ++++++++++++++ > 6 files changed, 41 insertions(+), 7 deletions(-) > > diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt > index e84e104f93..0a60472bb5 100644 > --- a/Documentation/gitattributes.txt > +++ b/Documentation/gitattributes.txt > @@ -1174,7 +1174,8 @@ tag then no replacement will be done. The placeholders are the same > as those for the option `--pretty=format:` of linkgit:git-log[1], > except that they need to be wrapped like this: `$Format:PLACEHOLDERS$` > in the file. E.g. the string `$Format:%H$` will be replaced by the > -commit hash. > +commit hash. However, only one `%(describe)` placeholder is expanded > +per archive to avoid denial-of-service attacks. > > > Packing objects > diff --git a/archive.c b/archive.c > index 5919d9e505..2dd2236ae0 100644 > --- a/archive.c > +++ b/archive.c > @@ -37,13 +37,10 @@ void init_archivers(void) > > static void format_subst(const struct commit *commit, > const char *src, size_t len, > - struct strbuf *buf) > + struct strbuf *buf, struct pretty_print_context *ctx) > { > char *to_free = NULL; > struct strbuf fmt = STRBUF_INIT; > - struct pretty_print_context ctx = {0}; > - ctx.date_mode.type = DATE_NORMAL; > - ctx.abbrev = DEFAULT_ABBREV; > > if (src == buf->buf) > to_free = strbuf_detach(buf, NULL); > @@ -61,7 +58,7 @@ static void format_subst(const struct commit *commit, > strbuf_add(&fmt, b + 8, c - b - 8); > > strbuf_add(buf, src, b - src); > - format_commit_message(commit, fmt.buf, buf, &ctx); > + format_commit_message(commit, fmt.buf, buf, ctx); > len -= c + 1 - src; > src = c + 1; > } > @@ -94,7 +91,7 @@ static void *object_file_to_archive(const struct archiver_args *args, > strbuf_attach(&buf, buffer, *sizep, *sizep + 1); > convert_to_working_tree(args->repo->index, path, buf.buf, buf.len, &buf, &meta); > if (commit) > - format_subst(commit, buf.buf, buf.len, &buf); > + format_subst(commit, buf.buf, buf.len, &buf, args->pretty_ctx); > buffer = strbuf_detach(&buf, &size); > *sizep = size; > } > @@ -633,12 +630,19 @@ int write_archive(int argc, const char **argv, const char *prefix, > const char *name_hint, int remote) > { > const struct archiver *ar = NULL; > + struct pretty_print_describe_status describe_status = {0}; > + struct pretty_print_context ctx = {0}; > struct archiver_args args; > int rc; > > git_config_get_bool("uploadarchive.allowunreachable", &remote_allow_unreachable); > git_config(git_default_config, NULL); > > + describe_status.max_invocations = 1; > + ctx.date_mode.type = DATE_NORMAL; > + ctx.abbrev = DEFAULT_ABBREV; > + ctx.describe_status = &describe_status; > + args.pretty_ctx = &ctx; > args.repo = repo; > args.prefix = prefix; > string_list_init(&args.extra_files, 1); > diff --git a/archive.h b/archive.h > index 33551b7ee1..49fab71aaf 100644 > --- a/archive.h > +++ b/archive.h > @@ -5,6 +5,7 @@ > #include "pathspec.h" > > struct repository; > +struct pretty_print_context; > > struct archiver_args { > struct repository *repo; > @@ -22,6 +23,7 @@ struct archiver_args { > unsigned int convert : 1; > int compression_level; > struct string_list extra_files; > + struct pretty_print_context *pretty_ctx; > }; > > /* main api */ > diff --git a/pretty.c b/pretty.c > index c612d2ac9b..032e89cd4e 100644 > --- a/pretty.c > +++ b/pretty.c > @@ -1247,6 +1247,14 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */ > struct child_process cmd = CHILD_PROCESS_INIT; > struct strbuf out = STRBUF_INIT; > struct strbuf err = STRBUF_INIT; > + struct pretty_print_describe_status *describe_status; > + > + describe_status = c->pretty_ctx->describe_status; > + if (describe_status) { > + if (!describe_status->max_invocations) > + return 0; > + describe_status->max_invocations--; > + } > > cmd.git_cmd = 1; > strvec_push(&cmd.args, "describe"); > diff --git a/pretty.h b/pretty.h > index 7ce6c0b437..27b15947ff 100644 > --- a/pretty.h > +++ b/pretty.h > @@ -23,6 +23,10 @@ enum cmit_fmt { > CMIT_FMT_UNSPECIFIED > }; > > +struct pretty_print_describe_status { > + unsigned int max_invocations; > +}; > + > struct pretty_print_context { > /* > * Callers should tweak these to change the behavior of pp_* functions. > @@ -44,6 +48,7 @@ struct pretty_print_context { > int color; > struct ident_split *from_ident; > unsigned encode_email_headers:1; > + struct pretty_print_describe_status *describe_status; > > /* > * Fields below here are manipulated internally by pp_* functions and > diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh > index e9aa97117a..1c9ce3956b 100755 > --- a/t/t5001-archive-attr.sh > +++ b/t/t5001-archive-attr.sh > @@ -128,4 +128,18 @@ test_expect_success 'export-subst' ' > test_cmp substfile2 archive/substfile2 > ' > > +test_expect_success 'export-subst expands %(describe) once' ' > + echo "\$Format:%(describe)\$" >substfile3 && > + echo "\$Format:%(describe)\$" >>substfile3 && > + echo "\$Format:%(describe)${LF}%(describe)\$" >substfile4 && > + git add substfile[34] && > + git commit -m export-subst-describe && > + git tag -m export-subst-describe export-subst-describe && > + git archive HEAD >archive-describe.tar && > + extract_tar_to_dir archive-describe && > + desc=$(git describe) && > + grep -F "$desc" archive-describe/substfile[34] >substituted && > + test_line_count = 1 substituted > +' > + > test_done This whole thing seems rather backwards as a solution to the "we run it N times" problem I mentioned in <87k0r7a4sr.fsf@xxxxxxxxxxxxxxxxxxx>. Instead of taking the trouble of putting a limit in the pretty_print_context so we don't call it N times for the same commit, why not just put the strbuf with the result in that same struct? Then you can have it millions of times, and it won't be any more expensive than the other existing %(format) specifiers (actually cheaper than most). Or is there some edge case I'm missing here where "git archive" can either be fed N commits and we share the context, or we share the context across formatting different revisions in some cases?