Re: [PATCH 2/4] blame: validate and peel the object names on the ignore list

René Scharfe <l.s.r@xxxxxx> · Sat, 26 Sep 2020 18:23:42 +0200

Am 25.09.20 um 07:59 schrieb Junio C Hamano:
> The command reads list of object names to place on the ignore list
> either from the command line or from a file, but they are not
> checked with their object type (those read from the file are not
> even checked for object existence).
>
> Extend the oidset_parse_file() API and allow it to take a callback
> that can be used to die (e.g. when an inappropriate input is read)
> or modify the object name read (e.g. when a tag pointing at a commit
> is read, and the caller wants a commit object name), and use it in
> the code that handles ignore list.

What's the benefit of such a check?  Ignoring a non-existing or
type-mismatched object is really easy -- no actual effort is required to
fulfill that request.

When I request "Don't eat any glue!", perfectly human responses could be
"But I don't have any glue!" or "It doesn't even taste that good.", but
I'd expect a computer program to act I bit more logical and just don't
do it, without talking back.  Maybe that's just me.

(I had been bitten by a totally different software adding such a check,
which made it complain about my long catch-all ignore list, and I had to
craft and maintain a specific "clean" list for each deployment --
perhaps I'm still bitter about that.)

>
> Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
> ---
>  builtin/blame.c              | 27 ++++++++++++++++++++++++--
>  oidset.c                     |  9 ++++++++-
>  oidset.h                     |  9 +++++++++
>  t/t8013-blame-ignore-revs.sh | 37 ++++++++++++++++++++++++++----------
>  4 files changed, 69 insertions(+), 13 deletions(-)
>
> diff --git a/builtin/blame.c b/builtin/blame.c
> index 94ef57c1cc..baa5d979cc 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -27,6 +27,7 @@
>  #include "object-store.h"
>  #include "blame.h"
>  #include "refs.h"
> +#include "tag.h"
>
>  static char blame_usage[] = N_("git blame [<options>] [<rev-opts>] [<rev>] [--] <file>");
>
> @@ -803,6 +804,26 @@ static int is_a_rev(const char *name)
>  	return OBJ_NONE < oid_object_info(the_repository, &oid, NULL);
>  }
>
> +static int peel_to_commit_oid(struct object_id *oid_ret, void *cbdata)
> +{
> +	struct repository *r = ((struct blame_scoreboard *)cbdata)->repo;
> +	struct object_id oid;
> +
> +	oidcpy(&oid, oid_ret);
> +	while (1) {
> +		struct object *obj;
> +		int kind = oid_object_info(r, &oid, NULL);
> +		if (kind == OBJ_COMMIT) {
> +			oidcpy(oid_ret, &oid);

At that point we know it's an object, but cast it up to the most generic
class we have -- an object ID.  We could have set an object flag to mark
it ignored instead, which would be trivial to check later.  On the other
hand it probably wouldn't make much of a difference -- hashmaps are
pretty fast, and blame has lots of things to do beyond ignoring commits.

> +			return 0;
> +		}
> +		if (kind != OBJ_TAG)
> +			return -1;
> +		obj = deref_tag(r, parse_object(r, &oid), NULL, 0);
> +		oidcpy(&oid, &obj->oid);
> +	}
> +}
> +
>  static void build_ignorelist(struct blame_scoreboard *sb,
>  			     struct string_list *ignore_revs_file_list,
>  			     struct string_list *ignore_rev_list)
> @@ -815,10 +836,12 @@ static void build_ignorelist(struct blame_scoreboard *sb,
>  		if (!strcmp(i->string, ""))
>  			oidset_clear(&sb->ignore_list);

This preexisting feature is curious.  It's even documented ('An empty
file name, "", will clear the list of revs from previously processed
files.') and covered by t8013.6.  Why would we need such magic in
addition to the standard negation (--no-ignore-revs-file) for clearing
the list?  The latter counters blame.ignoreRevsFile as well. *puzzled*

>  		else
> -			oidset_parse_file(&sb->ignore_list, i->string);
> +			oidset_parse_file_carefully(&sb->ignore_list, i->string,
> +						    peel_to_commit_oid, sb);
>  	}
>  	for_each_string_list_item(i, ignore_rev_list) {
> -		if (get_oid_committish(i->string, &oid))
> +		if (get_oid_committish(i->string, &oid) ||
> +		    peel_to_commit_oid(&oid, sb))
>  			die(_("cannot find revision %s to ignore"), i->string);
>  		oidset_insert(&sb->ignore_list, &oid);
>  	}
> diff --git a/oidset.c b/oidset.c
> index 15d4e18c37..2d0ab76fb5 100644
> --- a/oidset.c
> +++ b/oidset.c
> @@ -42,6 +42,12 @@ int oidset_size(struct oidset *set)
>  }
>
>  void oidset_parse_file(struct oidset *set, const char *path)
> +{
> +	oidset_parse_file_carefully(set, path, NULL, NULL);
> +}
> +
> +void oidset_parse_file_carefully(struct oidset *set, const char *path,
> +				 oidset_parse_tweak_fn fn, void *cbdata)
>  {
>  	FILE *fp;
>  	struct strbuf sb = STRBUF_INIT;
> @@ -66,7 +72,8 @@ void oidset_parse_file(struct oidset *set, const char *path)
>  		if (!sb.len)
>  			continue;
>
> -		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
> +		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0' ||
> +		    (fn && fn(&oid, cbdata)))

OK, so this turns the basic all-I-know-is-hashes oidset loader into a
flexible higher-order map function.  Fun, but wise?  Can't make up my
mind.

René