Re: [RFC/PATCH 08/12] mktag: use fsck instead of custom verify_tag()

Jeff King <peff@xxxxxxxx> · Thu, 26 Nov 2020 03:17:04 -0500

On Thu, Nov 26, 2020 at 02:28:50AM +0100, Ævar Arnfjörð Bjarmason wrote:

> There was other "mktag" validation logic that I think makes sense to
> just remove. Namely:
> 
>  A. fsck only cares that the timezone matches [-+][0-9]{4}. The mktag
>     code disallowed values larger than 1400.
> 
>     Yes there's currently no timezone with a greater offset[2], but
>     since we allow any number of non-offical timezones (e.g. +1234)
>     passing this through seems fine. Git also won't break in the
>     future if e.g. French Polynesia decides it needs to outdo the Line
>     Islands when it comes to timezone extravagance.

Yeah, I think this is a good choice to loosen.

>  B. fsck allows missing author names such as "tagger <email>", mktag
>     wouldn't, but would allow e.g. "tagger <email>" (but not "tagger
>     <email>"). Now we allow all of these.

Likewise, though I am confused. Should the second "tagger <email>" in
that paragraph have something else in it?

>  C. Like B, but "mktag" disallowed spaces in the <email> part, fsck
>     allows it.

Possibly something we'd want to tighten in fsck, but I think keeping
them in alignment is a good idea for now.

> We didn't only lose obscure validation logic, we also gained some:
> 
>  D. fsck disallows zero-padded dates, but mktag didn't care. So
>     e.g. the timestamp "0000000000 +0000" produces an error now. A
>     test in "t1006-cat-file.sh" relied on this, it's been changed to
>     use "hash-object" (without fsck) instead.

Seems reasonable.

> +	/* verify_tag() will be removed in the next commit */
> +	verify_tag("", 0);
> +
> +	/*
> +	 * Fake up an object for fsck_object()
> +	 */
> +	obj.parsed = 1;
> +	obj.type = OBJ_TAG;

I don't love this "fake object struct on the stack" thing. I can't think
of anything that would break outright, but it may be the only place
where that struct isn't coming from the usual pool, and representing the
common part of a larger object. Two definite gotchas, though:

  - if the type is OBJ_TAG, then it may get cast to a "struct tag" by
    other code, which could look past the end of the struct. I think
    that fsck_object() doesn't do this, but it could (and it definitely
    used to)

  - you don't initialize the other fields. We'll definitely pass
    &obj->oid in the fsck code, and even back to our report() callback,
    even though it's full of garbage. In practice this is OK because our
    custom report() won't look at them, but it seems awfully fragile.

I recently genericized the type-specific fsck_* functions so that they
just need an oid, and not an object struct. I think I didn't do
fsck_object() just because it didn't have any callers where it mattered.
So I think it would make sense here to either:

  - make fsck_tag() specifically available outside of fsck.c, so could
    call it directly

  - convert fsck_object() to take an oid rather than an object struct.
    It only uses the object itself to check for NULL-ness. Looking at
    the callers, they all have non-NULL objects already. So I think that
    check can't be triggered.

>  check_verify_failure 'Tag object length check' \
> -	'^error: .*size wrong.*$'
> +	'^error: missingObject:'

We may want to enhance the "error:" here to make it clear we're talking
about a format error in the tag input. Maybe:

  error: tag input does not pass fsck: missingObject: ...

or something.

-Peff