Re: [PATCH v2] fast-import: add new --date-format=raw-permissive format

Jeff King <peff@xxxxxxxx> · Fri, 29 May 2020 02:13:46 -0400

On Thu, May 28, 2020 at 08:40:37PM +0000, Elijah Newren via GitGitGadget wrote:

>      * Instead of just allowing such timezones outright, did it behind a
>        --date-format=raw-permissive as suggested by Jonathan

Thanks, I like the safety this gives us against importer bugs. It does
mean that people doing "export | filter | import" may have to manually
loosen it, but it should be rare enough that this isn't a big burden
(and if they're rewriting anyway, maybe it gives them a chance to decide
how to fix it up).

>  fast-import.c          | 15 ++++++++++++---
>  t/t9300-fast-import.sh | 30 ++++++++++++++++++++++++++++++

Would we need a documentation update for "raw-permissive", too?

> @@ -1929,7 +1931,8 @@ static int validate_raw_date(const char *src, struct strbuf *result)
>  		return -1;
>  
>  	num = strtoul(src + 1, &endp, 10);
> -	if (errno || endp == src + 1 || *endp || 1400 < num)
> +	out_of_range_timezone = strict && (1400 < num);
> +	if (errno || endp == src + 1 || *endp || out_of_range_timezone)
>  		return -1;

It's a little funny to do computations on the result of a function
before we've checked whether it produced an error. But since the result
is just an integer, I don't think we'd do anything unexpected (we might
just throw away the value, though I imagine an optimizing compiler might
evaluate it lazily anyway).

> @@ -1963,7 +1966,11 @@ static char *parse_ident(const char *buf)
>  
>  	switch (whenspec) {
>  	case WHENSPEC_RAW:
> -		if (validate_raw_date(ltgt, &ident) < 0)
> +		if (validate_raw_date(ltgt, &ident, 1) < 0)
> +			die("Invalid raw date \"%s\" in ident: %s", ltgt, buf);
> +		break;
> +	case WHENSPEC_RAW_PERMISSIVE:
> +		if (validate_raw_date(ltgt, &ident, 0) < 0)
>  			die("Invalid raw date \"%s\" in ident: %s", ltgt, buf);
>  		break;

This looks simple enough. We have the bogus date in a buffer at that
point, and we stuff that string literally into the commit (or tag)
object. If we ever get more clever about storing the timestamp
internally as an integer, we may need to get more clever. But your new
test should alert us if that becomes the case.

> +test_expect_success 'B: accept invalid timezone with raw-permissive' '
> +	cat >input <<-INPUT_END &&
> +	commit refs/heads/invalid-timezone
> +	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1234567890 +051800
> +	data <<COMMIT
> +	empty commit
> +	COMMIT
> +	INPUT_END
> +
> +	test_when_finished "git update-ref -d refs/heads/invalid-timezone
> +		git gc
> +		git prune" &&

We check the exit code of when_finished commands, so this should be
&&-chained as usual. And possibly indented like:

  test_when_finished "
    git update-ref -d refs/heads/invalid-timezone &&
    git gc &&
    git prune
  " &&

but I see this all is copying another nearby test. So I'm OK with
keeping it consistent for now and cleaning up the whole thing later.
Though if you want to do that step now, I have no objection. :)

I also also suspect this "gc" is not useful these days. For a small
input like this, fast-import will write loose objects, since d9545c7f46
(fast-import: implement unpack limit, 2016-04-25).

TBH, I think it would be easier to understand as:

  ...
  git init invalid-timezone &&
  git -C invalid-timezone fast-import <input &&
  git -C invalid-timezone cat-file -p master >out &&
  ...

and don't bother with the when_finished at all. Then you don't have to
wonder whether the cleanup was sufficient, and it's fewer processes
to boot (we'll leave the sub-repo cruft sitting around, but a single "rm
-rf" at the end of the test script will wipe them all out).

-Peff