Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers

Jeff King <peff@xxxxxxxx> · Fri, 24 Jun 2016 15:07:44 -0400

On Fri, Jun 24, 2016 at 11:56:19AM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> > The ustar format only has room for 11 (or 12, depending on
> > some implementations) octal digits for the size and mtime of
> > each file. After this, we have to add pax extended headers
> > to specify the real data, and git does not yet know how to
> > do so.
> 
> I am not a native speaker but "After" above made me hiccup.  I think
> I am correct to understand that it means "after passing this limit",
> aka "to represent files bigger or newer than these", but still it
> felt somewhat strange.

Yeah, I agree that it reads badly. I'm not sure what I was thinking.
I'll tweak it in the re-roll.

> > +# See if our system tar can handle a tar file with huge sizes and dates far in
> > +# the future, and that we can actually parse its output.
> > +#
> > +# The reference file was generated by GNU tar, and the magic time and size are
> > +# both octal 01000000000001, which overflows normal ustar fields.
> > +#
> > +# When parsing, we'll pull out only the year from the date; that
> > +# avoids any question of timezones impacting the result. 
> 
> ... as long as the month-day part is not close to the year boundary.
> So this explanation is insufficient to convince the reader that
> "that avoids any question" is correct, without saying that it is in
> August of year 4147.

I thought that went without saying, but I can spell it out.
(Technically we could include the month, too, but I don't think that
level of accuracy matters for these tests.)
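
For reference, the overflow can be checked with plain shell arithmetic. This
is only a sketch, assuming a shell with 64-bit arithmetic; the variable names
are made up for illustration. A leading 0 marks an octal constant, so the
magic value can be compared directly against the largest numbers that fit in
11 and 12 octal digits:

```shell
# The magic size/mtime from the test, as an octal constant.
magic=$((01000000000001))

# Largest values representable in 11 and 12 octal digits.
max11=$((077777777777))
max12=$((0777777777777))

echo "magic: $magic"
echo "max11: $max11"
echo "max12: $max12"

# The magic value is 8^12 + 1, so it is two past even the 12-digit limit.
test "$magic" -gt "$max12" && echo "overflows a 12-digit field"
```

As a size that is one byte over 64GB (2^36 bytes), which is why the test
below has to avoid generating the whole thing.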

> > +tar_info () {
> > +	"$TAR" tvf "$1" | awk '{print $3 " " $4}' | cut -d- -f1
> > +}
> 
> A blank after the shell function to make it easier to see the
> boundary.

I was intentionally trying to couple it with the prereq below, as the
comment describes both of them.

> Seeing an awk piped into cut always makes me want to suggest a
> single sed/awk/perl invocation.

I want the auto-splitting of awk, but then to auto-split the result
using a different delimiter. Is there a not-painful way to do that in
awk?

I could certainly come up with a regex to do it in sed, but I wanted to
keep the parsing as liberal and generic as possible.

Certainly I could do it in perl, but I had the general impression that
we prefer to keep the dependency on perl to a minimum. Maybe it doesn't
matter.
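
One possible answer to the awk question, as a sketch only: awk's split()
can break a single field on a second delimiter after the usual whitespace
splitting. The helper name and the sample listing line are made up for
illustration (the real tar output may differ in its other columns), and
this matches the awk|cut version only as long as the size field itself
contains no "-":

```shell
# Hypothetical stdin-based variant of tar_info: print "<size> <year>".
# $3 is the size and $4 the date in a "tar tvf" style listing.
size_and_year () {
	awk '{
		split($4, ymd, "-")
		print $3 " " ymd[1]
	}'
}

# Made-up listing line in the shape GNU tar prints.
printf '%s\n' '-rw-r--r-- root/root 68719476737 4147-08-20 07:32 file' |
size_and_year
```

In the test itself this would sit behind '"$TAR" tvf "$1" |' just like the
current pipeline does.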

> > +# We expect git to die with SIGPIPE here (otherwise we
> > +# would generate the whole 64GB).
> > +test_expect_failure BUNZIP 'generate tar with huge size' '
> > +	{
> > +		git archive HEAD
> > +		echo $? >exit-code
> > +	} | head -c 4096 >huge.tar &&
> > +	echo 141 >expect &&
> > +	test_cmp expect exit-code
> > +'
> 
> "head -c" is GNU-ism, isn't it?

You're right; for some reason I thought it was in POSIX.

We do have a couple instances of it, but they are all in the valgrind
setup code (which I guess most people don't ever run).

> "dd bs=1 count=4096" is hopefully more portable.

Hmm. I always wonder whether dd is actually very portable, but we do use
it already, at least.
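
A rough sketch of the dd variant, with a hypothetical helper name and a
generated stream standing in for "git archive HEAD". The bs=1 matters when
reading from a pipe: dd counts input blocks, and a short read still counts
as a block, so a larger block size could deliver fewer bytes than asked for.

```shell
# Hypothetical helper: copy at most $1 bytes from stdin to stdout.
first_bytes () {
	dd bs=1 count="$1" 2>/dev/null
}

# Stand-in producer that writes more than we want to keep;
# wc counts the bytes that came through.
awk 'BEGIN { for (i = 0; i < 10000; i++) print "y" }' |
first_bytes 4096 |
wc -c
```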

Perhaps the perl monstrosity in t9300 could be replaced with that, too.

> ksh signal death you already know about.  I wonder if we want to
> expose something like list_contains as a friend of test_cmp.
> 
> 	list_contains 141,269 $(cat exit-code)

I think we would want something more like:

  test_signal_match 13 $(cat exit-code)

Each call site should not have to know about every signal convention
(and in your example, the magic "3" of Windows is left out).
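
Something along these lines, as an untested sketch. The name and argument
order are just the proposal above, and treating a bare "3" as a match for
any signal is a simplification on my part:

```shell
# Hypothetical helper: succeed if exit code $2 looks like death by
# signal number $1 under any of the conventions discussed here.
test_signal_match () {
	if test "$2" = "$((128 + $1))"
	then
		# usual shell convention: 128 + signal number (141 for SIGPIPE)
		return 0
	elif test "$2" = "$((256 + $1))"
	then
		# ksh convention: 256 + signal number (269 for SIGPIPE)
		return 0
	elif test "$2" = "3"
	then
		# the magic "3" of Windows
		return 0
	fi
	return 1
}

test_signal_match 13 141 && echo "141 matches SIGPIPE"
test_signal_match 13 269 && echo "269 matches SIGPIPE"
test_signal_match 13 0 || echo "0 does not match"
```

That keeps the per-platform knowledge in one place instead of in every
caller.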

-Peff