Re: [PATCH 1/2] tree-walk: be more specific about corrupt tree errors

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Sep 26, 2016 at 03:32:44PM -0400, David Turner wrote:

> From: Jeff King <peff@xxxxxxxx>
> 
> When the tree-walker runs into an error, it just calls
> die(), and the message is always "corrupt tree file".
> However, we are actually covering several cases here; let's
> give the user a hint about what happened.
> 
> Let's also avoid using the word "corrupt", which makes it
> seem like the data bit-rotted on disk. Our sha1 check would
> already have found that. These errors are ones of data that
> is malformed in the first place.
> 
> Signed-off-by: David Turner <dturner@xxxxxxxxxxxx>
> Signed-off-by: Jeff King <peff@xxxxxxxx>

Yay. This has been on my "to look at and repost" list for literally 2
years. Thanks for picking it up (see kids, procrastination _does_ pay
off).

>  t/t1007-hash-object.sh           |  15 +++++++++++++--
>  t/t1007/tree-with-empty-filename | Bin 0 -> 28 bytes
>  t/t1007/tree-with-malformed-mode | Bin 0 -> 39 bytes

Ooh, and tests. Exciting.

> -test_expect_success 'corrupt tree' '
> +test_expect_success 'too-short tree' '
>  	echo abc >malformed-tree &&
> -	test_must_fail git hash-object -t tree malformed-tree
> +	test_must_fail git hash-object -t tree malformed-tree 2>err &&
> +	grep "too-short tree object" err
> +'

Should this be test_i18ngrep? Even if the message is not translated now,
it seems like a good proactive measure (and probably it _should_ be
translated).

> +test_expect_success 'malformed mode in tree' '
> +	test_must_fail git hash-object -t tree ../t1007/tree-with-malformed-mode 2>err &&
> +	grep "malformed mode in tree entry for tree" err
> +'

This ".." will break when the test is run with "--root". You should use

  "$TEST_DIRECTORY"/t1007/...

instead. And ditto in the second test, of course.

> diff --git a/t/t1007/tree-with-empty-filename b/t/t1007/tree-with-empty-filename
> new file mode 100644
> index 0000000000000000000000000000000000000000..aeb1ceb20e485eebd0acbb81c974d1c6fedcc1fe
> GIT binary patch
> literal 28
> kcmXpsFfcPQQDAsB_tET47q2;ccWbUIkGgT_Nl)-Z0Hx{;SO5S3
> 
> literal 0
> HcmV?d00001

This is rather opaque, of course. :)

I wonder if it would be possible to generate the test vector with
something like:

  # any 20 bytes will do
  bin_sha1='\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0'

  printf "100644 \0$bin_sha1" >tree-with-empty-filename

I know that is longer and possibly more error-prone to run, but I think
it makes the test much easier to read and modify later.

I also wonder if $bin_sha1 should actually be more like:

  hex_sha1=$(echo foo | git hash-object --stdin -w)
  bin_sha1=$(echo $hex_sha1 | perl -ne 'printf "\\%3o", ord for /./g')

so that it's a real sha1 (or maybe it is in your original, from an
object that happens to be in the repo; it's hard to tell). I wouldn't
expect the code to actually get to the point of looking at the sha1, but
it's perhaps a more realistic test.

I also think it would be nice if hash-object had a "--binary-sha1"
option to avoid the perl grossness. :)

> diff --git a/tree-walk.c b/tree-walk.c
> index ce27842..ba544cf 100644

The code change itself looks brilliant, naturally. :)

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]