Re: [PATCH v3 16/20] commit-graph: verify contents match checksum

Jakub Narebski <jnareb@xxxxxxxxx> · Sat, 02 Jun 2018 17:52:40 +0200

Derrick Stolee <dstolee@xxxxxxxxxxxxx> writes:

> The commit-graph file ends with a SHA1 hash of the previous contents. If
> a commit-graph file has errors but the checksum hash is correct, then we
> know that the problem is a bug in Git and not simply file corruption
> after-the-fact.
>
> Compute the checksum right away so it is the first error that appears,
> and make the message translatable since this error can be "corrected" by
> a user by simply deleting the file and recomputing. The rest of the
> errors are useful only to developers.

Should we then provide --quiet / --verbose options, so that ordinary
user is not flooded with error messages meant for power users and Git
developers, then?

>
> Be sure to continue checking the rest of the file data if the checksum
> is wrong. This is important for our tests, as we break the checksum as
> we modify bytes of the commit-graph file.

Well, we could have used sha1sum program, or test-sha1 helper to fix the
checksum after corrupting the commit-graph file...

>
> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>  commit-graph.c          | 16 ++++++++++++++--
>  t/t5318-commit-graph.sh |  6 ++++++
>  2 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index d2b291aca2..a33600c584 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -841,6 +841,7 @@ void write_commit_graph(const char *obj_dir,
>  	oids.nr = 0;
>  }
>  
> +#define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
>  static int verify_commit_graph_error;
>  
>  static void graph_report(const char *fmt, ...)
> @@ -860,7 +861,9 @@ static void graph_report(const char *fmt, ...)
>  int verify_commit_graph(struct commit_graph *g)
>  {
>  	uint32_t i, cur_fanout_pos = 0;
> -	struct object_id prev_oid, cur_oid;
> +	struct object_id prev_oid, cur_oid, checksum;
> +	struct hashfile *f;
> +	int devnull;
>  
>  	if (!g) {
>  		graph_report("no commit-graph file loaded");
> @@ -879,6 +882,15 @@ int verify_commit_graph(struct commit_graph *g)
>  	if (verify_commit_graph_error)
>  		return verify_commit_graph_error;
>  
> +	devnull = open("/dev/null", O_WRONLY);
> +	f = hashfd(devnull, NULL);
> +	hashwrite(f, g->data, g->data_len - g->hash_len);
> +	finalize_hashfile(f, checksum.hash, CSUM_CLOSE);
> +	if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) {
> +		graph_report(_("the commit-graph file has incorrect checksum and is likely corrupt"));
> +		verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH;
> +	}

Is it the best way of calculating the SHA-1 checksum that out internal
APIs provide?  Is it how SHA-1 checksum is calculated and checked for
packfiles?

> +
>  	for (i = 0; i < g->num_commits; i++) {
>  		struct commit *graph_commit;
>  
> @@ -916,7 +928,7 @@ int verify_commit_graph(struct commit_graph *g)
>  		cur_fanout_pos++;
>  	}
>  
> -	if (verify_commit_graph_error)
> +	if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
>  		return verify_commit_graph_error;

So if we detected that checksum do not match, but we have not found an
error, we say that it is all right?

>  
>  	for (i = 0; i < g->num_commits; i++) {
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 240aef6add..2680a2ebff 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -279,6 +279,7 @@ GRAPH_COMMIT_DATA_WIDTH=`expr $HASH_LEN + 16`
>  GRAPH_OCTOPUS_DATA_OFFSET=`expr $GRAPH_COMMIT_DATA_OFFSET + \
>  				$GRAPH_COMMIT_DATA_WIDTH \* $NUM_COMMITS`
>  GRAPH_BYTE_OCTOPUS=`expr $GRAPH_OCTOPUS_DATA_OFFSET + 4`
> +GRAPH_BYTE_FOOTER=`expr $GRAPH_OCTOPUS_DATA_OFFSET + 4 \* $NUM_OCTOPUS_EDGES`
>  
>  # usage: corrupt_graph_and_verify <position> <data> <string>
>  # Manipulates the commit-graph file at the position
> @@ -388,4 +389,9 @@ test_expect_success 'detect incorrect parent for octopus merge' '
>  		"invalid parent"
>  '
>  
> +test_expect_success 'detect invalid checksum hash' '
> +	corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
> +		"incorrect checksum"

This would not work under GETTEXT_POISON, as the message is marked as
translatable, but corrupt_graph_and_verify uses 'grep' and not
'test_i18grep' from t/test-lib-functions.sh

> +'

If it is pure checksum corruption, wouldn't this fail because it is not
a failure (exit code is 0)?

> +
>  test_done