Re: [PATCH v3 17/20] fsck: verify commit-graph

Jakub Narebski <jnareb@xxxxxxxxx> · Sat, 02 Jun 2018 18:17:36 +0200

Derrick Stolee <dstolee@xxxxxxxxxxxxx> writes:

> If core.commitGraph is true, verify the contents of the commit-graph
> during 'git fsck' using the 'git commit-graph verify' subcommand. Run
> this check on all alternates, as well.

All right, so we have one config variable to control the use of
serialized commit-graph feaature.  Nice.

>
> We use a new process for two reasons:
>
> 1. The subcommand decouples the details of loading and verifying a
>    commit-graph file from the other fsck details.

All right, I can agree with that.

On the other hand using subcommand makes debugging harder, though not in
this case (well separated functionality that can be easily called with a
standalone command to be debugged).

>
> 2. The commit-graph verification requires the commits to be loaded
>    in a specific order to guarantee we parse from the commit-graph
>    file for some objects and from the object database for others.

I don't quite understand this.  Could you explain it in more detail?

>
> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>  Documentation/git-fsck.txt |  3 +++
>  builtin/fsck.c             | 21 +++++++++++++++++++++
>  t/t5318-commit-graph.sh    |  8 ++++++++
>  3 files changed, 32 insertions(+)
>
> diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
> index b9f060e3b2..ab9a93fb9b 100644
> --- a/Documentation/git-fsck.txt
> +++ b/Documentation/git-fsck.txt
> @@ -110,6 +110,9 @@ Any corrupt objects you will have to find in backups or other archives
>  (i.e., you can just remove them and do an 'rsync' with some other site in
>  the hopes that somebody else has the object you have corrupted).
>  
> +If core.commitGraph is true, the commit-graph file will also be inspected

Shouldn't we use `core.commitGraph` here?

> +using 'git commit-graph verify'. See linkgit:git-commit-graph[1].
> +
>  Extracted Diagnostics
>  ---------------------
>  
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index ef78c6c00c..a6d5045b77 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -16,6 +16,7 @@
>  #include "streaming.h"
>  #include "decorate.h"
>  #include "packfile.h"
> +#include "run-command.h"
>  
>  #define REACHABLE 0x0001
>  #define SEEN      0x0002
> @@ -45,6 +46,7 @@ static int name_objects;
>  #define ERROR_REACHABLE 02
>  #define ERROR_PACK 04
>  #define ERROR_REFS 010
> +#define ERROR_COMMIT_GRAPH 020

Minor nitpick and a sidenote: I wonder if it wouldn't be better to
either use hexadecimal constants, or use (1 << n) for all ERROR_*
preprocesor constants.

>  
>  static const char *describe_object(struct object *obj)
>  {
> @@ -815,5 +817,24 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
>  	}
>  
>  	check_connectivity();
> +
> +	if (core_commit_graph) {
> +		struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
> +		const char *verify_argv[] = { "commit-graph", "verify", NULL, NULL, NULL, NULL };

I see that NULL at index 2 and 3 (at 3rd and 4th place) are here for
"--object-dir" and <alternates-object-dir-path>, the last one is
terminator for that case, but what is next to last NULL (at 5th place)
for?

> +		commit_graph_verify.argv = verify_argv;
> +		commit_graph_verify.git_cmd = 1;
> +
> +		if (run_command(&commit_graph_verify))
> +			errors_found |= ERROR_COMMIT_GRAPH;
> +
> +		prepare_alt_odb();
> +		for (alt = alt_odb_list; alt; alt = alt->next) {
> +			verify_argv[2] = "--object-dir";
> +			verify_argv[3] = alt->path;
> +			if (run_command(&commit_graph_verify))
> +				errors_found |= ERROR_COMMIT_GRAPH;
> +		}
> +	}

For performance reasons it may be better to start those 'git
commit-graph verify' commands asynchronously earlier, so that they can
run in parallel / concurrently wth other checks, and wait for them and
get their error code at the end of git-fsck run.

But that is probably better left for a separate commit.

> +
>  	return errors_found;
>  }
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 2680a2ebff..4941937163 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -394,4 +394,12 @@ test_expect_success 'detect invalid checksum hash' '
>  		"incorrect checksum"
>  '
>  
> +test_expect_success 'git fsck (checks commit-graph)' '
> +	cd "$TRASH_DIRECTORY/full" &&
> +	git fsck &&
> +	corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
> +		"incorrect checksum" &&
> +	test_must_fail git fsck
> +'

All right; though the same caveats apply as with previous commit in
series.  Perhaps it would be better to truncate commit-graph file, or
corrupt it in some 'random' place.

> +
>  test_done

Best,
-- 
Jakub Narębski