Re: [PATCH 02/15] run-job: implement commit-graph job

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks for this series, and sorry for the delayed (and partial) review.
I hope comments here are still useful. I don't have any comments on the
larger design that other people haven't already made, but I do have a
few small questions and suggestions, inline below.


On 2020.04.03 20:48, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> 
> The first job to implement in the 'git run-job' command is the
> 'commit-graph' job. It is based on the sequence of events in the
> 'commit-graph' job in Scalar [1]. This sequence is as follows:
> 
> 1. git commit-graph write --reachable --split
> 2. git commit-graph verify --shallow
> 3. If the verify succeeds, stop.
> 4. Delete the commit-graph-chain file.
> 5. git commit-graph write --reachable --split
> 
> By writing an incremental commit-graph file using the "--split"
> option we minimize the disruption from this operation. The default
> behavior is to merge layers until the new "top" layer is less than
> half the size of the layer below. This provides quick writes "most"
> of the time, with the longer writes following a power law
> distribution.
> 
> Most importantly, concurrent Git processes only look at the
> commit-graph-chain file for a very short amount of time, so they
> will verly likely not be holding a handle to the file when we try
> to replace it. (This only matters on Windows.)
> 
> If a concurrent process reads the old commit-graph-chain file, but
> our job expires some of the .graph files before they can be read,
> then those processes will see a warning message (but not fail).
> This could be avoided by a future update to use the --expire-time
> argument when writing the commit-graph.
> 
> By using 'git commit-graph verify --shallow' we can ensure that
> the file we just wrote is valid. This is an extra safety precaution
> that is faster than our 'write' subcommand. In the rare situation
> that the newest layer of the commit-graph is corrupt, we can "fix"
> the corruption by deleting the commit-graph-chain file and rewrite
> the full commit-graph as a new one-layer commit graph. This does
> not completely prevent _that_ file from being corrupt, but it does
> recompute the commit-graph by parsing commits from the object
> database. In our use of this step in Scalar and VFS for Git, we
> have only seen this issue arise because our microsoft/git fork
> reverted 43d3561 ("commit-graph write: don't die if the existing
> graph is corrupt" 2019-03-25) for a while to keep commit-graph
> writes very fast. We dropped the revert when updating to v2.23.0.
> The verify still has potential for catching corrupt data across
> the layer boundary: if the new file has commit X with parent Y
> in an old file but the commit ID for Y in the old file had a
> bitswap, then we will notice that in the 'verify' command.
> 
> [1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/CommitGraphStep.cs
> 
> RFC QUESTIONS:
> 
> 1. Are links like [1] helpful?

Yes, I appreciate being able to reference these. However, given that
these will show up in commit logs for ~forever, and ongoing work might
happen on Scalar, perhaps the links should be pinned to a specific
revision?


> 
> 2. Can anyone think of a way to test the rewrite fallback?
>    It requires corrupting the latest file between two steps of
>    this one command, so that is a tricky spot to inject a fault.
> 
> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>  Documentation/git-run-job.txt | 21 ++++++++++--
>  builtin/run-job.c             | 60 ++++++++++++++++++++++++++++++++++-
>  commit-graph.c                |  2 +-
>  commit-graph.h                |  1 +
>  t/t7900-run-job.sh            | 32 +++++++++++++++++++
>  5 files changed, 112 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/git-run-job.txt b/Documentation/git-run-job.txt
> index 0627b3ed259..8bf2762d650 100644
> --- a/Documentation/git-run-job.txt
> +++ b/Documentation/git-run-job.txt
> @@ -9,7 +9,7 @@ git-run-job - Run a maintenance job. Intended for background operation.
>  SYNOPSIS
>  --------
>  [verse]
> -'git run-job <job-name>'
> +'git run-job commit-graph'
>  
>  
>  DESCRIPTION
> @@ -28,7 +28,24 @@ BEHAVIOR MAY BE ALTERED IN THE FUTURE.
>  JOBS
>  ----
>  
> -TBD
> +'commit-graph'::
> +
> +The `commit-graph` job updates the `commit-graph` files incrementally,
> +then verifies that the written data is correct. If the new layer has an
> +issue, then the chain file is removed and the `commit-graph` is
> +rewritten from scratch.
> ++
> +The verification only checks the top layer of the `commit-graph` chain.
> +If the incremental write merged the new commits with at least one
> +existing layer, then there is potential for on-disk corruption being
> +carried forward into the new file. This will be noticed and the new
> +commit-graph file will be clean as Git reparses the commit data from
> +the object database.
> ++
> +The incremental write is safe to run alongside concurrent Git processes
> +since it will not expire `.graph` files that were in the previous
> +`commit-graph-chain` file. They will be deleted by a later run based on
> +the expiration delay.
>  
>  
>  GIT
> diff --git a/builtin/run-job.c b/builtin/run-job.c
> index 2c78d053aa4..dd7709952d3 100644
> --- a/builtin/run-job.c
> +++ b/builtin/run-job.c
> @@ -1,12 +1,65 @@
>  #include "builtin.h"
>  #include "config.h"
> +#include "commit-graph.h"
> +#include "object-store.h"
>  #include "parse-options.h"
> +#include "repository.h"
> +#include "run-command.h"
>  
>  static char const * const builtin_run_job_usage[] = {
> -	N_("git run-job"),
> +	N_("git run-job commit-graph"),
>  	NULL
>  };
>  
> +static int run_write_commit_graph(void)
> +{
> +	struct argv_array cmd = ARGV_ARRAY_INIT;
> +
> +	argv_array_pushl(&cmd, "commit-graph", "write",
> +			 "--split", "--reachable", "--no-progress",
> +			 NULL);
> +
> +	return run_command_v_opt(cmd.argv, RUN_GIT_CMD);
> +}
> +
> +static int run_verify_commit_graph(void)
> +{
> +	struct argv_array cmd = ARGV_ARRAY_INIT;
> +
> +	argv_array_pushl(&cmd, "commit-graph", "verify",
> +			 "--shallow", "--no-progress", NULL);
> +
> +	return run_command_v_opt(cmd.argv, RUN_GIT_CMD);
> +}
> +
> +static int run_commit_graph_job(void)
> +{
> +	char *chain_path;
> +
> +	if (run_write_commit_graph()) {
> +		error(_("failed to write commit-graph"));
> +		return 1;
> +	}
> +
> +	if (!run_verify_commit_graph())
> +		return 0;
> +
> +	warning(_("commit-graph verify caught error, rewriting"));
> +
> +	chain_path = get_chain_filename(the_repository->objects->odb);

Should we avoid new uses of `the_repository` and take a repo pointer
argument here instead?


> +	if (unlink(chain_path)) {
> +		UNLEAK(chain_path);
> +		die(_("failed to remove commit-graph at %s"), chain_path);
> +	}
> +	free(chain_path);
> +
> +	if (!run_write_commit_graph())
> +		return 0;

Is there a reason why we don't verify the second write attempt?


> +
> +	error(_("failed to rewrite commit-graph"));
> +	return 1;
> +}
> +
>  int cmd_run_job(int argc, const char **argv, const char *prefix)
>  {
>  	static struct option builtin_run_job_options[] = {
> @@ -23,6 +76,11 @@ int cmd_run_job(int argc, const char **argv, const char *prefix)
>  			     builtin_run_job_usage,
>  			     PARSE_OPT_KEEP_UNKNOWN);
>  
> +	if (argc > 0) {
> +		if (!strcmp(argv[0], "commit-graph"))
> +			return run_commit_graph_job();
> +	}
> +
>  	usage_with_options(builtin_run_job_usage,
>  			   builtin_run_job_options);
>  }
> diff --git a/commit-graph.c b/commit-graph.c
> index f013a84e294..6867f92d04a 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -56,7 +56,7 @@ static char *get_split_graph_filename(struct object_directory *odb,
>  		       oid_hex);
>  }
>  
> -static char *get_chain_filename(struct object_directory *odb)
> +char *get_chain_filename(struct object_directory *odb)
>  {
>  	return xstrfmt("%s/info/commit-graphs/commit-graph-chain", odb->path);
>  }
> diff --git a/commit-graph.h b/commit-graph.h
> index e87a6f63600..8b7bd5a6cb1 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -13,6 +13,7 @@
>  struct commit;
>  
>  char *get_commit_graph_filename(struct object_directory *odb);
> +char *get_chain_filename(struct object_directory *odb);
>  int open_commit_graph(const char *graph_file, int *fd, struct stat *st);
>  
>  /*
> diff --git a/t/t7900-run-job.sh b/t/t7900-run-job.sh
> index 1eac80b7ed3..18b9bd26b3a 100755
> --- a/t/t7900-run-job.sh
> +++ b/t/t7900-run-job.sh
> @@ -5,6 +5,8 @@ test_description='git run-job
>  Testing the background jobs, in the foreground
>  '
>  
> +GIT_TEST_COMMIT_GRAPH=0
> +
>  . ./test-lib.sh
>  
>  test_expect_success 'help text' '
> @@ -12,4 +14,34 @@ test_expect_success 'help text' '
>  	test_i18ngrep "usage: git run-job" err
>  '
>  
> +test_expect_success 'commit-graph job' '
> +	git init server &&
> +	(
> +		cd server &&
> +		chain=.git/objects/info/commit-graphs/commit-graph-chain &&
> +		git checkout -b master &&
> +		for i in $(test_seq 1 15)
> +		do
> +			test_commit $i || return 1
> +		done &&
> +		git run-job commit-graph 2>../err &&
> +		test_must_be_empty ../err &&
> +		test_line_count = 1 $chain &&
> +		for i in $(test_seq 16 19)
> +		do
> +			test_commit $i || return 1
> +		done &&
> +		git run-job commit-graph 2>../err &&
> +		test_must_be_empty ../err &&
> +		test_line_count = 2 $chain &&
> +		for i in $(test_seq 20 23)
> +		do
> +			test_commit $i || return 1
> +		done &&
> +		git run-job commit-graph 2>../err &&
> +		test_must_be_empty ../err &&
> +		test_line_count = 1 $chain
> +	)
> +'
> +
>  test_done
> -- 
> gitgitgadget
> 



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux