Re: [PATCH v2 2/2] commit: detect commits that exist in commit-graph but not in the ODB

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Patrick Steinhardt <ps@xxxxxx> writes:

> Commit graphs can become stale and contain references to commits that do
> not exist in the object database anymore. Theoretically, this can lead
> to a scenario where we are able to successfully look up any such commit
> via the commit graph even though such a lookup would fail if done via
> the object database directly.
>
> As the commit graph is mostly intended as a sort of cache to speed up
> parsing of commits we do not want to have diverging behaviour in a
> repository with and a repository without commit graphs, no matter
> whether they are stale or not. As commits are otherwise immutable, the
> only thing that we really need to care about is thus the presence or
> absence of a commit.
>
> To address potentially stale commit data that may exist in the graph,
> our `lookup_commit_in_graph()` function will check for the commit's
> existence in both the commit graph, but also in the object database. So
> even if we were able to look up the commit's data in the graph, we would
> still pretend as if the commit didn't exist if it is missing in the
> object database.
>
> We don't have the same safety net in `parse_commit_in_graph_one()`
> though. This function is mostly used internally in "commit-graph.c"
> itself to validate the commit graph, and this usage is fine. We do
> expose its functionality via `parse_commit_in_graph()` though, which
> gets called by `repo_parse_commit_internal()`, and that function is in
> turn used in many places in our codebase.
>
> For all I can see this function is never used to directly turn an object
> ID into a commit object without additional safety checks before or after
> this lookup. What it is being used for though is to walk history via the
> parent chain of commits. So when commits in the parent chain of a graph
> walk are missing it is possible that we wouldn't notice if that missing
> commit was part of the commit graph. Thus, a query like `git rev-parse
> HEAD~2` can succeed even if the intermittent commit is missing.
>
> It's unclear whether there are additional ways in which such stale
> commit graphs can lead to problems. In any case, it feels like this is a
> bigger bug waiting to happen when we gain additional direct or indirect
> callers of `repo_parse_commit_internal()`. So let's fix the inconsistent
> behaviour by checking for object existence via the object database, as
> well.
>
> This check of course comes with a performance penalty. The following
> benchmarks have been executed in a clone of linux.git with stable tags
> added:
>
>     Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master)
>       Time (mean ± σ):      2.913 s ±  0.018 s    [User: 2.363 s, System: 0.548 s]
>       Range (min … max):    2.894 s …  2.950 s    10 runs
>
>     Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
>       Time (mean ± σ):      3.834 s ±  0.052 s    [User: 3.276 s, System: 0.556 s]
>       Range (min … max):    3.780 s …  3.961 s    10 runs
>
>     Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master)
>       Time (mean ± σ):     13.841 s ±  0.084 s    [User: 13.152 s, System: 0.687 s]
>       Range (min … max):   13.714 s … 13.995 s    10 runs
>
>     Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
>       Time (mean ± σ):     13.762 s ±  0.116 s    [User: 13.094 s, System: 0.667 s]
>       Range (min … max):   13.645 s … 14.038 s    10 runs
>
>     Summary
>       git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran
>         1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
>         4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
>         4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master)
>
> We look at a ~30% regression in general, but in general we're still a
> whole lot faster than without the commit graph. To counteract this, the
> new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar.

Very nicely described.  Will queue.  I'll go offline for the rest of
the week but if there are no significant issues discovered by the
time I come back, let's declare a victory and merge these two
patches down to 'next'.

Thanks.

>
> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---
>  commit.c                | 16 +++++++++++++++-
>  t/t5318-commit-graph.sh | 27 +++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+), 1 deletion(-)
>
> diff --git a/commit.c b/commit.c
> index b3223478bc..7399e90212 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -28,6 +28,7 @@
>  #include "shallow.h"
>  #include "tree.h"
>  #include "hook.h"
> +#include "parse.h"
>  
>  static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);
>  
> @@ -572,8 +573,21 @@ int repo_parse_commit_internal(struct repository *r,
>  		return -1;
>  	if (item->object.parsed)
>  		return 0;
> -	if (use_commit_graph && parse_commit_in_graph(r, item))
> +	if (use_commit_graph && parse_commit_in_graph(r, item)) {
> +		static int object_paranoia = -1;
> +
> +		if (object_paranoia == -1)
> +			object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
> +
> +		if (object_paranoia && !has_object(r, &item->object.oid, 0)) {
> +			unparse_commit(r, &item->object.oid);
> +			return quiet_on_missing ? -1 :
> +				error(_("commit %s exists in commit-graph but not in the object database"),
> +				      oid_to_hex(&item->object.oid));
> +		}
> +
>  		return 0;
> +	}
>  
>  	if (oid_object_info_extended(r, &item->object.oid, &oi, flags) < 0)
>  		return quiet_on_missing ? -1 :
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index c0cc454538..79467d7926 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -842,4 +842,31 @@ test_expect_success 'stale commit cannot be parsed when given directly' '
>  	)
>  '
>  
> +test_expect_success 'stale commit cannot be parsed when traversing graph' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +
> +		test_commit A &&
> +		test_commit B &&
> +		test_commit C &&
> +		git commit-graph write --reachable &&
> +
> +		# Corrupt the repository by deleting the intermittent commit
> +		# object. Commands should notice that this object is absent and
> +		# thus that the repository is corrupt even if the commit graph
> +		# exists.
> +		oid=$(git rev-parse B) &&
> +		rm .git/objects/"$(test_oid_to_path "$oid")" &&
> +
> +		# Again, we should be able to parse the commit when not
> +		# being paranoid about commit graph staleness...
> +		GIT_COMMIT_GRAPH_PARANOIA=false git rev-parse HEAD~2 &&
> +		# ... but fail when we are paranoid.
> +		test_must_fail git rev-parse HEAD~2 2>error &&
> +		grep "error: commit $oid exists in commit-graph but not in the object database" error
> +	)
> +'
> +
>  test_done





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux