Re: [PATCH] commit: detect commits that exist in commit-graph but not in the ODB

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Patrick Steinhardt <ps@xxxxxx> writes:

> @@ -572,8 +572,13 @@ int repo_parse_commit_internal(struct repository *r,
>  		return -1;
>  	if (item->object.parsed)
>  		return 0;
> -	if (use_commit_graph && parse_commit_in_graph(r, item))
> +	if (use_commit_graph && parse_commit_in_graph(r, item)) {
> +		if (!has_object(r, &item->object.oid, 0))
> +			return quiet_on_missing ? -1 :
> +				error(_("commit %s exists in commit-graph but not in the object database"),
> +				      oid_to_hex(&item->object.oid));
>  		return 0;
> +	}

Ever since this codepath was introduced by 177722b3 (commit:
integrate commit graph with commit parsing, 2018-04-10), we blindly
trusted what commit-graph file says.  This change is a strict
improvement in the correctness department, but there are two things
that are a bit worrying.

One.  The additional check should certainly be cheaper than a full
reading and parsing of an object, either from a loose object file or
from a pack entry.  It may not hurt performance too much, but it
still would give us more confidence if we know by how much we are
pessimising good cases where the commit-graph does match reality.
Our stance on these secondary files that store precomputed values
for optimization purposes is in general to use them blindly unless
in exceptional cases where the operation values the correctness even
when the validity of these secondary files is dubious (e.g., "fsck"),
and doing this extra check regardless of the caller at this low level
of the callchain is a bit worrying.

Another is that by the time parse_commit_in_graph() returns true and
we realize that the answer we got is bogus by asking has_object(),
item->object.parsed has already been toggled on, so the caller now
has a commit object that claimed it was already parsed and does not
match reality.  Hopefully the caller takes an early exit upon seeing
a failure from parse_commit_gently() and the .parsed bit does not
matter, but maybe I am missing a case where it does.  I dunno.

Other than that, sounds very sensible and the code change is clean.

Will queue.  Thanks.

> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index ba65f17dd9..25f8e9e2d3 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -821,4 +821,27 @@ test_expect_success 'overflow during generation version upgrade' '
>  	)
>  '
>  
> +test_expect_success 'commit exists in commit-graph but not in object database' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +
> +		test_commit A &&
> +		test_commit B &&
> +		test_commit C &&
> +		git commit-graph write --reachable &&
> +
> +		# Corrupt the repository by deleting the intermittent commit
> +		# object. Commands should notice that this object is absent and
> +		# thus that the repository is corrupt even if the commit graph
> +		# exists.
> +		oid=$(git rev-parse B) &&
> +		rm .git/objects/"$(test_oid_to_path "$oid")" &&
> +
> +		test_must_fail git rev-parse HEAD~2 2>error &&
> +		grep "error: commit $oid exists in commit-graph but not in the object database" error
> +	)
> +'
> +
>  test_done




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux