Re: [PATCH v2 1/2] commit-graph: introduce envvar to disable commit existence checks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 23, 2023 at 01:27:16PM +0200, Patrick Steinhardt wrote:
> Our `lookup_commit_in_graph()` helper tries to look up commits from the
> commit graph and, if it doesn't exist there, falls back to parsing it
> from the object database instead. This is intended to speed up the
> lookup of any such commit that exists in the database. There is an edge
> case though where the commit exists in the graph, but not in the object
> database. To avoid returning such stale commits the helper function thus
> double checks that any such commit parsed from the graph also exists in
> the object database. This makes the function safe to use even when
> commit graphs aren't updated regularly.
>
> We're about to introduce the same pattern into other parts of our code
> base though, namely `repo_parse_commit_internal()`. Here the extra
> sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was
> a newly introduced helper, and as such there was no performance hit by
> adding this sanity check. If we added `repo_parse_commit_internal()`
> with that sanity check right from the beginning as well, this would
> probably never have been an issue to begin with. But by retrofitting it
> with this sanity check now we do add a performance regression to
> preexisting code, and thus there is a desire to avoid this or at least
> give an escape hatch.
>
> In practice, there is no inherent reason why either of those functions
> should have the sanity check whereas the other one does not: either both
> of them are able to detect this issue or none of them should be. This
> also means that the default of whether we do the check should likely be
> the same for both. To err on the side of caution, we thus rather want to
> make `repo_parse_commit_internal()` stricter than to loosen the checks
> that we already have in `lookup_commit_in_graph()`.

All well reasoned. I think the most compelling reason is that we're
already doing this extra check in lookup_commit_in_graph(), and having
that be somewhat inconsistent with repo_parse_commit_internal() feels
error-prone to me.

> The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA
> environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is
> the default, we will double check that commits looked up in the commit
> graph via `lookup_commit_in_graph()` also exist in the object database.
> This same check will also be added in `repo_parse_commit_internal()`.

Sounds good.

> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---
>  Documentation/git.txt   |  9 +++++++++
>  commit-graph.c          |  6 +++++-
>  commit-graph.h          |  6 ++++++
>  t/t5318-commit-graph.sh | 21 +++++++++++++++++++++
>  4 files changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git.txt b/Documentation/git.txt
> index 11228956cd..22c2b537aa 100644
> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
> @@ -911,6 +911,15 @@ for full details.
>  	should not normally need to set this to `0`, but it may be
>  	useful when trying to salvage data from a corrupted repository.
>
> +`GIT_COMMIT_GRAPH_PARANOIA`::
> +	If this Boolean environment variable is set to false, ignore the
> +	case where commits exist in the commit graph but not in the
> +	object database. Normally, Git will check whether commits loaded
> +	from the commit graph exist in the object database to avoid
> +	issues with stale commit graphs, but this check comes with a
> +	performance penalty. The default is `1` (i.e., be paranoid about
> +	stale commits in the commit graph).
> +

The first two sentences seem to be flipped. Perhaps:

    When loading a commit object from the commit-graph, Git will perform
    an existence check on the object in the ODB before parsing it out of
    the commit-graph. The default is "true", which enables the
    aforementioned behavior. Setting this to "false" disables the
    existential check when parsing commits from a commit-graph.

>  `GIT_ALLOW_PROTOCOL`::
>  	If set to a colon-separated list of protocols, behave as if
>  	`protocol.allow` is set to `never`, and each of the listed
> diff --git a/commit-graph.c b/commit-graph.c
> index fd2f700b2e..12ec31902e 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -939,14 +939,18 @@ int repo_find_commit_pos_in_graph(struct repository *r, struct commit *c,
>
>  struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id)
>  {
> +	static int object_paranoia = -1;
>  	struct commit *commit;
>  	uint32_t pos;
>
> +	if (object_paranoia == -1)
> +		object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
> +

I don't think that this is a reroll-able issue, but calling this
variable object_paranoia to store a setting for *graph* paranoia feels
like a good itch to scratch. But obviously not a big deal ;-).

> @@ -821,4 +821,25 @@ test_expect_success 'overflow during generation version upgrade' '
>  	)
>  '
>
> +test_expect_success 'stale commit cannot be parsed when given directly' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		test_commit A &&
> +		test_commit B &&
> +		git commit-graph write --reachable &&
> +
> +		oid=$(git rev-parse B) &&
> +		rm .git/objects/"$(test_oid_to_path "$oid")" &&
> +
> +		# Verify that it is possible to read the commit from the
> +		# commit graph when not being paranoid, ...
> +		GIT_COMMIT_GRAPH_PARANOIA=false git rev-list B &&
> +		# ... but parsing the commit when double checking that
> +		# it actually exists in the object database should fail.
> +		test_must_fail git rev-list -1 B

Would "cat-file -p" be more direct here than "rev-list -1"?

Thanks,
Taylor




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux