Re: [PATCH v2 1/2] commit-graph: introduce envvar to disable commit existence checks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 30, 2023 at 05:29:33PM -0400, Taylor Blau wrote:
> On Mon, Oct 23, 2023 at 01:27:16PM +0200, Patrick Steinhardt wrote:
> > Our `lookup_commit_in_graph()` helper tries to look up commits from the
> > commit graph and, if it doesn't exist there, falls back to parsing it
> > from the object database instead. This is intended to speed up the
> > lookup of any such commit that exists in the database. There is an edge
> > case though where the commit exists in the graph, but not in the object
> > database. To avoid returning such stale commits the helper function thus
> > double checks that any such commit parsed from the graph also exists in
> > the object database. This makes the function safe to use even when
> > commit graphs aren't updated regularly.
> >
> > We're about to introduce the same pattern into other parts of our code
> > base though, namely `repo_parse_commit_internal()`. Here the extra
> > sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was
> > a newly introduced helper, and as such there was no performance hit by
> > adding this sanity check. If we added `repo_parse_commit_internal()`
> > with that sanity check right from the beginning as well, this would
> > probably never have been an issue to begin with. But by retrofitting it
> > with this sanity check now we do add a performance regression to
> > preexisting code, and thus there is a desire to avoid this or at least
> > give an escape hatch.
> >
> > In practice, there is no inherent reason why either of those functions
> > should have the sanity check whereas the other one does not: either both
> > of them are able to detect this issue or none of them should be. This
> > also means that the default of whether we do the check should likely be
> > the same for both. To err on the side of caution, we thus rather want to
> > make `repo_parse_commit_internal()` stricter than to loosen the checks
> > that we already have in `lookup_commit_in_graph()`.
> 
> All well reasoned. I think the most compelling reason is that we're
> already doing this extra check in lookup_commit_in_graph(), and having
> that be somewhat inconsistent with repo_parse_commit_internal() feels
> error-prone to me.
> 
> > The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA
> > environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is
> > the default, we will double check that commits looked up in the commit
> > graph via `lookup_commit_in_graph()` also exist in the object database.
> > This same check will also be added in `repo_parse_commit_internal()`.
> 
> Sounds good.
> 
> > Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> > ---
> >  Documentation/git.txt   |  9 +++++++++
> >  commit-graph.c          |  6 +++++-
> >  commit-graph.h          |  6 ++++++
> >  t/t5318-commit-graph.sh | 21 +++++++++++++++++++++
> >  4 files changed, 41 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/git.txt b/Documentation/git.txt
> > index 11228956cd..22c2b537aa 100644
> > --- a/Documentation/git.txt
> > +++ b/Documentation/git.txt
> > @@ -911,6 +911,15 @@ for full details.
> >  	should not normally need to set this to `0`, but it may be
> >  	useful when trying to salvage data from a corrupted repository.
> >
> > +`GIT_COMMIT_GRAPH_PARANOIA`::
> > +	If this Boolean environment variable is set to false, ignore the
> > +	case where commits exist in the commit graph but not in the
> > +	object database. Normally, Git will check whether commits loaded
> > +	from the commit graph exist in the object database to avoid
> > +	issues with stale commit graphs, but this check comes with a
> > +	performance penalty. The default is `1` (i.e., be paranoid about
> > +	stale commits in the commit graph).
> > +
> 
> The first two sentences seem to be flipped. Perhaps:
> 
>     When loading a commit object from the commit-graph, Git will perform
>     an existence check on the object in the ODB before parsing it out of
>     the commit-graph. The default is "true", which enables the
>     aforementioned behavior. Setting this to "false" disables the
>     existential check when parsing commits from a commit-graph.

I was modelling this after the text we had in `GIT_REF_PARANOIA`, but I
like your version more indeed. I'll massage it a bit to mention _why_
one would want to disable this.

> >  `GIT_ALLOW_PROTOCOL`::
> >  	If set to a colon-separated list of protocols, behave as if
> >  	`protocol.allow` is set to `never`, and each of the listed
> > diff --git a/commit-graph.c b/commit-graph.c
> > index fd2f700b2e..12ec31902e 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -939,14 +939,18 @@ int repo_find_commit_pos_in_graph(struct repository *r, struct commit *c,
> >
> >  struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id)
> >  {
> > +	static int object_paranoia = -1;
> >  	struct commit *commit;
> >  	uint32_t pos;
> >
> > +	if (object_paranoia == -1)
> > +		object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
> > +
> 
> I don't think that this is a reroll-able issue, but calling this
> variable object_paranoia to store a setting for *graph* paranoia feels
> like a good itch to scratch. But obviously not a big deal ;-).

Ugh, yeah. I first had the envvar as "GIT_OBJECT_PARANOIA", but
discarded that name because I feared that it might become overloaded
with semi-related checks.

Will fix.

> > @@ -821,4 +821,25 @@ test_expect_success 'overflow during generation version upgrade' '
> >  	)
> >  '
> >
> > +test_expect_success 'stale commit cannot be parsed when given directly' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +		test_commit A &&
> > +		test_commit B &&
> > +		git commit-graph write --reachable &&
> > +
> > +		oid=$(git rev-parse B) &&
> > +		rm .git/objects/"$(test_oid_to_path "$oid")" &&
> > +
> > +		# Verify that it is possible to read the commit from the
> > +		# commit graph when not being paranoid, ...
> > +		GIT_COMMIT_GRAPH_PARANOIA=false git rev-list B &&
> > +		# ... but parsing the commit when double checking that
> > +		# it actually exists in the object database should fail.
> > +		test_must_fail git rev-list -1 B
> 
> Would "cat-file -p" be more direct here than "rev-list -1"?

No, because it doesn't use `lookup_commit_in_graph()`. I had to go
searching a bit to find something that exposes this inconsistency, and
git-rev-list(1) was the easiest one I found.

Patrick

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux