Re: Is fetch.writeCommitGraph (and thus features.experimental) meant to work in the presence of shallow clones?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 15, 2020 at 07:55:19AM -0400, Derrick Stolee wrote:
> On 4/14/2020 7:50 PM, Taylor Blau wrote:
> > On Tue, Apr 14, 2020 at 04:31:19PM -0400, Derrick Stolee wrote:
> >> On 4/14/2020 4:22 PM, Elijah Newren wrote:
> >>> Hi,
> >>>
> >>> I was building a version of git for internal use, and thought I'd try
> >>> turning on features.experimental to get more testing of it.  The
> >>> following test error in the testsuite scared me, though:
> >>>
> >>> t5537.9 (fetch --update-shallow):
> >>>
> >>> ...
> >>> + git fetch --update-shallow ../shallow/.git refs/heads/*:refs/remotes/shallow/*
> [snip]
> >> Well, commit-graphs are not supposed to do anything if we have
> >> shallow clones. We definitely don't load a commit-graph in that
> >> case. Seems like we need an extra check in write_commit_graph()
> >> to stop writing in the presence of shallow commits.
>
> Here, I assumed that the commit-graph wasn't checking the shallow status
> appropriately, but...
>
> > This rang a bell to me, too. There's a bug, but it's due to the mutative
> > side-effects of 'is_repository_shallow' along with '--update-shallow' (a
> > normal 'git fetch' works fine here, with or without
> > fetch.writeCommitGraph).
>
> ...this makes sense as to why this particular case failed.
>
> (Please allow me this brief moment to communicate my extreme dislike
> of shallow clones. There, I'm done.)

Noted ;).

> > Here's a patch that I didn't sign-off on that fixes the problem for me.
> >
> > --- >8 ---
> >
> > Subject: [PATCH] shallow.c: use 'reset_repository_shallow' when appropriate
> >
> > In bd0b42aed3 (fetch-pack: do not take shallow lock unnecessarily,
> > 2019-01-10), the author noted that 'is_repository_shallow' produces
> > visible side-effect(s) by setting 'is_shallow' and 'shallow_stat'.
> >
> > This is a problem for e.g., fetching with '--update-shallow' in a
> > shallow repsoitory with 'fetch.writeCommitGraph' enabled, since the
>
> repository
>
> > update to '.git/shallow' will cause Git to think that the repository
> > *isn't* shallow when it is, thereby circumventing the commit-graph
> > compatability check.
>
> compatibility

Whoops, thanks for pointing these out.

> > This causes problems in shallow repositories with at least shallow refs
> > that have at least one ancestor (since the client won't have those
> > object(s), and therefore can't take the reachability closure over
> > commits to be written to the commit-graph).
> >
> > Address this by introducing 'reset_repository_shallow()', and calling it
> > when the shallow file is updated, forcing 'is_repository_shallow' to
> > re-evaluate whether the repository is still shallow after fetching in
> > the above scenario.
> >
> > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
> > ---
> >  commit.h     |  1 +
> >  fetch-pack.c |  1 +
> >  shallow.c    | 15 ++++++++-------
> >  3 files changed, 10 insertions(+), 7 deletions(-)
> >
> > diff --git a/commit.h b/commit.h
> > index 008a0fa4a0..ee1ba139d4 100644
> > --- a/commit.h
> > +++ b/commit.h
> > @@ -251,6 +251,7 @@ int register_shallow(struct repository *r, const struct object_id *oid);
> >  int unregister_shallow(const struct object_id *oid);
> >  int for_each_commit_graft(each_commit_graft_fn, void *);
> >  int is_repository_shallow(struct repository *r);
> > +void reset_repository_shallow(struct repository *r);
> >  struct commit_list *get_shallow_commits(struct object_array *heads,
> >  					int depth, int shallow_flag, int not_shallow_flag);
> >  struct commit_list *get_shallow_commits_by_rev_list(
> > diff --git a/fetch-pack.c b/fetch-pack.c
> > index 1734a573b0..051902ef6d 100644
> > --- a/fetch-pack.c
> > +++ b/fetch-pack.c
> > @@ -1630,6 +1630,7 @@ static void update_shallow(struct fetch_pack_args *args,
> >  		if (*alternate_shallow_file == '\0') { /* --unshallow */
> >  			unlink_or_warn(git_path_shallow(the_repository));
> >  			rollback_lock_file(&shallow_lock);
> > +			reset_repository_shallow(the_repository);
> >  		} else
> >  			commit_lock_file(&shallow_lock);
> >  		alternate_shallow_file = NULL;
> > diff --git a/shallow.c b/shallow.c
> > index 7fd04afed1..fac383dec9 100644
> > --- a/shallow.c
> > +++ b/shallow.c
> > @@ -40,13 +40,6 @@ int register_shallow(struct repository *r, const struct object_id *oid)
> >
> >  int is_repository_shallow(struct repository *r)
> >  {
> > -	/*
> > -	 * NEEDSWORK: This function updates
> > -	 * r->parsed_objects->{is_shallow,shallow_stat} as a side effect but
> > -	 * there is no corresponding function to clear them when the shallow
> > -	 * file is updated.
> > -	 */
> > -
>
> It's always good to remove these NEEDSWORK comments. I wonder if the
> problem is more complicated than your patch makes it seem, or else
> the original author would have done it correctly at first.
>
> But you are definitely closing out one dangling side-effect, which is
> an improvement.

Yeah, I have no idea if there are other spots that I'm missing (I only
spent a few minutes on this patch before leaving for the day to go eat
dinner), hence why I CC'd Jonathan Tan to see if he had anything else in
mind when he wrote this 'NEEDSWORK' comment.

If he feels that this is a generally good direction, I'm more than happy
to look for other spots myself, address them, and then submit this as a
real patch.

> >  	FILE *fp;
> >  	char buf[1024];
> >  	const char *path = r->parsed_objects->alternate_shallow_file;
> > @@ -79,6 +72,12 @@ int is_repository_shallow(struct repository *r)
> >  	return r->parsed_objects->is_shallow;
> >  }
> >
> > +void reset_repository_shallow(struct repository *r)
> > +{
> > +	r->parsed_objects->is_shallow = -1;
> > +	stat_validity_clear(r->parsed_objects->shallow_stat);
> > +}
> > +
> >  /*
> >   * TODO: use "int" elemtype instead of "int *" when/if commit-slab
> >   * supports a "valid" flag.
> > @@ -362,6 +361,7 @@ void setup_alternate_shallow(struct lock_file *shallow_lock,
> >  		 * shallow file".
> >  		 */
> >  		*alternate_shallow_file = "";
> > +	reset_repository_shallow(the_repository);
> >  	strbuf_release(&sb);
> >  }
> >
> > @@ -411,6 +411,7 @@ void prune_shallow(unsigned options)
> >  			die_errno("failed to write to %s",
> >  				  get_lock_file_path(&shallow_lock));
> >  		commit_lock_file(&shallow_lock);
> > +		reset_repository_shallow(the_repository);
> >  	} else {
>
> These are likely good places to call reset_repository_shallow(),
> but perhaps we should also call it here in commit-graph.c?
>
>  static int commit_graph_compatible(struct repository *r)
>  {
>         if (!r->gitdir)
>                 return 0;
>
>         if (read_replace_refs) {
>                 prepare_replace_object(r);
>                 if (hashmap_get_size(&r->objects->replace_map->map))
>                         return 0;
>         }
>
>         prepare_commit_graft(r);
>         if (r->parsed_objects && r->parsed_objects->grafts_nr)
>                 return 0;
> +       reset_repository_shallow(r);
>         if (is_repository_shallow(r))
>                 return 0;
>
>         return 1;
>  }
>
> Or am I misunderstanding the state that reset_repository_shallow()
> puts us in? My expectation is that is_repository_shallow() will
> act as if the shallow variables had never been set and will look
> for shallow data from disk.

You're not misunderstanding it at all. I don't think that this location
is strictly necessary, since all of the other locations that change
'.git/shallow' are already invaliding the shallow-ness checks, so by the
time we get to this point 'is_repository_shallow' has to take the slow
path and redetermine whether or not we are actually shallow.

So, I don't think this location is strictly necessary, and it's
potentially slowing us down a little bit, but it is hardening us against
other parts of the code that may not call 'reset_repository_shallow'
when they should have.

> Thanks,
> -Stolee

Thanks for the review.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux