Re: [PATCH v2] maintenance: add prune-remote-refs task

Shubham Kanodia <shubham.kanodia10@xxxxxxxxx> · Fri, 3 Jan 2025 12:20:07 +0530

On Mon, Dec 30, 2024 at 7:35 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Patrick Steinhardt <ps@xxxxxx> writes:
>
> > On Sat, Dec 28, 2024 at 10:07:41AM +0000, Shubham Kanodia via GitGitGadget wrote:
> >> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
> >> index 6e6651309d3..8b3e496c8ef 100644
> >> --- a/Documentation/git-maintenance.txt
> >> +++ b/Documentation/git-maintenance.txt
> >> @@ -158,6 +158,26 @@ pack-refs::
> >>      need to iterate across many references. See linkgit:git-pack-refs[1]
> >>      for more information.
> >>
> >> +prune-remote-refs::
> >> +    The `prune-remote-refs` task runs `git remote prune` on each remote
> >> +    repository registered in the local repository. This task helps clean
> >> +    up deleted remote branches, improving the performance of operations
> >> +    that iterate through the refs. See linkgit:git-remote[1] for more
> >> +    information. This task is disabled by default.
> >> ++
> >> +NOTE: This task is opt-in to prevent unexpected removal of remote refs
> >> +for users of git-maintenance. For most users, configuring `fetch.prune=true`
> >
> > Do we want to make this linkgit:git-maintenance[1] even though this is
> > self-referential?
>
> That certainly is a thought---the rule could be "whenever we refer
> to a Git command, we refer to it in a uniform way".  An alternative
> would be "of git-maintenance" -> "of this command" to weaken it.
>
> This refers to those users who want to use the command for other
> reasons (you use the scheduled tasks driven by 'git maintenance'
> only because you wanted the 'gc' and 'pack-refs' tasks to run, you
> do not necessarily want to run a new kind of task the new version of
> Git started supporting, especially when the task is destructive,
> like this one).  We might want to stress that point, perhaps?  If a
> reader reads this part of the documentation, finds this task useful
> and decides to use 'git maintenance', the note would sound somewhat
> nonsensical to them---"I thought about the ramifications, I decided
> I wanted to use the command, why would it be opt-in?" is a plausible
> confusion.
>
> >> +is a acceptable solution, as it will automatically clean up stale remote-tracking
> >> +branches during normal fetch operations. However, this task can be useful in
> >> +specific scenarios:
> >> ++
> >> +--
> >> +* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`)
> >> +  where `fetch.prune` would only affect refs that are explicitly fetched.
> >> +* When third-party tools might perform unexpected full fetches, and you want
> >> +  periodic cleanup independently of fetch operations.
> >> +--
> >
> > Nicely explained. I wish we had more such documentation that is taking
> > the user by their hand and explains why they may or may not want to have
> > a specific thing.
>
> Yes, a configuration or an option that are not for everybody and for
> every situation need such a guidance, and this one is done nicely.
>
> >> +static int maintenance_task_prune_remote(struct maintenance_run_opts *opts,
> >> +                                     struct gc_config *cfg UNUSED)
> >> +{
> >> +    if (for_each_remote(prune_remote, opts)) {
> >> +            error(_("failed to prune remotes"));
> >> +            return 1;
> >
> > I wonder whether we should adapt the loop to be eager. Erroring out on
> > the first failed remote would potentially mean that none of the other
> > remotes may get pruned. So if you had a now-unreachable remote as first
> > remote then none of your remotes would be pruned.
>
> I think the structure, hence the behaviour, is shared with an
> existing prefetch task.  I think the current way is OK-ish, but
> given that we are not in a hurry, we may want to correct the
> semantics for both of them before unleashing this new task to the
> world.
>
> For that, we need the callback functions given to for_each_remote
> (i.e., fetch_remote and prune_remote) to always return "success" in
> the sense to tell "I am done with this remote" to allow the loop to
> continue to the next remote, and convey the failure from the
> subcommand via some other means (like flipping a bit in the cbdata).
>
> Thanks.

Curious — I submitted my patches through GGG, but Junio was kind
enough to apply a few other fixes to it.
Is there a place I can now get the whole diff (with the range diff
patched in) so I can pull that into GGG?

Thanks,
Shubham