Re: [PATCH v2] maintenance: add prune-remote-refs task

Patrick Steinhardt <ps@xxxxxx> · Mon, 30 Dec 2024 08:15:55 +0100

On Sat, Dec 28, 2024 at 10:07:41AM +0000, Shubham Kanodia via GitGitGadget wrote:
> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
> index 6e6651309d3..8b3e496c8ef 100644
> --- a/Documentation/git-maintenance.txt
> +++ b/Documentation/git-maintenance.txt
> @@ -158,6 +158,26 @@ pack-refs::
>  	need to iterate across many references. See linkgit:git-pack-refs[1]
>  	for more information.
>  
> +prune-remote-refs::
> +	The `prune-remote-refs` task runs `git remote prune` on each remote
> +	repository registered in the local repository. This task helps clean
> +	up deleted remote branches, improving the performance of operations
> +	that iterate through the refs. See linkgit:git-remote[1] for more
> +	information. This task is disabled by default.
> ++
> +NOTE: This task is opt-in to prevent unexpected removal of remote refs
> +for users of git-maintenance. For most users, configuring `fetch.prune=true`

Do we want to make this linkgit:git-maintenance[1] even though this is
self-referential?

> +is an acceptable solution, as it will automatically clean up stale remote-tracking
> +branches during normal fetch operations. However, this task can be useful in
> +specific scenarios:
> ++
> +--
> +* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`)
> +  where `fetch.prune` would only affect refs that are explicitly fetched.
> +* When third-party tools might perform unexpected full fetches, and you want
> +  periodic cleanup independently of fetch operations.
> +--

Nicely explained. I wish we had more documentation like this that takes
the user by the hand and explains why they may or may not want a
specific thing.

> diff --git a/builtin/gc.c b/builtin/gc.c
> index 4ae5196aedf..329c764f300 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -20,6 +20,7 @@
>  #include "lockfile.h"
>  #include "parse-options.h"
>  #include "run-command.h"
> +#include "remote.h"
>  #include "sigchain.h"
>  #include "strvec.h"
>  #include "commit.h"
> @@ -913,6 +914,30 @@ static int maintenance_opt_schedule(const struct option *opt, const char *arg,
>  	return 0;
>  }
>  
> +static int prune_remote(struct remote *remote, void *cb_data UNUSED)
> +{
> +	struct child_process child = CHILD_PROCESS_INIT;
> +
> +	if (!remote->url.nr)
> +		return 0;
> +
> +	child.git_cmd = 1;
> +	strvec_pushl(&child.args, "remote", "prune", remote->name, NULL);
> +
> +	return !!run_command(&child);
> +}
> +
> +static int maintenance_task_prune_remote(struct maintenance_run_opts *opts,
> +					 struct gc_config *cfg UNUSED)
> +{
> +	if (for_each_remote(prune_remote, opts)) {
> +		error(_("failed to prune remotes"));
> +		return 1;

I wonder whether we should adapt the loop so that it keeps going after a
failure. Erroring out on the first failed remote means that none of the
remaining remotes get pruned. So if your first remote had become
unreachable, none of your remotes would be pruned.

If so, we may want to collect the names of failed remotes and print
them, as well.
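
To illustrate, here is a rough, standalone sketch of the pattern, modelled
outside of Git so that it compiles on its own. Everything in it is a
hypothetical stand-in: `struct remote` is reduced to two fields,
`fake_prune()` replaces the `git remote prune` child process, and
`for_each_remote_model()` only imitates an iterator that stops at the first
non-zero callback result. In the real code a `struct string_list` handed in
via `cb_data` would probably be the more natural container than the fixed
array used here.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_FAILED 16

/* Hypothetical, reduced stand-in for Git's `struct remote`. */
struct remote {
	const char *name;
	int reachable;
};

/* Collected across all callback invocations via cb_data. */
struct prune_result {
	const char *failed[MAX_FAILED];
	size_t nr;      /* number of names stored in failed[] */
	int any_failed; /* set even if failed[] is already full */
};

/* Stand-in for spawning `git remote prune <name>`; non-zero means failure. */
static int fake_prune(const struct remote *remote)
{
	return remote->reachable ? 0 : 1;
}

/* Record a failure, but always return 0 so that the iteration continues. */
static int prune_remote(struct remote *remote, void *cb_data)
{
	struct prune_result *res = cb_data;

	if (fake_prune(remote)) {
		res->any_failed = 1;
		if (res->nr < MAX_FAILED)
			res->failed[res->nr++] = remote->name;
	}
	return 0;
}

/* Imitates an iterator that aborts on the first non-zero callback result. */
static int for_each_remote_model(struct remote *remotes, size_t nr,
				 int (*fn)(struct remote *, void *),
				 void *cb_data)
{
	for (size_t i = 0; i < nr; i++) {
		int ret = fn(&remotes[i], cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Try every remote, then report each failed one by name. */
static int prune_all(struct remote *remotes, size_t nr,
		     struct prune_result *res)
{
	res->nr = 0;
	res->any_failed = 0;

	for_each_remote_model(remotes, nr, prune_remote, res);

	for (size_t i = 0; i < res->nr; i++)
		fprintf(stderr, "error: failed to prune remote '%s'\n",
			res->failed[i]);
	return res->any_failed;
}
```

The important bit is that the callback never returns non-zero, so an
unreachable first remote no longer keeps the others from being pruned,
while the task as a whole still reports the failure.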

Patrick