Re: [RFC PATCH 1/1] maintenance: separate parallelism safe and unsafe tasks

Patrick Steinhardt <ps@xxxxxx> · Mon, 11 Nov 2024 08:07:07 +0100

On Fri, Nov 08, 2024 at 05:31:12PM +0000, Calvin Wan wrote:
> Certain maintenance tasks and subtasks within gc are unsafe to run in
> parallel with other commands because they lock up files such as
> HEAD.

I don't think it is fair to classify this as "unsafe". Nothing is unsafe
here: we take locks to guard us against concurrent modifications.
What you're having problems with is the fact that this safety mechanism
works as expected and keeps other processes from modifying locked the
data.

> Therefore, tasks are marked whether they are async safe or
> not. Async unsafe tasks are run first in the same process before running
> async safe tasks in parallel.
> 
> Since the gc task is partially safe, there are two new tasks -- an async
> safe gc task and an async unsafe gc task. In order to properly invoke
> this in gc, `--run-async-safe` and `--run-async-unsafe` have been added
> as options to gc. Maintenance will only run these two new tasks if it
> was set to detach, otherwise the original gc task runs.
> 
> Additionally, if a user passes in tasks thru `--task`, we do not attempt
> to run separate async/sync tasks since the user sets the order of tasks.
> 
> WIP: automatically run gc unsafe tasks when gc is invoked but not from
>      maintenance
> WIP: edit test in t7900-maintainance.sh to match new functionality
> WIP: add additional documentation for new options and functionality
> 
> Signed-off-by: Calvin Wan <calvinwan@xxxxxxxxxx>
> ---
>  builtin/gc.c           | 173 ++++++++++++++++++++++++++++++++++++-----
>  t/t7900-maintenance.sh |  24 +++---
>  2 files changed, 168 insertions(+), 29 deletions(-)
> 
> diff --git a/builtin/gc.c b/builtin/gc.c
> index d52735354c..375d304c42 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c

It might make sense to split out the git-gc(1) changes into a
preparatory commit with its own set of tests.

> @@ -815,7 +824,12 @@ struct repository *repo UNUSED)
>  		atexit(process_log_file_at_exit);
>  	}
>  
> -	gc_before_repack(&opts, &cfg);
> +	if (run_async_unsafe) {
> +		gc_before_repack(&opts, &cfg);
> +		goto out;
> +	} else if (!run_async_safe)
> +		gc_before_repack(&opts, &cfg);
> +	
>  

Style: there should be curly braces around the `else if` here.

>  	if (!repository_format_precious_objects) {
>  		struct child_process repack_cmd = CHILD_PROCESS_INIT;
> @@ -1052,6 +1066,46 @@ static int maintenance_task_prefetch(struct maintenance_run_opts *opts,
>  	return 0;
>  }
>  
> +static int maintenance_task_unsafe_gc(struct maintenance_run_opts *opts,
> +				      struct gc_config *cfg UNUSED)
> +{
> +	struct child_process child = CHILD_PROCESS_INIT;
> +
> +	child.git_cmd = child.close_object_store = 1;
> +	strvec_push(&child.args, "gc");
> +
> +	if (opts->auto_flag)
> +		strvec_push(&child.args, "--auto");
> +	if (opts->quiet)
> +		strvec_push(&child.args, "--quiet");
> +	else
> +		strvec_push(&child.args, "--no-quiet");
> +	strvec_push(&child.args, "--no-detach");
> +	strvec_push(&child.args, "--run-async-unsafe");
> +
> +	return run_command(&child);
> +}
> +
> +static int maintenance_task_safe_gc(struct maintenance_run_opts *opts,
> +				    struct gc_config *cfg UNUSED)
> +{
> +	struct child_process child = CHILD_PROCESS_INIT;
> +
> +	child.git_cmd = child.close_object_store = 1;
> +	strvec_push(&child.args, "gc");
> +
> +	if (opts->auto_flag)
> +		strvec_push(&child.args, "--auto");
> +	if (opts->quiet)
> +		strvec_push(&child.args, "--quiet");
> +	else
> +		strvec_push(&child.args, "--no-quiet");
> +	strvec_push(&child.args, "--no-detach");
> +	strvec_push(&child.args, "--run-async-safe");
> +
> +	return run_command(&child);
> +}

These two functions and `maintenance_task_gc()` all look exactly the
same. We should deduplicate them.

>  static int maintenance_task_gc(struct maintenance_run_opts *opts,
>  			       struct gc_config *cfg UNUSED)
>  {
> @@ -1350,6 +1404,7 @@ struct maintenance_task {
>  	const char *name;
>  	maintenance_task_fn *fn;
>  	maintenance_auto_fn *auto_condition;
> +	unsigned daemonize_safe;

We can use the enum here to give readers a better hint what this
variable is about.

>  	unsigned enabled:1;
>  
>  	enum schedule_priority schedule;
> @@ -1362,6 +1417,8 @@ enum maintenance_task_label {
>  	TASK_PREFETCH,
>  	TASK_LOOSE_OBJECTS,
>  	TASK_INCREMENTAL_REPACK,
> +	TASK_UNSAFE_GC,
> +	TASK_SAFE_GC,
>  	TASK_GC,
>  	TASK_COMMIT_GRAPH,
>  	TASK_PACK_REFS,
> @@ -1370,36 +1427,62 @@ enum maintenance_task_label {
>  	TASK__COUNT
>  };
>  
> +enum maintenance_task_daemonize_safe {
> +	UNSAFE,
> +	SAFE,
> +};

These names can conflict quite fast. Do we maybe want to rename them to
e.g. `MAINTENANCE_TASK_DAEMONIZE_(SAFE|UNSAFE)`?

>  static struct maintenance_task tasks[] = {
>  	[TASK_PREFETCH] = {
>  		"prefetch",
>  		maintenance_task_prefetch,
> +		NULL,
> +		SAFE,
>  	},

It might make sense to prepare these to take designated field
initializers in a preparatory commit.

>  	[TASK_LOOSE_OBJECTS] = {
>  		"loose-objects",
>  		maintenance_task_loose_objects,
>  		loose_object_auto_condition,
> +		SAFE,
>  	},
>  	[TASK_INCREMENTAL_REPACK] = {
>  		"incremental-repack",
>  		maintenance_task_incremental_repack,
>  		incremental_repack_auto_condition,
> +		SAFE,
> +	},
> +	[TASK_UNSAFE_GC] = {
> +		"unsafe-gc",
> +		maintenance_task_unsafe_gc,
> +		need_to_gc,
> +		UNSAFE,
> +		0,
> +	},
> +	[TASK_SAFE_GC] = {
> +		"safe-gc",
> +		maintenance_task_safe_gc,
> +		need_to_gc,
> +		SAFE,
> +		0,
>  	},

Hm. I wonder whether we really want to expose additional tasks to
address the issue, which feels like we're leaking implementation details
to our users. Would it maybe be preferable to instead introduce a new
optional callback function for every task that handles the pre-detach
logic?

I wonder whether we also have to adapt the "pack-refs" task to be
synchronous instead of asynchronous?

> @@ -1436,6 +1527,57 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts,
>  	}
>  	free(lock_path);
>  
> +	for (i = 0; !found_selected && i < TASK__COUNT; i++)
> +		found_selected = tasks[i].selected_order >= 0;
> +
> +	if (found_selected)
> +		QSORT(tasks, TASK__COUNT, compare_tasks_by_selection);
> +	else if (opts->detach > 0) {
> +		/* 
> +		 * Part of gc is unsafe to run in the background, so
> +		 * separate out gc into unsafe and safe tasks.
> +		 * 
> +		 * Run unsafe tasks, including unsafe maintenance tasks,
> +		 * before daemonizing and running safe tasks.
> +		 */
> +
> +		if (tasks[TASK_GC].enabled) {
> +			tasks[TASK_GC].enabled = 0;
> +			tasks[TASK_UNSAFE_GC].enabled = 1;
> +			tasks[TASK_UNSAFE_GC].schedule = tasks[TASK_GC].schedule;
> +			tasks[TASK_SAFE_GC].enabled = 1;
> +			tasks[TASK_SAFE_GC].schedule = tasks[TASK_GC].schedule;
> +		}

If we did the above, then we could also get rid of the complexity here.

Thanks!

Patrick