Re: [RFC PATCH 1/1] maintenance: separate parallelism safe and unsafe tasks

Calvin Wan <calvinwan@xxxxxxxxxx> · Mon, 11 Nov 2024 10:06:10 -0800

On Sun, Nov 10, 2024 at 11:07 PM Patrick Steinhardt <ps@xxxxxx> wrote:
>
> On Fri, Nov 08, 2024 at 05:31:12PM +0000, Calvin Wan wrote:
> > Certain maintenance tasks and subtasks within gc are unsafe to run in
> > parallel with other commands because they lock up files such as
> > HEAD.
>
> I don't think it is fair to classify this as "unsafe". Nothing is unsafe
> here: we take locks to guard us against concurrent modifications.
> What you're having problems with is the fact that this safety mechanism
> works as expected and keeps other processes from modifying locked the
> data.
>
> > Therefore, tasks are marked whether they are async safe or
> > not. Async unsafe tasks are run first in the same process before running
> > async safe tasks in parallel.
> >
> > Since the gc task is partially safe, there are two new tasks -- an async
> > safe gc task and an async unsafe gc task. In order to properly invoke
> > this in gc, `--run-async-safe` and `--run-async-unsafe` have been added
> > as options to gc. Maintenance will only run these two new tasks if it
> > was set to detach, otherwise the original gc task runs.
> >
> > Additionally, if a user passes in tasks thru `--task`, we do not attempt
> > to run separate async/sync tasks since the user sets the order of tasks.
> >
> > WIP: automatically run gc unsafe tasks when gc is invoked but not from
> >      maintenance
> > WIP: edit test in t7900-maintainance.sh to match new functionality
> > WIP: add additional documentation for new options and functionality
> >
> > Signed-off-by: Calvin Wan <calvinwan@xxxxxxxxxx>
> > ---
> >  builtin/gc.c           | 173 ++++++++++++++++++++++++++++++++++++-----
> >  t/t7900-maintenance.sh |  24 +++---
> >  2 files changed, 168 insertions(+), 29 deletions(-)
> >
> > diff --git a/builtin/gc.c b/builtin/gc.c
> > index d52735354c..375d304c42 100644
> > --- a/builtin/gc.c
> > +++ b/builtin/gc.c
>
> It might make sense to split out the git-gc(1) changes into a
> preparatory commit with its own set of tests.
>
> > @@ -815,7 +824,12 @@ struct repository *repo UNUSED)
> >               atexit(process_log_file_at_exit);
> >       }
> >
> > -     gc_before_repack(&opts, &cfg);
> > +     if (run_async_unsafe) {
> > +             gc_before_repack(&opts, &cfg);
> > +             goto out;
> > +     } else if (!run_async_safe)
> > +             gc_before_repack(&opts, &cfg);
> > +
> >
>
> Style: there should be curly braces around the `else if` here.
>
> >       if (!repository_format_precious_objects) {
> >               struct child_process repack_cmd = CHILD_PROCESS_INIT;
> > @@ -1052,6 +1066,46 @@ static int maintenance_task_prefetch(struct maintenance_run_opts *opts,
> >       return 0;
> >  }
> >
> > +static int maintenance_task_unsafe_gc(struct maintenance_run_opts *opts,
> > +                                   struct gc_config *cfg UNUSED)
> > +{
> > +     struct child_process child = CHILD_PROCESS_INIT;
> > +
> > +     child.git_cmd = child.close_object_store = 1;
> > +     strvec_push(&child.args, "gc");
> > +
> > +     if (opts->auto_flag)
> > +             strvec_push(&child.args, "--auto");
> > +     if (opts->quiet)
> > +             strvec_push(&child.args, "--quiet");
> > +     else
> > +             strvec_push(&child.args, "--no-quiet");
> > +     strvec_push(&child.args, "--no-detach");
> > +     strvec_push(&child.args, "--run-async-unsafe");
> > +
> > +     return run_command(&child);
> > +}
> > +
> > +static int maintenance_task_safe_gc(struct maintenance_run_opts *opts,
> > +                                 struct gc_config *cfg UNUSED)
> > +{
> > +     struct child_process child = CHILD_PROCESS_INIT;
> > +
> > +     child.git_cmd = child.close_object_store = 1;
> > +     strvec_push(&child.args, "gc");
> > +
> > +     if (opts->auto_flag)
> > +             strvec_push(&child.args, "--auto");
> > +     if (opts->quiet)
> > +             strvec_push(&child.args, "--quiet");
> > +     else
> > +             strvec_push(&child.args, "--no-quiet");
> > +     strvec_push(&child.args, "--no-detach");
> > +     strvec_push(&child.args, "--run-async-safe");
> > +
> > +     return run_command(&child);
> > +}
>
> These two functions and `maintenance_task_gc()` all look exactly the
> same. We should deduplicate them.
>
> >  static int maintenance_task_gc(struct maintenance_run_opts *opts,
> >                              struct gc_config *cfg UNUSED)
> >  {
> > @@ -1350,6 +1404,7 @@ struct maintenance_task {
> >       const char *name;
> >       maintenance_task_fn *fn;
> >       maintenance_auto_fn *auto_condition;
> > +     unsigned daemonize_safe;
>
> We can use the enum here to give readers a better hint what this
> variable is about.
>
> >       unsigned enabled:1;
> >
> >       enum schedule_priority schedule;
> > @@ -1362,6 +1417,8 @@ enum maintenance_task_label {
> >       TASK_PREFETCH,
> >       TASK_LOOSE_OBJECTS,
> >       TASK_INCREMENTAL_REPACK,
> > +     TASK_UNSAFE_GC,
> > +     TASK_SAFE_GC,
> >       TASK_GC,
> >       TASK_COMMIT_GRAPH,
> >       TASK_PACK_REFS,
> > @@ -1370,36 +1427,62 @@ enum maintenance_task_label {
> >       TASK__COUNT
> >  };
> >
> > +enum maintenance_task_daemonize_safe {
> > +     UNSAFE,
> > +     SAFE,
> > +};
>
> These names can conflict quite fast. Do we maybe want to rename them to
> e.g. `MAINTENANCE_TASK_DAEMONIZE_(SAFE|UNSAFE)`?
>
> >  static struct maintenance_task tasks[] = {
> >       [TASK_PREFETCH] = {
> >               "prefetch",
> >               maintenance_task_prefetch,
> > +             NULL,
> > +             SAFE,
> >       },
>
> It might make sense to prepare these to take designated field
> initializers in a preparatory commit.

Thanks for all the stylistic feedback. I agree much of this can be
cleaned up to be simpler, but I sent this as an RFC to gather feedback
on whether this patch directionally made sense. Will clean everything
up in the v1.

>
> >       [TASK_LOOSE_OBJECTS] = {
> >               "loose-objects",
> >               maintenance_task_loose_objects,
> >               loose_object_auto_condition,
> > +             SAFE,
> >       },
> >       [TASK_INCREMENTAL_REPACK] = {
> >               "incremental-repack",
> >               maintenance_task_incremental_repack,
> >               incremental_repack_auto_condition,
> > +             SAFE,
> > +     },
> > +     [TASK_UNSAFE_GC] = {
> > +             "unsafe-gc",
> > +             maintenance_task_unsafe_gc,
> > +             need_to_gc,
> > +             UNSAFE,
> > +             0,
> > +     },
> > +     [TASK_SAFE_GC] = {
> > +             "safe-gc",
> > +             maintenance_task_safe_gc,
> > +             need_to_gc,
> > +             SAFE,
> > +             0,
> >       },
>
> Hm. I wonder whether we really want to expose additional tasks to
> address the issue, which feels like we're leaking implementation details
> to our users. Would it maybe be preferable to instead introduce a new
> optional callback function for every task that handles the pre-detach
> logic?

This does sound like a good idea. However, would there be any issue
with running all pre-detach logic before running post-detach logic?
I'm thinking if pre-detach logic from a different function could
affect post-detach logic from another. If not, I do agree this would
be the best solution going forward.

> I wonder whether we also have to adapt the "pack-refs" task to be
> synchronous instead of asynchronous?