Re: [PATCH v2 05/18] maintenance: add commit-graph task

Taylor Blau <me@xxxxxxxxxxxx> · Fri, 24 Jul 2020 21:52:40 -0400

On Fri, Jul 24, 2020 at 12:47:00PM -0700, Junio C Hamano wrote:
> Derrick Stolee <stolee@xxxxxxxxx> writes:
>
> > But you are discussing here how the _behavior_ can change when
> > --auto is specified. And specifically, "git gc --auto" really
> > meant "This is running after a foreground command, so only do
> > work if necessary and do it quickly to minimize blocking time."
> >
> > I'd be happy to replace "--auto" with "--quick" in the
> > maintenance builtin.
> >
> > This opens up some extra design space for how the individual
> > tasks perform depending on "--quick" being specified or not.
> > My intention was to create tasks that are already in "quick"
> > mode:
> >
> > * loose-objects have a maximum batch size.
> > * incremental-repack is capped in size.
> > * commit-graph uses the --split option.
> >
> > But this "quick" distinction might be important for some of
> > the tasks we intend to extract from the gc builtin.
>
> Yup.  To be honest, I came to this topic from a completely different
> direction.  The field name "auto" alone (and no other field name)
> had to have an extra cruft (i.e. "_flag") attached to it, which is
> understandable but ugly.  Then I started thinking if 'auto(matic)'
> is really the right word to describe what we want out of the option,
> and came to the realization that there may be better words.

I wonder what the quick and slow paths are here. For the commit-graph
code, what you wrote here seems to match what I'd expect with passing
'--auto' in the sense of running 'git gc'. That is, I'm leaving it up to
the commit-graph machinery's idea of the normal '--split' rules to
figure out when to roll up layers of a commit-graph, as opposed to
creating a new layer and extending the chain.

So, I think that makes sense if the caller gave '--auto'. But, I'm not
sure that it makes sense if they didn't, in which case I'd imagine
something quicker to happen. There, I'd expect something more like:

  1. Run 'git commit-graph write --reachable --split=no-merge'.
  2. Run 'git commit-graph verify'.
  3. If 'git commit-graph verify' failed, drop the existing commit graph
     and rebuild it with 'git commit-graph --reachable --split=replace'.
  4. Otherwise, do nothing.

I'm biased, of course, but I think that that matches roughly what I'd
expect to happen in the fast/slow path. Granted, the steps to rebuild
the commit graph are going to be slow no matter what (depending on the
size of the repository), and so in that case maybe the commit-graph
should just be dropped. I'm not really sure what to do about that...

> > Since the tasks are frequently running subcommands, returning
> > 0 for success and non-zero for error matches the error codes
> > returned by those subcommands.
>
> As long as these will _never_ be called from other helper functions
> but from the cmd_foo() top-level and their return values are only
> used directly as the top-level's return value, I do not mind too
> much.
>
> But whenever I am writing such a code, I find myself not brave
> enough to make such a bold promise (I saw other people call the
> helpers I wrote in unintended ways and had to adjust the semantics
> of them to accomodate the new callers too many times), so I'd rather
> see the caller do "return !!helper_fn()" to allow helper_fn() to be
> written more naturally (e.g. letting them return error(...)).
>
> Thanks.

Thanks,
Taylor