Re: [PATCH v2 05/18] maintenance: add commit-graph task

Derrick Stolee <stolee@xxxxxxxxx> · Thu, 30 Jul 2020 09:59:21 -0400

On 7/24/2020 9:52 PM, Taylor Blau wrote:
> On Fri, Jul 24, 2020 at 12:47:00PM -0700, Junio C Hamano wrote:
>> Derrick Stolee <stolee@xxxxxxxxx> writes:
>>
>>> But you are discussing here how the _behavior_ can change when
>>> --auto is specified. And specifically, "git gc --auto" really
>>> meant "This is running after a foreground command, so only do
>>> work if necessary and do it quickly to minimize blocking time."
>>>
>>> I'd be happy to replace "--auto" with "--quick" in the
>>> maintenance builtin.
>>>
>>> This opens up some extra design space for how the individual
>>> tasks perform depending on "--quick" being specified or not.
>>> My intention was to create tasks that are already in "quick"
>>> mode:
>>>
>>> * loose-objects have a maximum batch size.
>>> * incremental-repack is capped in size.
>>> * commit-graph uses the --split option.
>>>
>>> But this "quick" distinction might be important for some of
>>> the tasks we intend to extract from the gc builtin.
>>
>> Yup.  To be honest, I came to this topic from a completely different
>> direction.  The field name "auto" alone (and no other field name)
>> had to have an extra cruft (i.e. "_flag") attached to it, which is
>> understandable but ugly.  Then I started thinking if 'auto(matic)'
>> is really the right word to describe what we want out of the option,
>> and came to the realization that there may be better words.
> 
> I wonder what the quick and slow paths are here. For the commit-graph
> code, what you wrote here seems to match what I'd expect with passing
> '--auto' in the sense of running 'git gc'. That is, I'm leaving it up to
> the commit-graph machinery's idea of the normal '--split' rules to
> figure out when to roll up layers of a commit-graph, as opposed to
> creating a new layer and extending the chain.

I had intended all of my new tasks to be the "quick" version of their
operations. The "slow" version would abandon hope of doing a small
amount of work to create the best possible world for the repository.
This would include:

 * The commit-graph would collapse all layers into one file.
 * The multi-pack-index repack would rewrite all object data into one
   pack-file.
 * The loose-objects task would not stop at a maximum number of loose
   objects (and would probably want to repack everything, anyway).

I'm open to making this possibility more explicit by renaming "--auto"
and just performing a translation to 'git gc --auto'. So, what should
the name be? Here are a few options to consider:

 --quick
 --fast
 --limited
 --incremental
 -O[0|1|2...] (think GCC optimization flags, exposing granularity)
 --[non-]aggressive

Regardless, this makes me rethink that the --[no-]maintenance option
from PATCH 03/18 is better than --[no-]auto-maintenance, since we are
really saying "run _some_ maintenance or _no_ maintenance" and the "how"
of the maintenance is left intentionally vague. I've already made the
change locally to add "auto-" so I'll wait for confirmation before
reverting that change.

> So, I think that makes sense if the caller gave '--auto'. But, I'm not
> sure that it makes sense if they didn't, in which case I'd imagine
> something quicker to happen. There, I'd expect something more like:
> 
>   1. Run 'git commit-graph write --reachable --split=no-merge'.
>   2. Run 'git commit-graph verify'.
>   3. If 'git commit-graph verify' failed, drop the existing commit graph
>      and rebuild it with 'git commit-graph --reachable --split=replace'.
>   4. Otherwise, do nothing.
> 
> I'm biased, of course, but I think that that matches roughly what I'd
> expect to happen in the fast/slow path. Granted, the steps to rebuild
> the commit graph are going to be slow no matter what (depending on the
> size of the repository), and so in that case maybe the commit-graph
> should just be dropped. I'm not really sure what to do about that...

I think this approach is the best we can do given the current behavior
inside the commit-graph builtin. Perhaps in the future we could change
the commit-graph builtin to include a "--verify" option so it could do
the "git commit-graph verify --shallow" on the new layer before committing
the new commit-graph-chain file and expiring old layers. That way, we
would not need to delete and rewrite the whole thing when there is a
problem writing the top layer.

>>> Since the tasks are frequently running subcommands, returning
>>> 0 for success and non-zero for error matches the error codes
>>> returned by those subcommands.
>>
>> As long as these will _never_ be called from other helper functions
>> but from the cmd_foo() top-level and their return values are only
>> used directly as the top-level's return value, I do not mind too
>> much.
>>
>> But whenever I am writing such a code, I find myself not brave
>> enough to make such a bold promise (I saw other people call the
>> helpers I wrote in unintended ways and had to adjust the semantics
>> of them to accomodate the new callers too many times), so I'd rather
>> see the caller do "return !!helper_fn()" to allow helper_fn() to be
>> written more naturally (e.g. letting them return error(...)).

I will try to be consistent here with the behavior:

 * 0 is success
 * 1 is failure

Which is what I think you are implying by "return !!helper_fn()".

Thanks,
-Stolee