On 5/10/2018 3:17 PM, Ævar Arnfjörð Bjarmason wrote:
On Thu, May 10 2018, Derrick Stolee wrote:
The behavior in this patch series does the following:
1. Near the end of 'git gc', run 'git commit-graph write'. The location
of this code assumes that a 'git gc --auto' has not terminated early
due to not meeting the auto threshold.
2. At the end of 'git fetch', run 'git commit-graph write'. This means
that every reachable commit will be in the commit-graph after a
a successful fetch, which seems a reasonable frequency. Then, the
only times we would be missing a reachable commit is after creating
one locally. There is a problem with the current patch, though: every
'git fetch' call runs 'git commit-graph write', even if there were no
ref updates or objects downloaded. Is there a simple way to detect if
the fetch was non-trivial?
One obvious problem with this approach: if we compute this during 'gc'
AND 'fetch', there will be times where a 'fetch' calls 'gc' and triggers
two commit-graph writes. If I were to abandon one of these patches, it
would be the 'fetch' integration. A 'git gc' really wants to delete all
references to unreachable commits, and without updating the commit-graph
we may still have commit data in the commit-graph file that is not in
the object database. In fact, deleting commits from the object database
but not from the commit-graph will cause 'git commit-graph verify' to
fail!
I welcome discussion on these ideas, as we are venturing out of the
"pure data structure" world and into the "user experience" world. I am
less confident in my skills in this world, but the feature is worthless
if it does not improve the user experience.
I really like #1 here, but I wonder why #2 is necessary.
I.e. is it critical for the performance of the commit graph feature that
it be kept really up-to-date, moreso than other things that rely on gc
--auto (e.g. the optional bitmap index)?
It is not critical. The feature has been designed to have recent commits
not in the file. For simplicity, it is probably best to limit ourselves
to writing after a non-trivial 'gc'.
Even if that's the case, I think something that does this via gc --auto
is a much better option. I.e. now we have gc.auto & gc.autoPackLimit, if
the answer to my question above is "yes" this could also be accomplished
by introducing a new graph-specific gc.* setting, and --auto would just
update the graph more often, but leave the rest.
This is an excellent idea for a follow-up series, if we find we want the
commit-graph written more frequently. For now, I'm satisfied with one
place where it is automatically computed.
I'll drop the fetch integration in my v2 series.
Thanks,
-Stolee