Re: 'git gc auto' didn't trigger on large reflog

On Mon, Feb 24, 2025 at 08:43:23AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@xxxxxx> writes:
> 
> > It's a bit funny, but whether or not `git gc --auto` does anything
> > solely depends on the state of the object database.
> 
> I guess after adding "auto", we haven't been careful enough to
> update the triggering condition as we added new kinds of "garbage"
> to collect?  Should we make an exhaustive and authoritative list of
> gc tasks, document them, and make sure "--auto" pays attention?

Maybe. But a better solution might be to build this into
git-maintenance(1) instead, which is a lot more fine-grained. It already
has properly defined subtasks, and each of these subtasks has an
optional callback function that makes it only run as-needed.

So from my perspective we should:

  - Expand git-maintenance(1) to gain a new task for expiring reflogs.

  - Adapt it to not use git-gc(1) anymore, but instead use the specific
    subtasks.

This also lets us iterate a lot more on the actual tasks run by the
command and make them configurable. It would for example allow us to
eventually enable incremental repacking via multi-pack indices or
geometric repacking.
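
To illustrate the per-task callback idea, here is a rough sketch in
Python rather than git's C. Everything in it is hypothetical: the
`Task` class, `run_tasks()`, and the task names are made up for the
example and do not mirror git's actual code. It also assumes that a
task without a callback always runs, which is a choice made for the
sketch only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    """Hypothetical maintenance task with an optional 'is work needed?' callback."""
    name: str
    run: Callable[[], None]
    auto_condition: Optional[Callable[[], bool]] = None


def run_tasks(tasks: List[Task], auto: bool) -> List[str]:
    """Run tasks in order and return the names of those that actually ran.

    In auto mode a task is skipped when its callback reports that there
    is nothing to do. Tasks without a callback run unconditionally
    (an assumption of this sketch).
    """
    ran = []
    for task in tasks:
        if auto and task.auto_condition is not None and not task.auto_condition():
            continue
        task.run()
        ran.append(task.name)
    return ran
```

With such a layout, a reflog-expiry task would only need to bring its
own callback, and the object-database checks would no longer decide
whether anything else runs.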

> Other than objects (packing loose ones, pruning unreferenced loose
> ones or packing them into cruft packs), we seem to check reflog,
> worktree, and rerere database.
> 
> I do not think there is a readily usable API to query how much stale
> data is in reflogs that are more than N seconds old, without which
> "gc --auto" cannot make decisions.  I am reasonably sure rerere API
> does not give you such data, either.  I have no idea about the
> triggering condition of "worktree prune".

No, there isn't, and computing it is also potentially expensive. You
basically have to iterate through each reflog and then also iterate
through all of its reflog entries to figure out whether anything needs
cleaning or not.

But probably we can come up with clever heuristics instead that don't
require us to be this thorough. We could for example just read the
"HEAD" reflog and figure out whether it contains reflog entries that
would be pruned.
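
A minimal sketch of what such a heuristic might look like, again in
Python and not taken from git. It assumes the "files" reflog format,
where each line is "<old> <new> <name> <email> <timestamp> <tz>"
followed by a tab and the message, and it only looks at entry age. The
function names are hypothetical, and the 90-day default merely mirrors
the documented default of gc.reflogExpire. Reachability, which the real
expiry also considers, is ignored here.

```python
import time

# Mirrors the documented default of gc.reflogExpire (90 days).
DEFAULT_EXPIRE_SECONDS = 90 * 24 * 60 * 60


def entry_timestamp(line):
    """Return the timestamp of one files-backend reflog line, or None if unparsable."""
    header = line.split("\t", 1)[0]   # drop the message after the tab
    parts = header.rsplit(" ", 2)     # [..., "<timestamp>", "<tz>"]
    if len(parts) != 3:
        return None
    try:
        return int(parts[1])
    except ValueError:
        return None


def head_reflog_needs_expiry(lines, now=None, expire_seconds=DEFAULT_EXPIRE_SECONDS):
    """Return True if any given reflog line is older than the expiry cutoff."""
    if now is None:
        now = time.time()
    cutoff = now - expire_seconds
    for line in lines:
        ts = entry_timestamp(line.rstrip("\n"))
        if ts is not None and ts < cutoff:
            return True
    return False
```

One would feed it the lines of the "HEAD" reflog only. That stays cheap
compared to walking every reflog, at the cost of missing stale entries
that exist only in other reflogs.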

Patrick