Re: [PATCH] git exproll: steps to tackle gc aggression


 



Ramkumar Ramachandra <artagnon@xxxxxxxxx> writes:

> Junio C Hamano wrote:
>> Imagine we have a cheap way to enumerate the young objects without
>> the usual history traversal.
>
> Before we discuss the advantages, can you outline how we can possibly
> get this data without actually walking downwards from the roots
> (refs)? One way to do it is to pull data out of a log of ref updates
> (aka. reflog), but we both know how unreliable that can be.

My understanding is that the goal of this topic is to come up with a
way, much cheaper than the current "gc --auto" (which involves a
walk of recent history), to consolidate both loose objects and small
young packs into one pack, so that we can use that logic for
"gc --auto".

The key phrase is "without the usual history traversal".  We are
talking about young objects, and they are likely to be reachable
from something (like reflog entries, if not refs).  A quick "collect
them into a single young pack" step may include some unreachable
cruft in the result, but that is not a problem.  You need to keep
objects while the reflog entries that reach them are alive anyway,
and you will still need periodic sweeps with the usual history walk
to remove from packs the older cruft that has recently become
unreachable due to reflog expiry.

If you start from that assumption [*1*], the way to enumerate the
young objects without the usual history traversal should be fairly
obvious.

By definition, loose objects are all young because they were created
since the last "gc --auto".  Also, each pack .idx file has a
creation timestamp that lets you decide how old the pack is, and you
can see how many objects there are in the corresponding .pack and
how big it is.
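
To make the "small and young" test concrete, here is a rough sketch
in Python (not Git code) of how the three numbers could be read off
the filesystem.  It assumes the file's mtime is a good enough stand-in
for the creation timestamp, and that the object count can be taken
from the last entry of the 256-entry fan-out table in the .idx file.
The names idx_object_count and describe_packs are made up for this
illustration.

```python
import os
import struct
import time

IDX_V2_MAGIC = b"\xfftOc"  # signature that starts a version 2 .idx file


def idx_object_count(idx_path):
    """Return the number of objects recorded in a pack .idx file."""
    with open(idx_path, "rb") as f:
        header = f.read(8)
        if header[:4] == IDX_V2_MAGIC:
            # v2: 8-byte header, then the 256-entry fan-out table.
            fanout = f.read(256 * 4)
        else:
            # v1: no header, the file starts with the fan-out table.
            fanout = header + f.read(256 * 4 - 8)
    if len(fanout) != 256 * 4:
        raise ValueError("truncated .idx file: %s" % idx_path)
    # The last fan-out entry is the total number of objects.
    return struct.unpack(">I", fanout[-4:])[0]


def describe_packs(objects_dir, now=None):
    """List age, object count and size for each pack under objects_dir."""
    now = time.time() if now is None else now
    pack_dir = os.path.join(objects_dir, "pack")
    result = []
    if not os.path.isdir(pack_dir):
        return result
    for name in sorted(os.listdir(pack_dir)):
        if not (name.startswith("pack-") and name.endswith(".idx")):
            continue
        idx_path = os.path.join(pack_dir, name)
        pack_path = idx_path[:-len(".idx")] + ".pack"
        if not os.path.exists(pack_path):
            continue
        result.append({
            "idx": idx_path,
            # Assumption: mtime approximates the creation time.
            "age_seconds": now - os.path.getmtime(idx_path),
            "objects": idx_object_count(idx_path),
            "pack_bytes": os.path.getsize(pack_path),
        })
    return result
```

A caller would then pick its own thresholds (say, packs younger than
some number of days and below some object count); what those
thresholds should be is left open here.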

By doing an equivalent of "find .git/objects/[0-9a-f][0-9a-f]/", you
can enumerate the loose objects, and an equivalent of "show-index"
will enumerate the objects in each pack whose .idx file you
determined to be small and young.
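
The two enumerations could look roughly like the Python sketch below.
It is only an illustration, not a proposal for how Git itself should
do it: it assumes SHA-1 object names (2 + 38 hex digits), and it
shells out to "git show-index", which reads an .idx file on its
standard input and prints one "offset object-name" line per object
(followed by a CRC in parentheses for version 2 indexes).  The
function names are invented for this example.

```python
import os
import re
import subprocess

_FANOUT_DIR = re.compile(r"^[0-9a-f]{2}$")
_OBJECT_REST = re.compile(r"^[0-9a-f]{38}$")  # assumes SHA-1 names


def loose_objects(objects_dir):
    """Yield names of loose objects, like find objects/[0-9a-f][0-9a-f]/."""
    for d in sorted(os.listdir(objects_dir)):
        if not _FANOUT_DIR.match(d):
            continue  # skips "pack", "info" and anything else
        subdir = os.path.join(objects_dir, d)
        if not os.path.isdir(subdir):
            continue
        for name in sorted(os.listdir(subdir)):
            if _OBJECT_REST.match(name):  # ignores temporary files
                yield d + name


def parse_show_index(output):
    """Extract object names from the text printed by git show-index."""
    return [line.split()[1] for line in output.splitlines() if line.strip()]


def packed_objects(idx_path):
    """Enumerate objects in one pack by feeding its .idx to show-index."""
    with open(idx_path, "rb") as f:
        proc = subprocess.run(["git", "show-index"], stdin=f,
                              capture_output=True, check=True)
    return parse_show_index(proc.stdout.decode("ascii"))
```

Neither function looks at refs or walks history, so the result can
contain unreachable objects, as discussed above.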

Note that *1* is an assumption. I do not know offhand if such a
"consolidate young objects quickly into one to keep the number of
packs small" strategy is an overall win.
