Re: git auto-repack is broken...

On Fri, Dec 2, 2011 at 11:45 AM, Jeff King <peff@xxxxxxxx> wrote:
> On Fri, Dec 02, 2011 at 09:35:52AM -0800, Junio C Hamano wrote:
>
>> Jeff King <peff@xxxxxxxx> writes:
>>
>> > When the objects become unreferenced, we eject them from the pack into
>> > loose form again. If they don't become referenced in the 2-week window,
>> > they get pruned then. So yes, you drop the age information, but they do
>> > eventually go away.
>>
>> If you update gc/repack -A to put them in a separate pack, then you would
>> never be able to get rid of them, no? You pack, then eject (which gives
>> them a fresher timestamp), then notice that you are within the 2-week window
>> and pack them again,...
>
> But we shouldn't be packing totally unreferenced objects. Barring bugs,
> the life cycle of such an object should be something like:
>
>  1. Object X is created on branch 'foo'.
>
>  2. Branch 'foo' is deleted, but its commits are still in the HEAD
>     reflog, referencing X.
>
>  3. 90 days pass (actually, I think this might be the 30-day
>     expire-unreachable time)
>
>  4. "git gc" runs "git repack -Ad", which will eject X from the pack
>     into a loose form (because it is not becoming part of the new pack
>     we are writing).

Actually, this is the very point at which the newly loosened
unreferenced objects can be deleted.  Objects ejected from a pack _are_
given the timestamp of the pack they were ejected from.  So if the pack
is older than two weeks (90 days in your example), then so are the
loosened objects, and git prune will delete them when git gc calls it.
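
To make the timing concrete, here is a minimal Python sketch of the
prune decision.  It is not git's code: it only assumes that a loosened
object carries the mtime of the pack it came from and that prune removes
anything older than the expiry cutoff.  The function and variable names
are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumed grace period, matching "git prune --expire=2.weeks.ago".
PRUNE_EXPIRE = timedelta(weeks=2)


def prune_would_delete(object_mtime, now, expire=PRUNE_EXPIRE):
    """Return True if an unreferenced loose object is older than the cutoff."""
    return object_mtime < now - expire


now = datetime(2011, 12, 2)

# The pack was written 90 days ago; an object ejected from it inherits
# that mtime, so it is already past the two-week window.
old_pack_mtime = now - timedelta(days=90)
print(prune_would_delete(old_pack_mtime, now))     # True

# An object ejected from a pack written 3 days ago survives this gc run.
recent_pack_mtime = now - timedelta(days=3)
print(prune_would_delete(recent_pack_mtime, now))  # False
```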

>  5. Two weeks pass.
>
>  6. "git gc" runs "git prune --expire=2.weeks.ago", which removes the
>     object.
>
> "gc" runs between (4) and (6) will not re-pack the object, because it
> remains unreferenced.

Correct, with the caveat that loosened objects get the pack's mtime, so
step 5 may take less than two weeks.

> I think things might be slowed somewhat by "gc --auto", which will not
> do a "repack -A" until we have too many packs. So steps (3) and (4) are
> really more like "gc runs git-repack without -A" 50 times, and then we
> finally run "git repack -A".

This is correct.  It should have the effect of increasing the age of
unreferenced objects by the time they are finally loosened, making it
more likely that they are pruned during the same git gc operation that
loosens them.

Linus's scenario of fetching a lot of stuff that never actually makes
it into the reflogs is still a valid problem.  I'm not sure that
people who don't know what they are doing are going to run into this
problem though.  Since he fetches a lot of stuff without ever checking
it out or creating a branch from it, potentially many objects become
unreferenced every time FETCH_HEAD changes.  If he does this many
times in a short period, he could reach gc.autopacklimit, trigger
gc --auto, and end up with more than gc.auto loose objects that are
younger than gc.pruneExpire.
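
A rough Python model of that scenario follows.  It is a simplification,
not git's implementation: the real gc --auto only estimates the loose
object count, and its exact comparisons may differ.  The thresholds are
the documented defaults (gc.auto = 6700, gc.autoPackLimit = 50); the
burst of 51 packs and 8000 objects is invented for the example.

```python
GC_AUTO = 6700            # default gc.auto
GC_AUTO_PACK_LIMIT = 50   # default gc.autoPackLimit


def gc_auto_needed(loose_objects, packs):
    """Simplified trigger: too many loose objects or too many packs."""
    return loose_objects > GC_AUTO or packs > GC_AUTO_PACK_LIMIT


def loose_after_gc(unreferenced_ages_days, expire_days=14):
    """Count unreferenced objects that get loosened but are too young to prune."""
    return sum(1 for age in unreferenced_ages_days if age < expire_days)


# Hypothetical burst: 51 fetched packs whose 8000 objects all became
# unreferenced within the last 3 days.
ages = [3] * 8000

# The pack count alone is enough to trigger gc --auto.
triggered_by_packs = gc_auto_needed(loose_objects=0, packs=51)

# After the repack, every one of those objects is loose and too young to
# prune, so the very next gc --auto check fires again.
remaining = loose_after_gc(ages)
triggered_again = gc_auto_needed(loose_objects=remaining, packs=1)

print(triggered_by_packs, remaining, triggered_again)
```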

Decreasing gc.pruneExpire as you suggested should make it much less
likely to run into this problem.  I wonder if it is worth trying to
limit gc --auto so that it runs no more often than once per
gc.pruneExpire period, or something like that.  If we modified the
timestamp that is assigned to fetched packs, maybe we could use the
pack timestamps as an indicator of how recently git gc has run.
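
Purely as a sketch of that idea, and not anything git does today, the
throttle could look like the Python below.  It assumes we can somehow
recover the mtime of the newest pack written by gc (the open question
above); should_run_gc_auto and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta


def should_run_gc_auto(last_gc_pack_mtime, now,
                       prune_expire=timedelta(weeks=2)):
    """Hypothetical throttle: skip gc --auto if gc ran within prune_expire."""
    if last_gc_pack_mtime is None:
        # No pack known to come from gc, so nothing to throttle on.
        return True
    return now - last_gc_pack_mtime >= prune_expire


now = datetime(2011, 12, 2)
print(should_run_gc_auto(now - timedelta(days=2), now))   # False
print(should_run_gc_auto(now - timedelta(days=20), now))  # True
```

The point of tying the interval to gc.pruneExpire is that by the next
run, anything loosened by the previous run would be old enough to prune.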

-Brandon