Re: Keeping unreachable objects in a separate pack instead of loose?

On Tue, Jun 12, 2012 at 02:25:47PM -0400, Nicolas Pitre wrote:

> My feeling is that we should make a step backward and consider if this 
> is actually the right problem to solve.  I don't remember why I might 
> have been opposed to a reflog for deleted branches as you say I did, but 
> that is certainly a feature that could prove to be useful.

I think your argument was along the lines of "this information can be
reconstructed from the HEAD reflog, anyway, so it is not worth the
effort". My counter to that is that the HEAD reflog is useless on bare
repositories (I have considered adding each pushed ref to a HEAD-like
reflog with everything in it, but doing it without lock contention
between pushes to different refs is tricky).
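To illustrate the bare-repo gap (a minimal sketch; `repo.git` is a made-up name): reflogs are disabled by default in bare repositories, and even after enabling them you get per-ref logs on push, but nothing like the single chronological HEAD log a working repository accumulates.

```shell
# Bare repos keep no reflogs by default, so there is no HEAD reflog
# to reconstruct deleted branches from. Turning on per-ref logging:
git init --quiet --bare repo.git
git -C repo.git config core.logAllRefUpdates true
git -C repo.git config core.logAllRefUpdates
```

Each pushed ref then gets its own log under `logs/refs/`, which is exactly why merging them into one HEAD-like log runs into the cross-ref locking question above.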

But keep in mind that a deletion reflog does not make this problem go
away. It might make it less likely, but there are still cases where the
gc can create a much larger object db.

> Then having a repository that can be used as an alternate for other 
> repositories without knowing about it is also a problem that needs 
> fixing and not only because of this object expiry issue.  This is not 
> easy to fix though.

Yeah, I think that is an open problem, because you do not necessarily
have any write access at all to the alternates repository (however, that
does not need to stop us from making it safer in the case that you _do_
have write access to the alternates repository).
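For anyone following along, this is the setup in question (a sketch with made-up repo names): a shared clone borrows the parent's object store via an alternates file, and the parent keeps no record that anyone depends on it.

```shell
# "child" reads objects directly out of "parent"; parent has no idea.
# A gc in parent can therefore delete unreachable objects that child
# still needs.
git init --quiet parent
git -C parent -c user.name=You -c user.email=you@example.com \
    commit --quiet --allow-empty -m base
git clone --quiet --shared parent child
cat child/.git/objects/info/alternates   # path into parent's object dir
```

Nothing on the parent side changes at all, which is the crux of the problem: any safety measure has to live entirely in the borrowing repository unless you have write access to the parent.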

> Then, the creation of unreferenced objects from successive 'git add' 
> shouldn't create that many objects in the first place.  They currently 
> never get the chance to be packed to start with.

I don't think these objects are necessarily from successive "git add"s.
That is one source, but they may also come from reflogs expiring. I
guess in that case they would typically be in an older pack, though.
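The "git add" source is easy to demonstrate (a minimal sketch; `demo` and `file` are made-up names): every add writes a loose blob immediately, and re-adding changed content leaves the previous blob unreferenced until something prunes it.

```shell
# Successive adds of changing content accumulate unreferenced loose blobs:
git init --quiet demo && cd demo
echo one > file && git add file    # writes a loose blob for "one"
echo two > file && git add file    # writes a blob for "two"; "one" is
                                   # no longer referenced by anything
git fsck --unreachable             # reports the blob for "one"
git count-objects                  # both blobs still sit loose on disk
```

Reflog-expiry cruft looks the same to gc but, as noted above, those objects have usually been around long enough to have landed in a pack already.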

> So the problem is really about 'git gc' creating more data on disk which 
> is counter productive for a garbage collecting task.  Maybe the trick is 
> simply not to delete any of the old pack which content was repacked into 
> a single new pack and let them age before deleting them, rather than 
> exploding a bunch of loose objects.  But then we're back to the same 
> issue I wanted to get away from i.e. identifying real cruft packs and 
> making them safely deletable.

That is satisfyingly simple, but the storage requirement is quite bad.
The unreachable objects are very much in the minority, and an occasional
duplication there is not a big deal; duplicating all of the reachable
objects would double the object directory's size.
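For concreteness, here is the status quo the trade-off is measured against (a sketch; names are made up): after a gc, reachable history lives in one pack, and only the unreachable minority is kept loose.

```shell
# Reachable objects end up packed once; an unreachable blob stays loose.
git init --quiet demo && cd demo
git -c user.name=You -c user.email=you@example.com \
    commit --quiet --allow-empty -m base
echo cruft | git hash-object -w --stdin   # create an unreachable loose blob
git gc --quiet --prune=never              # packs reachable; cruft stays loose
git count-objects -v                      # "count" = loose cruft,
                                          # "in-pack" = reachable objects
```

Keeping whole old packs around instead would duplicate the in-pack side, which dominates; duplicating only the loose side is comparatively cheap.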

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

