Re: Keeping unreachable objects in a separate pack instead of loose?

On Tue, 12 Jun 2012, Ted Ts'o wrote:

> On Tue, Jun 12, 2012 at 02:25:47PM -0400, Nicolas Pitre wrote:
> > > Earlier in the thread, I outlined another scheme by which you could
> > > repack and avoid the duplicates. It does not require changes to git's
> > > object lookup process, because it would involve manually feeding the
> > > list of cruft objects to pack-objects (which will pack what you ask it,
> > > regardless of whether the objects are in other packs).
> > 
> > It might be hard to achieve good delta compression that way,
> > though, as the main key for sorting those objects is their path
> > name, and with unreferenced objects you don't necessarily have
> > that information.  The ability to reuse existing pack data might
> > mitigate this.
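
For concreteness, that scheme might look something like this as a
minimal sketch, assuming 'git fsck --unreachable' as the source of the
object list; the exact flags are illustrative, not something settled
in this thread:

	# List unreachable objects and feed their ids to pack-objects,
	# which packs whatever it is told to, even objects that already
	# exist in other packs.
	git fsck --unreachable --no-reflogs |
	awk '/^unreachable/ { print $3 }' |
	git pack-objects .git/objects/pack/pack-cruft
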
> 
> Compared to loose objects, even not-so-great delta compression is
> manna from heaven.  Remember what originally got me to start this
> thread: there was 4.5 megabytes' worth of loose objects, yet when I
> created the object id list and fed it to git pack-objects, the
> resulting pack was only 244k.
> 
> OK, maybe the delta compression wasn't optimal, but compared to the
> 4.5 megabytes of loose objects, I'll happily settle for that!  :-)

Sure.  However, I would be even happier if we could delete those
unneeded objects outright.  The official reason they are kept for two
weeks is to avoid some race conditions, and for that purpose two weeks
is way over the top: under "normal" conditions the actual window for a
race is on the order of a few seconds.  Any other use case should be
considered abusive.
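
For what it's worth, that window is already tunable.  A minimal
sketch, assuming nothing else is writing to the repository:

	# The two-week default comes from gc.pruneExpire and can be
	# shortened:
	git config gc.pruneExpire 1.hour.ago

	# or overridden for a single run (only safe when no concurrent
	# operation may be racing with the prune):
	git gc --prune=now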

> > So the problem is really about 'git gc' creating more data on
> > disk, which is counterproductive for a garbage collecting task.
> > Maybe the trick is simply not to delete any old pack whose content
> > was repacked into the single new pack, and to let those packs age
> > before deleting them, rather than exploding a bunch of loose
> > objects.  But then we're back to the same issue I wanted to get
> > away from, i.e. identifying real cruft packs and making them
> > safely deletable.
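
To illustrate, that aging idea might look roughly like this.  This is
a sketch only: the mtime-based test is an assumption, and deciding
which old packs are truly safe to remove is exactly the unsolved
problem mentioned above.

	# Repack everything into one new pack but keep the old packs
	# around (-a without -d) ...
	git repack -a

	# ... then, later, list packs that have aged past the grace
	# period as deletion candidates.
	find .git/objects/pack -name 'pack-*.pack' -mtime +14 -print
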
> 
> But the old packs are huge; in my case, a full set of packs was around
> 16 megabytes.  Right now, git gc *increased* my disk usage by 4.5
> megabytes.  If we don't delete the old packs, then git gc would
> increase disk usage by 16 megabytes --- which is far, far worse.
> 
> Writing a 244k cruft pack is soooooo much preferable.

But as you might have noticed, there are a bunch of semantic problems 
with that as well.


Nicolas