Re: dangling commits and blobs: is this normal?

On Thu, 23 Apr 2009, Geert Bosch wrote:

> 
> On Apr 22, 2009, at 16:05, Jeff King wrote:
> > The other tradeoff, mentioned by Matthieu, is not about speed, but about
> > rollover of files on disk. I think he would be in favor of a less
> > optimal pack setup if it meant rewriting the largest packfile less
> > frequently.
> > 
> > However, it may be reasonable to suggest that he just not manually "gc"
> > then. If he is not generating enough commits to warrant an auto-gc, then
> > he is probably not losing much by having loose objects. And if he is,
> > then auto-gc is already taking care of it.
> 
> For large repositories with lots of large files, git spends too much
> time copying large packs for relatively little gain. This is obvious when
> you include a few dozen large objects in any repository.
> Currently, there is no limit to the number of times this data may
> be copied. In particular, the average amount of I/O needed for
> changes of size X depends linearly on the size of the total repository.
> So, the mere presence of a couple of large objects imposes a large,
> distributed overhead.
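
(Side note on the auto-gc mentioned in the quote above: it is driven by
configurable thresholds.  The values below are the stock defaults, shown
purely for illustration.)

    # auto-gc kicks in once roughly this many loose objects accumulate
    $ git config gc.auto 6700
    # ... or once this many pack files exist
    $ git config gc.autopacklimit 50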

You can put a limit on the number of times this data is copied, and even 
set that limit to zero.  Just create a .keep file alongside your .pack 
file and that pack will be set in stone: any further repack will only 
consider the objects added since.
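
For instance (pack hash shortened for illustration):

    # mark the big pack as kept; git will no longer rewrite it
    $ touch .git/objects/pack/pack-3e6c0a3f.keep
    # subsequent repacks leave the kept pack's objects alone
    $ git repack -a -d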

> Wouldn't it be better to have a maximum of N packs, named
> pack_0 .. pack_(N - 1),  in the repository with each pack_i being
> between 2^i and 2^(i+1)-1 bytes large? We could even dispense
> completely with loose objects and instead have each git operation
> create a single new pack.
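
(Purely for illustration, a minimal sketch of the proposed bucketing;
this is not an existing git feature.  Each pack would go into slot i
such that 2^i <= size < 2^(i+1), i.e. i = floor(log2(size)):)

    # hypothetical: report which slot each existing pack would occupy
    for p in .git/objects/pack/*.pack; do
        size=$(wc -c < "$p")
        awk -v s="$size" -v p="$p" \
            'BEGIN { printf "%s -> pack_%d\n", p, int(log(s)/log(2)) }'
    done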

I suggested that already for large enough objects.  For small objects 
this makes no sense: you would accumulate too many packs, and each one 
would have to be opened just to determine whether it contains the 
desired object, whereas a loose object is found with a simple directory 
lookup.
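
(To illustrate the loose-object case: the object name itself encodes the
on-disk path, so no pack index has to be opened at all.  Hashes
shortened for illustration.)

    # the first two hex digits name the directory, the rest the file
    $ git rev-parse HEAD
    3e6c0a3f...
    $ ls .git/objects/3e/
    6c0a3f...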

> number of packs in the way described is useful and will lead to significant
> speedups, especially during large imports that currently require frequent
> repacking of the entire repository.

Others commented on that issue already.


Nicolas
