Re: [RFC Patch] Preventing corrupt objects from entering the repository

On Sun, Feb 10, 2008 at 07:33:34PM -0500, Nicolas Pitre wrote:
> On Sun, 10 Feb 2008, Junio C Hamano wrote:
> 
> > mkoegler@xxxxxxxxxxxxxxxxx (Martin Koegler) writes:
> > 
> > > This patch adds a cache to keep the object data in memory. The delta
> > > resolving code must also search in the cache.
> > 
> > I have to wonder what the memory pressure in real-life usage
> > will be like.

> FWIW, I don't like this idea.
>
> I'm struggling to find ways to improve performances of 
> pack-objects/index-pack with those large repositories that are becoming 
> more common (i.e. GCC, OOO, Mozilla, etc.)  Anything that increase 
> memory usage isn't very welcome IMHO.

Maybe I have missed something, but all repack problems reported on the
git mailing list happen during the deltification phase. The problematic
files are mostly large blobs. I'm aware of these problems, so my
patch does not keep any blobs in memory.

As we are talking about memory, let's ignore unpack-objects, which is
used for small packs. Let's compare the memory usage of index-pack to
that of pack-objects:

If strict checking is disabled (no --strict passed), the only additional
allocation is one (unused) pointer for each object in the received pack
file.

On i386, struct object_entry is 84 bytes in pack-objects, but only 52
in index-pack. Both programs keep a struct object_entry for each
object in memory for their whole runtime. So in this case, index-pack
uses less memory than pack-objects.

If the --strict option is passed, more memory is used:

* Again, we add one pointer to struct object_entry. object_entry is
  still smaller (52 < 84 bytes).

* index-pack allocates a struct blob/tree/commit/tag for each object in the pack.

  pack-objects allocates only a struct object in the best case
  (reading from a pack file), otherwise a struct
  blob/tree/commit/tag. These objects are kept in memory for the whole
  runtime of pack-objects.

  So, depending on the parameters of pack-objects, index-pack uses
  up to 24 additional bytes per object, but its struct object_entry is
  32 bytes smaller.

* index-pack allocates a struct blob/tree/commit/tag for each link to an object outside the pack.

  I don't know the pack-objects code well enough to comment on
  this point.

* index-pack keeps the data for each tag/tree/commit in the pack in memory.

  In the next version, I don't need to keep the tag/commit data in
  memory. Tree data could be reconstructed from the written pack,
  but I'm not sure whether the memory saved would justify the
  additional code (resolving deltas again).
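
The per-object figures above can be tallied in a short sketch. This is
only arithmetic on the i386 numbers quoted in this mail, not code from
git. The function names and the object count are made up for
illustration, and the sketch assumes the "up to 24 bytes" can simply be
added on top of index-pack's 52-byte object_entry. It leaves out the
cached tree/commit/tag data and the structs allocated for links to
objects outside the pack.

```python
# Per-object sizes on i386, taken from the figures quoted in this mail.
PACK_OBJECTS_ENTRY = 84  # struct object_entry in pack-objects
INDEX_PACK_ENTRY = 52    # struct object_entry in index-pack
STRICT_EXTRA_MAX = 24    # worst-case extra bytes per object with --strict


def index_pack_bytes(n_objects, strict=False):
    """Bookkeeping bytes index-pack needs for n_objects (assumed model)."""
    per_object = INDEX_PACK_ENTRY + (STRICT_EXTRA_MAX if strict else 0)
    return n_objects * per_object


def pack_objects_bytes(n_objects):
    """Bookkeeping bytes pack-objects needs for n_objects (assumed model)."""
    return n_objects * PACK_OBJECTS_ENTRY


if __name__ == "__main__":
    n = 1_000_000  # hypothetical object count, not a measurement
    print("index-pack           :", index_pack_bytes(n))
    print("index-pack --strict  :", index_pack_bytes(n, strict=True))
    print("pack-objects         :", pack_objects_bytes(n))
```

Under these assumptions the worst case for --strict is 52 + 24 = 76
bytes per object, still 8 bytes less than the 84 bytes of
pack-objects.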

So my conclusion is that the memory usage of index-pack with --strict
should not be much worse than that of pack-objects.

Please remember that --strict is used for pushing data.

Regards, Martin Kögler