Re: Why repository grows after "git gc"? / Purpose of *.keep files?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 12 May 2008, Shawn O. Pearce wrote:

> Mike Hommey <mh@xxxxxxxxxxxx> wrote:
> > On Mon, May 12, 2008 at 11:03:04PM +0200, Mike Hommey wrote:
> > > On Mon, May 12, 2008 at 11:24:14PM +0300, Teemu Likonen wrote:
> > > > But I have experienced the same earlier with some other post-1.5.5
> > > > version so I believe you can reproduce this yourself. After cloning
> > > > Linus's linux-2.6 repo its .git directory weights 209MB. After single
> > > > "git pull" and "git gc" it was 298MB in my test.
> > > 
> > > I noticed that a while ago: when repacking multiple packs when one has a
> > > .keep file, the resulting additional pack contains too many blobs and
> > > trees, contrary to when only packing loose objects:
> > (...)
> > 
> > That is, it seems to also contain all the blobs and subtrees for all the
> > commits the pack contains, even when they already are in the pack having
> > a .keep file.
> 
> I've noticed this too.  Like since day 1 when we added .keep.
> But uh, nobody else complained and I forgot about it.

Well, now that I've reproduced Teemu Likonen's test case, I can confirm 
this is actually a problem.  Here I get:

|remote: Counting objects: 523, done.
|remote: Compressing objects: 100% (57/57), done.
|remote: Total 362 (delta 305), reused 362 (delta 305)
|Receiving objects: 100% (362/362), 65.37 KiB, done.
|Resolving deltas: 100% (305/305), completed with 105 local objects.
|From ../test1
|   492c2e4..9404ef0  master     -> master

The received pack is 449135 bytes large.  This is much larger than the 
actually received data which is 65.37 KiB, but we're completing a thin 
pack with 105 undeltified objects accounting for the size increase which 
is expected.  So far so good.

Now, in theory, running 'git gc' should only repack those 362 + 105 
objects, since the remaining ones are all found in the .keep flagged 
pack.  But that's not what's happening at all:

|Counting objects: 26559, done.
|Compressing objects: 100% (24708/24708), done.
|Writing objects: 100% (26559/26559), done.
|Total 26559 (delta 3054), reused 14011 (delta 1613)

So... there is something definitively wrong here.  The expectation was 
to get a pack in the same size range as the one received during the 
pack, or somewhat smaller due to a better delta compression of the added 
objects.  But instead we get a pack containing  26559 objects!!!  And in 
that lot, only 3054 (11%) are deltas.  That makes for a pack that 
started from 449135 bytes and grew to 72395940 bytes.

> My theory (totally unproven) is that the new pack has objects we
> copied from the .keep pack, because those objects were the best
> delta-bases for the loose objects we have deltafied and want to
> store in the new pack.  Except they aren't yet packed in the new
> pack, so we pack them too.  Tada, duplicates.  :-\

Well, not exactly.

Let's see what happens here even before any packing is attempted

|$ git rev-list --objects 492c2e4..9404ef0
|362
|
|$ git rev-list --objects --all \
|   --unpacked=pack-6a3438b2702be06697023d80b77e67a73a0b0b5c.pack |
|	wc -l
|26559

So this --unpacked= argument (which undocumented semantics I still have 
issues with) is certainly not doing what is expected.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux