Re: Shallow clone

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 12 Nov 2006 00:16:40 -0800 Junio C Hamano wrote:

> Alexandre Julliard <julliard@xxxxxxxxxx> writes:
>
> > There's also a problem with the packing, a clone --depth 1 currently
> > results in a pack that's about 3 times as large as it should be.
>
> That's interesting.
>
>   : gitster; git clone -n --depth 1 git://127.0.0.1/git.git victim-001
[...]
>   -r--r--r-- 1 junio src 9.5M 2006-11-11 23:52 pack-f5f88d83....pack
>
> Repacking immediately after cloning brings it down to what is
> expected.
>
>   : gitster; git repack -a -d -f
[...]
>   -rw-rw-r-- 1 junio src 2.6M 2006-11-11 23:53 pack-f5f88d83....pack

This is due to optimization in builtin-pack-objects.c:try_delta():

	/*
	 * We do not bother to try a delta that we discarded
	 * on an earlier try, but only when reusing delta data.
	 */
	if (!no_reuse_delta && trg_entry->in_pack &&
	    trg_entry->in_pack == src_entry->in_pack)
		return 0;

After removing this part the shallow pack after clone is 2.6M, as it
should be.

The problem with this optimization is that it is only valid if we are
repacking either the same set of objects as we did earlier, or its
superset.  But if we are packing a subset of objects, there will be some
objects in that subset which were delta-compressed in the original pack,
but base objects for that deltas are not included in our subset -
therefore we will be unable to reuse existing deltas, and with that
optimization we will never try to use delta compression for these
objects.  (The optimization assumes that if we will try to use delta
compression, we will try mostly the same base objects as we have tried
when we made the existing pack, and therefore will likely get the same
result - which is close to the truth when we are doing "repack -a", but
is badly wrong when we are doing "git-upload-pack" with a large number
of common commits, and therefore are excluding a lot of objects.)

So any partial fetch (shallow or not) from a mostly packed repository
currently results in a suboptimal pack.  In fact, the fresh "repack -a
-d -f" is probably the worst case for subsequent fetch (not initial
clone) from that repository - objects for the most recent commit are
most likely to be stored without delta compression, and even if deltas
are used, they are likely in the wrong direction for someone who has an
older version and wants to update it.


> In any case, after this "shallow" stuff, repeated "fetch --depth
> 99" seems to fetch 0 object and 3400 objects alternately, and
> the shallow file alternates between 900 bytes and 11000 bytes.

I confirm this - different numbers, but the same problem...

Attachment: pgpPw9wO7JKZT.pgp
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]