On Sun, 12 Nov 2006 00:16:40 -0800 Junio C Hamano wrote:

> Alexandre Julliard <julliard@xxxxxxxxxx> writes:
>
> > There's also a problem with the packing, a clone --depth 1 currently
> > results in a pack that's about 3 times as large as it should be.
>
> That's interesting.
>
> : gitster; git clone -n --depth 1 git://127.0.0.1/git.git victim-001
> [...]
> -r--r--r-- 1 junio src 9.5M 2006-11-11 23:52 pack-f5f88d83....pack
>
> Repacking immediately after cloning brings it down to what is
> expected.
>
> : gitster; git repack -a -d -f
> [...]
> -rw-rw-r-- 1 junio src 2.6M 2006-11-11 23:53 pack-f5f88d83....pack

This is due to an optimization in builtin-pack-objects.c:try_delta():

	/*
	 * We do not bother to try a delta that we discarded
	 * on an earlier try, but only when reusing delta data.
	 */
	if (!no_reuse_delta && trg_entry->in_pack &&
	    trg_entry->in_pack == src_entry->in_pack)
		return 0;

After removing this part, the shallow pack after clone is 2.6M, as it
should be.

The problem with this optimization is that it is only valid if we are
repacking either the same set of objects as before, or a superset of
it. If we are packing a subset of the objects, some objects in that
subset were delta-compressed in the original pack against base objects
that are not included in our subset; we are therefore unable to reuse
the existing deltas, and with this optimization we never even try
delta compression for these objects.

(The optimization assumes that if we try delta compression again, we
will try mostly the same base objects as when the existing pack was
made, and will therefore likely get the same result - which is close
to the truth when we are doing "repack -a", but badly wrong when we
are doing "git-upload-pack" with a large number of common commits and
are therefore excluding a lot of objects.)

So any partial fetch (shallow or not) from a mostly packed repository
currently results in a suboptimal pack. In fact, a fresh
"repack -a -d -f" is probably the worst case for a subsequent fetch
(not the initial clone) from that repository - the objects for the
most recent commit are the most likely to be stored without delta
compression, and even when deltas are used, they are likely in the
wrong direction for someone who already has an older version and
wants to update it.

> In any case, after this "shallow" stuff, repeated "fetch --depth
> 99" seems to fetch 0 object and 3400 objects alternately, and
> the shallow file alternates between 900 bytes and 11000 bytes.

I confirm this - different numbers, but the same problem...
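
Going back to the try_delta() check above: as a purely illustrative
sketch (not what was tested - the numbers above were obtained by
simply removing the check), the condition could perhaps be narrowed
rather than dropped, assuming trg_entry->delta is set exactly when
reuse of the existing on-disk delta has been arranged, which would
imply that its base object is part of the pack being written:

	/*
	 * Hypothetical narrower condition (sketch only, field
	 * semantics assumed): skip the retry only when the existing
	 * delta is actually reusable, i.e. its base object is also
	 * among the objects being packed.
	 */
	if (!no_reuse_delta && trg_entry->in_pack &&
	    trg_entry->in_pack == src_entry->in_pack &&
	    trg_entry->delta)
		return 0;

Whether trg_entry->delta really carries that meaning here is an
assumption; treat this as a sketch of the idea, not a tested change.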