Re: git repack command on larger pack file

Sivakumar Selvam <gerritcode@xxxxxxxxx> writes:

>    I ran git repack on a single large repository, abc.git, whose pack
> file is 34 GB. It generally took 20-25 minutes on my server to
> complete the repack. During repacking I noticed that disk usage was high, so
> I thought of splitting the pack file into 4 GB chunks. I used the following
> command to do the repacking.
>    git repack -A -b -d -q --depth=50 --window=10 abc.git
>
>    After adding --max-pack-size=4g to the above command, I ran it again to
> split the pack files.
>    git repack -A -b -d -q --depth=50 --window=10 --max-pack-size=4g abc.git
>
>    When it finished, I found 12 pack files of 4 GB each, 48 GB in
> total. My disk usage has now increased by 14 GB. I ran it again to
> check the performance, but the size stayed at 48 GB and repacking took
> 35 minutes longer. What causes this issue?

Hmmm, what is "this issue"?  I do not see anything surprising.

If you have N objects and run repack with window=10, you would
(roughly speaking, without taking into account the various
optimizations we have and the bootstrap conditions) check each of
these N objects against 10 other objects to find a good delta base,
no matter how big your max pack size is set.  That takes the bulk of
the time in the repack process.  In addition, it has to write more
data to disk (see below), it has to find a good place to split, and
it has to adjust bookkeeping data at each pack boundary; in general
it has to do more, not less, to produce split packs.  It would be
surprising if it took less time.
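A toy model of that cost may help. The sketch below is a simplified, hypothetical Python model (not git's code; the function name is made up): it assumes the objects are already sorted and counts how many delta-base candidates each object is compared against in a sliding window of size `window`. It takes no pack-size argument at all, which is the point: the search cost is roughly N × window whatever --max-pack-size says. Real pack-objects also applies limits such as --depth and --window-memory, which this model ignores.

```python
from collections import deque


def count_delta_attempts(num_objects: int, window: int) -> int:
    """Count delta-base comparisons in a toy sliding-window search.

    Hypothetical model, not git's implementation: every object is compared
    against up to `window` objects that precede it in an already-sorted list.
    No maximum pack size appears anywhere, because the search is done before
    the output is split into packs.
    """
    recent: deque = deque(maxlen=window)  # the last `window` objects seen
    attempts = 0
    for obj in range(num_objects):
        attempts += len(recent)  # one delta attempt per candidate base
        recent.append(obj)       # the oldest candidate drops out automatically
    return attempts


if __name__ == "__main__":
    # The first `window` objects have fewer candidates (the "bootstrap"
    # case); after that, each object costs exactly `window` attempts.
    print(count_delta_attempts(1000, 10))  # 9945
```

For 1,000 objects and window=10 this gives 9,945 attempts, just under N × window; in this model, splitting the output into 4 GB packs would not change that number.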

Each pack by definition has to be self-sufficient; every delta in a
pack must have its base object in the same pack.  Now, imagine that
an object (call it X) would have been expressed as a delta derived
from another object (call it Y) if you were producing a single pack,
and imagine that the pack has grown to 4 GB just before you write
object X out.  The current pack (which already contains the base
object Y) needs to be closed and then a new pack is opened.
Imagine how you would write X now into that new pack.  You have to
discard the deltified representation of X (which by definition is
much smaller, because it is an instruction to reconstitute X given
an object Y whose contents are very similar to X) and write the full,
undeltified representation of X to the pack, because X can no longer
be expressed as a delta derived from Y.  That is why you would need
to write more.
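The X/Y argument can be simulated in a few lines. This is a hypothetical toy model in Python, not how pack-objects is actually written: the names `Obj`, `entry_size` and `split_into_packs` and all byte sizes are made up for illustration. Objects are written in order; a delta is kept only if its base is in the pack currently being written, and when the next entry would overflow the size limit, the pack is closed and X must fall back to its full size.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Obj:
    name: str
    full_size: int               # bytes when stored whole (undeltified)
    delta_size: int = 0          # bytes when stored as a delta against `base`
    base: Optional[str] = None   # name of the preferred delta base, if any


def entry_size(obj: Obj, current: Set[str]) -> int:
    """A delta is usable only if its base is in the pack being written."""
    if obj.base is not None and obj.base in current:
        return obj.delta_size
    return obj.full_size


def split_into_packs(objects: List[Obj], max_pack_size: Optional[int]) -> List[int]:
    """Return the size of each pack written, in a toy model of a size limit.

    When the next entry would push the open pack past `max_pack_size`, that
    pack is closed and a new one is opened. An object whose delta base stayed
    behind in the closed pack must then be written whole. In this toy model,
    an entry is never refused by an empty pack, even if it alone is too big.
    """
    pack_sizes: List[int] = []
    current: Set[str] = set()  # names of objects in the open pack
    size = 0
    for obj in objects:
        needed = entry_size(obj, current)
        if max_pack_size is not None and current and size + needed > max_pack_size:
            pack_sizes.append(size)            # close the full pack
            current, size = set(), 0           # open a new, empty one
            needed = entry_size(obj, current)  # recompute: any base is now behind
        current.add(obj.name)
        size += needed
    if current:
        pack_sizes.append(size)
    return pack_sizes


if __name__ == "__main__":
    objs = [
        Obj("A", 900),
        Obj("Y", 3000),
        Obj("X", 3000, delta_size=100, base="Y"),  # X is a small delta on Y
    ]
    print(split_into_packs(objs, None))  # [4000]
    print(split_into_packs(objs, 3900))  # [3900, 3000]
```

Here X costs 100 bytes when it can sit next to Y but 3,000 bytes once the split falls between them, so the split output totals 6,900 bytes against 4,000 for a single pack: the kind of growth described above, at a much smaller scale.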