Re: git-fast-import yields huge packfile

Hi Richard,

On Sat, 16 Mar 2019, Richard Hipp wrote:

> I'm trying to transform a repository from another VCS into a Git
> repository using "git fast-import".  It appears to work, but the
> resulting Git repository is huge relative to the original - 18 times
> larger. Most of the space seems to be taken up by a single large
> packfile.  That packfile is about 967 MB which is about 1/4th the
> total uncompressed size of all 41785 distinct Blobs in the original
> repository.  The source VCS is able to compress this down to 52 MB by
> comparison.

I feel your pain, as I had the same problem back in the day. My use case
was mirroring an upstream Mercurial repository to a Git repository. This
use case went away, so I do not do that anymore (and there are more, less
happy reasons why I would no longer work on that git-remote-hg project,
but that's off topic). As one of the last rem(a)inders, Git for Windows
carries this patch:

https://github.com/git-for-windows/git/commit/b91911ff8d3e2cf279b4708be89de2e3bc8e9e87

Essentially, it *always* runs `git gc --auto` after running `fast-import`.

That is much higher-level advice than the rather low-level `git repack`
hint given elsewhere in this thread.
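
To make the two options concrete, here is a minimal sketch in Python of a
wrapper that runs the import and then the housekeeping step. It is not what
the patch above does internally; the function names and the
`aggressive_repack` switch are made up for illustration. Keep in mind that
`git gc --auto` only does work when Git's own thresholds say housekeeping is
needed, so the sketch also offers a full `git repack -a -d -f` (where `-f`
makes Git recompute deltas instead of reusing existing ones) as the
lower-level alternative.

```python
import subprocess


def post_import_commands(aggressive_repack=False):
    """Return the git commands to run once `git fast-import` has finished.

    `aggressive_repack` is a hypothetical switch for this sketch only.
    """
    if aggressive_repack:
        # Low-level route: rewrite everything into one pack (-a), drop the
        # now-redundant packs (-d) and recompute deltas from scratch (-f).
        return [["git", "repack", "-a", "-d", "-f"]]
    # High-level route: let Git decide whether housekeeping is needed.
    return [["git", "gc", "--auto"]]


def import_and_gc(stream_path, repo_dir, aggressive_repack=False):
    """Feed a fast-import stream into repo_dir, then run housekeeping."""
    with open(stream_path, "rb") as stream:
        subprocess.run(["git", "fast-import"], stdin=stream,
                       cwd=repo_dir, check=True)
    for cmd in post_import_commands(aggressive_repack):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Calling `import_and_gc("dump.fi", "mirror.git")` (both paths are
placeholders) would run the import followed by `git gc --auto`.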

Now, I wonder whether we should integrate this into `fast-import` proper
(with a knob to turn it off), maybe even offer to run `git gc --auto`
every <N> imported commits?

Ciao,
Johannes

> Maybe I'm doing something wrong with the fast-import stream that is
> defeating Git's attempts at delta compression....
>
> Are there any utility programs available for analyzing packfiles, so
> that I can figure out where the inefficiencies are cropping up and
> try to address them?
>
> Anybody have any suggestions on what I should be looking for?
>
> If anyone would care to see this oversized packfile and perhaps offer
> suggestions on how I can make it more space-efficient, it can be
> cloned from https://github.com/drhsqlite/fossil-mirror.git - at least
> for now - surely I will delete that repo and regenerate it once I
> figure out this problem.
>
> --
> D. Richard Hipp
> drh@xxxxxxxxxx
>
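
On the question of analyzing packfiles quoted above: `git verify-pack -v`
lists every object in a pack, with two extra columns (delta depth and base
object) for objects stored as deltas. Here is a rough sketch, in Python, of
how one might summarize that output per object type. The helper names are
mine, and `analyze_repo` assumes a non-bare repository with its packs under
`.git/objects/pack`. If nearly all blobs show up as non-delta, that would
suggest delta compression is not kicking in.

```python
import glob
import subprocess
from collections import defaultdict

OBJECT_TYPES = {"commit", "tree", "blob", "tag"}


def parse_verify_pack(text):
    """Parse `git verify-pack -v` output into a list of dicts."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Object lines have 5 fields, or 7 when the object is stored as a
        # delta (depth and base object id are appended). Summary lines such
        # as "non delta: ..." or "chain length = ..." are skipped.
        if len(fields) not in (5, 7) or fields[1] not in OBJECT_TYPES:
            continue
        entries.append({
            "oid": fields[0],
            "type": fields[1],
            "size": int(fields[2]),
            "packed_size": int(fields[3]),
            "is_delta": len(fields) == 7,
        })
    return entries


def summarize(entries):
    """Count objects, deltified objects and packed bytes per object type."""
    summary = defaultdict(
        lambda: {"count": 0, "deltified": 0, "packed_bytes": 0})
    for entry in entries:
        stats = summary[entry["type"]]
        stats["count"] += 1
        stats["deltified"] += int(entry["is_delta"])
        stats["packed_bytes"] += entry["packed_size"]
    return dict(summary)


def analyze_repo(repo_dir="."):
    """Run verify-pack on every pack index of a non-bare repository."""
    report = {}
    for idx in glob.glob(f"{repo_dir}/.git/objects/pack/pack-*.idx"):
        out = subprocess.run(["git", "verify-pack", "-v", idx],
                             capture_output=True, text=True,
                             check=True).stdout
        report[idx] = summarize(parse_verify_pack(out))
    return report
```

`git count-objects -v` gives a quicker, coarser overview (number of packs
and their total size) if the per-object listing is more than is needed.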



