Re: Packfile can't be mapped

Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> git-repack can't handle my 1.75GB pack file. I am running x86 with 3GB
> address space.
> 
> -rw-rw-r-- 1 jonsmirl jonsmirl    47221712 Aug 27 20:29 testme.idx
> -rw-rw-r-- 1 jonsmirl jonsmirl  1754317619 Aug 27 20:29 testme.pack
> 
> [jonsmirl@jonsmirl t1]$ git-repack -a -f --window=50 --depth=5000
> Generating pack...
> Done counting 1963325 objects.
> fatal: packfile .git/objects/pack/testme.pack cannot be mapped.
> [jonsmirl@jonsmirl t1]$
> 
> It is built from Mozilla CVS but it is an intermediate stage of our
> work. The fast-import tool isn't diffing directory trees, which makes
> the pack much bigger than it needs to be. Shawn is working on the
> packing code.

I'm going to try to get tree deltas written to the pack sometime this
week. That should compact this intermediate pack down to something
that git-pack-objects can successfully mmap into a 32-bit address
space.  A complete repack with no delta reuse will hopefully generate
a pack closer to 400 MB in size.  But I know Jon would like to get
that pack even smaller.  :)

I should point out that the input stream to fast-import was 20 GB
(completely decompressed revisions from RCS) plus all commit data.
The original CVS ,v files are around 3 GB, and a .tar.gz archive
of the ,v files is around 550 MB.  Getting down to only 1.7 GB
without tree or commit deltas is certainly pretty good.  :)

> ---------------------------------------------------
> Alloc'd objects:    1968000 (   1892000 overflow  )
> Total objects:      1967527 (     41856 duplicates)
>       blobs  :       633842 (         0 duplicates)
>       trees  :      1131208 (     41856 duplicates)
>       commits:       200921 (         0 duplicates)
>       tags   :         1556 (         0 duplicates)
> Total branches:        1600 (      7985 loads     )
>       marks:        1048576 (    200921 unique    )
>       atoms:          56803
> Memory total:         66908 KiB
>        pools:          5408 KiB
>      objects:         61500 KiB
> Pack remaps:           9501
> ---------------------------------------------------
> Pack size:          1713200 KiB
> Index size:           46114 KiB

All of that says that, aside from the 1.7 GB output file, fast-import
ran extremely well.  About 1.9 million objects were written into
the output pack file, with 41k duplicate trees (duplicate blobs
were removed by cvs2svn prior to fast-import, so they don't appear).
200k commits were created across 1600 branches, and we did it in
only 67 MB of memory.

We also had ~8000 LRU cache misses related to our branch data;
this just means that cvs2svn likes to frequently jump around
between branches rather than import an entire branch at a time.
Boosting the size of the LRU cache (at the expense of needing more
memory) should reduce those cache misses as well as 'Pack remaps'.
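To illustrate why a larger cache should cut down on those loads, here
is a small model of a fixed-capacity LRU cache replaying a sequence of
branch accesses.  It is a hypothetical sketch, not fast-import's branch
table: `struct branch_lru`, `lru_touch()`, `count_loads()` and the
`MAX_SLOTS` limit are invented for this example, and each branch is
reduced to an integer id.

```c
#include <stddef.h>

#define MAX_SLOTS 64 /* hypothetical upper bound on cache capacity */

/* Hypothetical fixed-capacity LRU of branch ids. */
struct branch_lru {
    size_t capacity;     /* slots allowed, 1 <= capacity <= MAX_SLOTS */
    size_t count;        /* slots currently filled */
    int ids[MAX_SLOTS];  /* ids[0] is the most recently used branch */
    unsigned loads;      /* misses: branch had to be (re)loaded */
};

/* Touch branch `id`: on a miss count a load and take a free slot, or
 * reuse the least recently used slot if the cache is full; then move
 * the branch to the front. */
static void lru_touch(struct branch_lru *c, int id)
{
    size_t i;
    for (i = 0; i < c->count; i++)
        if (c->ids[i] == id)
            break;

    if (i == c->count) {          /* miss */
        c->loads++;
        if (c->count < c->capacity)
            c->count++;           /* grow into a free slot */
        i = c->count - 1;         /* new slot, or the LRU slot to evict */
    }
    /* Shift entries [0, i) down by one and put id at the front. */
    for (; i > 0; i--)
        c->ids[i] = c->ids[i - 1];
    c->ids[0] = id;
}

/* Replay a trace of branch ids and return how many loads it caused. */
static unsigned count_loads(size_t capacity, const int *trace, size_t n)
{
    struct branch_lru c = { capacity, 0, {0}, 0 };
    for (size_t k = 0; k < n; k++)
        lru_touch(&c, trace[k]);
    return c.loads;
}
```

Replaying a trace that cycles through three branches, e.g.
{1, 2, 3, 1, 2, 3, ...}, with room for only two branches makes every
access a load, while a capacity of three loads each branch only once;
that is the trade of memory for fewer loads described above.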

I'd also like to clean up that pack remapping code and move it
into sha1_file.c.  It's an implementation of partial pack mapping,
and it is apparently working quite well for us in fast-import.
It may help GIT deal with very large packs (e.g. 1.7 GB) on systems
with a smaller address space (e.g. 32-bit).
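A rough illustration of the partial-mapping idea, in C since that is
what sha1_file.c is written in: instead of mmap()ing the whole pack
(which is what fails above on a 32-bit system), map only a fixed-size
window around the offset being read, and remap when a read falls
outside it.  This is a minimal sketch under stated assumptions, not
fast-import's actual code: `struct pack_window`, `map_pack_window()`
and the `window_size` field are hypothetical, the window size is
assumed to be a multiple of the page size, and a request larger than
one window is simply refused rather than handled.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical state for one sliding mmap() window over a packfile. */
struct pack_window {
    int fd;              /* open packfile descriptor */
    off_t file_size;     /* total packfile size in bytes */
    size_t window_size;  /* assumed multiple of the page size */
    unsigned char *base; /* current mapping, or NULL */
    off_t offset;        /* file offset that base corresponds to */
    size_t len;          /* bytes currently mapped */
    unsigned remaps;     /* how many times we had to mmap() */
};

/*
 * Return a pointer to `need` readable bytes starting at file offset
 * `pos`, mapping a new window only when the current one cannot serve
 * the request.  Returns NULL if the request runs past EOF, does not
 * fit in one window, or mmap() fails.
 */
static const unsigned char *map_pack_window(struct pack_window *w,
                                            off_t pos, size_t need)
{
    if (pos < 0 || need == 0 || pos + (off_t)need > w->file_size)
        return NULL;

    /* Fast path: request already inside the current window. */
    if (w->base && pos >= w->offset &&
        pos + (off_t)need <= w->offset + (off_t)w->len)
        return w->base + (pos - w->offset);

    if (w->base) {
        munmap(w->base, w->len);
        w->base = NULL;
    }

    /* mmap() offsets must be page aligned, so round pos down. */
    long page = sysconf(_SC_PAGESIZE);
    off_t start = pos - pos % page;
    off_t remaining = w->file_size - start;
    size_t len = remaining < (off_t)w->window_size
                     ? (size_t)remaining : w->window_size;
    if (pos + (off_t)need > start + (off_t)len)
        return NULL; /* request does not fit in one window */

    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, w->fd, start);
    if (p == MAP_FAILED)
        return NULL;
    w->base = p;
    w->offset = start;
    w->len = len;
    w->remaps++;
    return w->base + (pos - start);
}
```

Keeping `window_size` to a few tens of megabytes would bound the
address space used no matter how large the pack grows; the `remaps`
counter plays a role similar to the 'Pack remaps' figure in the
statistics above.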


We're not confident that this import is completely valid yet.
We have a few translation issues we're still working on.  But now
that we have a complete pack going from start to finish, we can start
to focus on those issues, especially since this entire process
(,v to .pack) takes less than half a day to run.

-- 
Shawn.