Re: git clone dies (large git repository)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"Troy Telford" <ttelford@xxxxxxxxxxxxxxxx> writes:

> I originally had everything as loose objects.  I then ran 'git-repack
> -d' on occasion, so I had a combination of a large pack file, smaller
> pack  files, and loose objects.  Finally, I tried 'git repack -a -d'
> and  consolidated it all into a single 4GB pack file.  It didn't seem
> to make  much difference in the output.
>
> Am I bumping some sort of limitation within git, or have I uncovered a bug?

The former.  Unfortunately this comes from an old design
decision.

Fortunately this design decision is not something irreversible
(see Chapter 1 of Documentation/ManagementStyle in the kernel
repository ;-).

The packfile is a dual-use format.  When used for network
transfer, we only send the .pack file and have the recipient
reconstruct the corresponding .idx file.  When used locally, we
need both .pack and .idx file; .pack contains the meat of the
data, and .idx allows us random access to the objects stored in
the corresponding .pack file.

What is interesting is that .pack format does not have (as far
as I know) inherent size limitation.  However, .idx file has
hardcoded 32-bit offsets into .pack -- hence, in practice, you
cannot use a .pack that is over 4GB locally.

One crude workaround that would work _today_ for your situation
without changing file formats would be to use git-fetch into an
empty repository (and do ref cloning by hand) instead of using
git-clone.  git-fetch gets .pack data over the wire and explode
the objects contained in the stream into individual objects (as
opposed to git-clone gets .pack data, stores it as a .pack and
tries to create corresponding .idx which in your case would bust
the 32-bit limit and fail).

This is from a private note I sent to Linus on Jun 26 2005 when
pack & idx pairs were initially introduced.

 - Design decision.  As before, you have assumption that nothing
   is longer than 2^32 bytes.  I am not unhappy with that
   restriction with individual objects (even their uncompressed
   size limited below 4GB or even 2GB is fine --- after all we
   are talking about a source control system).  I am however
   wondering if we would regret it later to have a packed file
   also limited to 4GB by having object_entry.offset "unsigned
   long" (and fwrite htonl'ed 4 bytes).  I personally do not
   have problem with this, but I can easily see HPA frowning on
   us.  He didn't like it when I said "in GIT world, file sizes
   and offsets are of type 'unsigned long'" some time ago.

I do not have a copy of a response from Linus to this point, but
if I recall things correctly, since then, the plan always has
been (1) to limit the size of individual packfiles to fit within
the idx limit and/or (2) extend the idx format to be able to
express offset over 2^32.  The latter is possible because idx
file is a local matter, used only for local accesses and does
not get set over the wire.

However, even if we revise the .idx file format, we have another
practical problem to solve.  Currently we assume that we can mmap
one packfile as a whole and do a random access into it.  This
needs to be changed so that we (perhaps optionally, only when
dealing with a huge packfile) mmap part of a .pack at a time.

I recall more recently (as opposed to the heated discussion
immediately after packfile was introduced June last year) we had
another discussion about people not being able to mmap huge
packfiles, and partial mmapping was one of the things that were
discussed there.

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]