Re: large repository clone failure in git for windows

On 2025-01-15 at 08:07:30, Ginger Luo 罗江 wrote:
> Hello, GIT gurus
> 
> I have a large repository which is more than 30GB, when I clone it with git-for-windows (tried 2.47.1 and some old versions), it prompted me with "fatal: pack has bad object at offset xxxxx: inflate returned 1", usually died at "receiving objects" stage at about 4GiB.
> Looks like it's same issue in https://github.com/git-for-windows/git/pull/2179 ;

I think it's actually a different issue.  That pull request is about the
limit for individual large blobs, and pack size would be a different
code path.

I feel like, though, that lots of people have large repositories and we
would have heard before if Git for Windows simply could not handle pack
files over 4 GiB.  I've CC'd Dscho, the Git for Windows maintainer, to
verify that there are no known problems cloning large repositories.  He's
very capable and has seen a lot, so hopefully he can provide some good
insight as to what might be going wrong.

Do you have a single file that is larger than 4 GiB?  That is known to
have had some problems on Git for Windows in the past, and could
theoretically be related to this.  However, typically files are
compressed in pack files, so I wouldn't expect a 4 GiB blob to cause a
failure at 4 GiB in the pack file unless it was incompressible (such as
random data).

Do you have any sort of public repository or test case that can
reproduce this?  That would really help us fix the problem or pin down
what might be going on, since we could inspect it ourselves.  For
instance, on my system, a fresh bare clone of
https://github.com/torvalds/linux.git is larger than 4 GiB, so it would
be good to know if that fails for you as well, or if it works correctly.

Are you using any sort of antivirus or firewall other than the default,
or any sort of proxy, including any TLS man-in-the-middle device or
corporate proxy?  This sounds a lot like some piece of software or
hardware trying to buffer all the data for inspection or tampering with
the data.

If so, can you please try to completely uninstall the software and
reboot, or use a different network that doesn't have such a device on
it?  Note that simply disabling the software often does not fix the
problem.  If that works, then you should report a bug to the vendor of
that software or device.

> Seems like it's a "long" versus "size_t" problem and should be fixed long ago, but why it's still there? I was using 64bit git and 64bit windows server;

Git was originally written for Unix systems, which, when 64-bit, are
LP64.  That means that `long`, `long long`, and pointers are 64-bit, and
the appropriate way to write a word-sized integer is `long`.  This has
been the case since the DEC Alpha, which was one of the first 64-bit
machines.

However, Windows decided to use an LLP64 approach, where `long` is
32-bit and `long long` and pointers are 64-bit.  They did have reasons
for doing so, such as compatibility with existing software that
specified `long` as a 32-bit type, but it is incompatible with the rest
of the world, and they knew that and did it anyway.  This is a common
source of portability problems when porting between OSes in both
directions.  Fortunately, newer languages like Rust and Go have avoided
that problem.
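
To make the LP64 versus LLP64 distinction concrete, here is a minimal
sketch that reports which data model a program was compiled for.  The
function names `data_model` and `print_widths` are made up for this
illustration and are not part of Git.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Name the data model of the current build, judged by type widths. */
static const char *data_model(void)
{
    size_t long_bits = sizeof(long) * CHAR_BIT;
    size_t ptr_bits = sizeof(void *) * CHAR_BIT;

    if (ptr_bits == 64 && long_bits == 64)
        return "LP64";   /* 64-bit Linux, macOS, the BSDs */
    if (ptr_bits == 64 && long_bits == 32)
        return "LLP64";  /* 64-bit Windows */
    if (ptr_bits == 32 && long_bits == 32)
        return "ILP32";  /* typical 32-bit systems */
    return "other";
}

/* Print the widths that differ between the data models. */
static void print_widths(void)
{
    /* C99 guarantees that long long is at least 64 bits wide. */
    assert(sizeof(long long) * CHAR_BIT >= 64);

    printf("long:      %zu bits\n", sizeof(long) * CHAR_BIT);
    printf("long long: %zu bits\n", sizeof(long long) * CHAR_BIT);
    printf("pointer:   %zu bits\n", sizeof(void *) * CHAR_BIT);
    printf("model:     %s\n", data_model());
}
```

Calling `print_widths()` from `main` in a 64-bit Windows build should
report a 32-bit `long`, while the same source built on 64-bit Linux
should report a 64-bit one.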

Because Windows was not originally a target for Git, there's a lot of
code which still uses `long` in places, and nobody has cared enough to
send patches fixing this.  The required changes to the code can be
pretty invasive, and so patches have to be carefully crafted to be
reasonable in size and avoid conflicts with other series in flight.  Of
course, you (and anyone else) are welcome to contribute and everyone
would be happy if you improved things in this regard, even if only a
little bit.
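
As an illustration of why `long` is a problem under LLP64, here is a
minimal sketch of what happens when a byte offset past 4 GiB is stored
in a 32-bit type.  It uses `uint32_t` as a stand-in for the LLP64
`unsigned long` so that it behaves the same on every platform.  The
typedef and function names are hypothetical, this is not code from Git,
and I'm not claiming it is the cause of the failure reported here.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for `unsigned long` on LLP64, where it is 32 bits wide. */
typedef uint32_t llp64_ulong;

/* Store a 64-bit byte offset in the narrow type; the high 32 bits are lost. */
static llp64_ulong store_offset_narrow(uint64_t offset)
{
    return (llp64_ulong)offset;
}

/* Report whether an offset survives a round trip through the narrow type. */
static int fits_in_llp64_ulong(uint64_t offset)
{
    return (uint64_t)store_offset_narrow(offset) == offset;
}
```

With these definitions, an offset of 5 GiB (`5ULL << 30`) comes back
from `store_offset_narrow` as 1 GiB, because the value wraps modulo
2^32.  Any offset of 4 GiB or more fails `fits_in_llp64_ulong`.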

Note that `size_t` does not have to be 64-bit on 64-bit systems, only
large enough to hold the largest possible allocation, and POSIX only
requires that it be at least 16 bits in size, so it is not always
suitable to use it when you want a word-sized unsigned integer.
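
One way to cope with that is to keep sizes that must reach 64 bits in
`uint64_t` and convert to `size_t` only with a range check.  A minimal
sketch follows.  The helper name `u64_to_size` is made up for this
example and is not an existing Git function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert a 64-bit length to size_t.
 * Returns 0 on success, or -1 if the value does not fit, instead of
 * silently truncating it.
 */
static int u64_to_size(uint64_t value, size_t *out)
{
    assert(out != NULL);

    if (value > SIZE_MAX)
        return -1;
    *out = (size_t)value;
    return 0;
}
```

On a platform with a 64-bit `size_t` the check never fails, but on one
where `size_t` is narrower the caller gets an error it can report.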
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA
