On 2025-01-15 at 08:07:30, Ginger Luo 罗江 wrote:
> Hello, GIT gurus
>
> I have a large repository which is more than 30GB, when I clone it with git-for-windows (tried 2.47.1 and some old versions), it prompted me with "fatal: pack has bad object at offset xxxxx: inflate returned 1", usually died at "receiving objects" stage at about 4GiB.
> Looks like it's same issue in https://github.com/git-for-windows/git/pull/2179 ;

I think it's actually a different issue. That's the limit for individual large blobs, and pack size would be a different code path. I feel like, though, that lots of people have large repositories and we would have heard before if Git for Windows simply could not handle pack files over 4 GiB.

I've CC'd Dscho, the Git for Windows maintainer, to verify that there are no known problems cloning large repositories. He's very capable and has seen a lot, so hopefully he can provide some good insight as to what might be going wrong.

Do you have a single file that is larger than 4 GiB? That is known to have had some problems on Git for Windows in the past, and could theoretically be related to this. However, files are typically compressed in pack files, so I wouldn't expect a 4 GiB blob to cause a failure at 4 GiB in the pack file unless it was incompressible (such as random data).

Do you have any sort of public repository or test case that can reproduce this? That would really help us fix the problem or pin down what might be going on, since we could inspect it ourselves. For instance, on my system, a fresh bare clone of https://github.com/torvalds/linux.git is larger than 4 GiB, so it would be good to know whether that fails for you as well or works correctly.

Are you using any sort of antivirus or firewall other than the default, or any sort of proxy, including any TLS man-in-the-middle device or corporate proxy? This sounds a lot like some piece of software or hardware trying to buffer all the data for inspection or tampering with the data.
If so, can you please try to completely uninstall the software and reboot, or use a different network that doesn't have such a device on it? Note that simply disabling the software often does not fix the problem. If that works, then you should report a bug to the vendor of that software or device.

> Seems like it's a "long" versus "size_t" problem and should be fixed long ago, but why it's still there? I was using 64bit git and 64bit windows server;

Git was originally written for Unix systems, which, when 64-bit, are LP64. That means that `long`, `long long`, and pointers are 64-bit, and the appropriate way to write a word-sized integer is `long`. This has been the case since the DEC Alpha, which was one of the first 64-bit machines.

However, Windows decided to use an LLP64 approach, where `long` is 32-bit and `long long` and pointers are 64-bit. They did have reasons for doing so, such as compatibility with existing software that assumed `long` was a 32-bit type, but it is incompatible with the rest of the world, and they knew that and did it anyway. This is a common source of portability problems when porting between OSes in both directions. Fortunately, newer languages like Rust and Go have avoided that problem.

Because Windows was not originally a target for Git, there's a lot of code which still uses `long` in places, and nobody has cared enough to send patches fixing this. The required changes to the code can be pretty invasive, and so patches have to be carefully crafted to be reasonable in size and avoid conflicts with other series in flight. Of course, you (and anyone else) are welcome to contribute, and everyone would be happy if you improved things in this regard, even if only a little bit.

Note that `size_t` does not have to be 64-bit on 64-bit systems, only large enough to hold the largest possible allocation, and POSIX only requires that it be at least 16 bits in size, so it is not always suitable to use it when you want a word-sized unsigned integer.
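To make the LP64 versus LLP64 difference concrete, here is a minimal sketch in C of what happens to a 64-bit pack offset when it passes through a 32-bit `unsigned long`. The helper names are hypothetical and are not taken from Git's source; the sketch only assumes that 4 GiB means 2^32 bytes and uses fixed-width types so the 32-bit case can be shown on any platform.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* long long is at least 64 bits on every conforming platform,
 * LP64 and LLP64 alike; only `long` differs between the two. */
static_assert(sizeof(long long) * CHAR_BIT >= 64,
	      "long long must be at least 64 bits");

/* Hypothetical helper: returns 1 if `off` survives a round trip through
 * a 32-bit unsigned integer, which is the width of `unsigned long`
 * on LLP64 (64-bit Windows). */
static int fits_in_32bit_long(uint64_t off)
{
	return off == (uint64_t)(uint32_t)off;
}

/* Hypothetical helper: the value a 64-bit offset becomes when stored in
 * a 32-bit unsigned long. The high 32 bits are dropped (the conversion
 * is well-defined modular arithmetic, so nothing warns at run time). */
static uint32_t truncate_to_32bit_long(uint64_t off)
{
	return (uint32_t)off;
}

/* Hypothetical helper: the same check against the platform's real
 * `unsigned long`. Always 1 on LP64; 0 for offsets of 4 GiB or more
 * on LLP64. */
static int fits_in_native_long(uint64_t off)
{
	return off <= ULONG_MAX;
}
```

With an LLP64 compiler such as MSVC or MinGW, `fits_in_native_long` returns 0 for an offset of exactly 4 GiB, while on 64-bit Linux it returns 1. The two fixed-width helpers behave the same everywhere, which is why an offset just past 4 GiB that goes through a `long` on Windows ends up pointing near the start of the data instead.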
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA