On 2020-07-21 at 19:31:48, Matt Parnell wrote:
> Description:
>
> Using compression friendly files, in this case, multiple 99MB zero byte
> filled files, a prankster or malicious actor in control of a popular
> repository can cause all who clone or pull a given branch to take up far
> more storage locally than displayed remotely on the git host - this is
> not limited to GitHub, but would apply to literally any git host,
> including just git+ssh.
> My Example Repo: https://github.com/ilikenwf/git-zlib-bomb
>
> The size of the repo on GitHub and my local git server is 744K, while
> the non-mirrored, cloned version with extracted objects is around 101GB.
>
> I argue this is a bit different than the "git bomb" by GitHub user
> Katee, as hers was more focused on recursion and segfaults. In this
> case, the attack focuses on git's use of zlib to compress the packs,
> especially since it will store only one compressed copy of an
> object/file in the repo, even if multiple copies of it exist.
>
> While Github does already abort pushes when it detects files over 100MB,
> and warn for files below that size, it doesn't seem to investigate
> cumulative size of the extracted pack on disk. Even so, this doesn't
> really help with git itself, the application, as it means that hosts and
> users elsewhere are still in danger and the only way to mitigate that I
> can think of would be to have git store a cumulative size value that can
> be used for warnings, or perhaps some logic that detects zero or
> repetitively filled files that compress in a deceptive manner.
> Steps To Reproduce:
>
> (Add details for how we can reproduce the issue)
>
> Create a new repo locally.
> Create multiple 1MB to 99MB zero filled files in the repo to get a
> total in the tens or hundreds of gigabytes, or beyond. I just used
>
> for i in $(seq 1 1035); do dd if=/dev/zero of=test$i bs=99M count=1; done
>
> Add them all to the repo and commit.
> Push them to a remote git server, or clone them as only a mirror.
> Compare the disk space utilized between the original repo, and the
> mirrored repo.
>
> Impact
>
> While this does not give unauthorized access to an attacker, it could be
> used to easily consume a large amount of any given developer's time and
> storage space. I personally would be very angry if I took the time to
> clone a repo, only to have it crash when it ran out of space, or occupy
> hundreds or thousands of gigabytes.

I agree this is inconvenient, but it's a problem that occurs with all
compression.  Literally any zip file or tarball can have this problem as
well, and those are other formats customarily used for software
interchange.

I don't think this is a security problem because it's a known attribute
of compression which is endemic to all compressed data.  We don't
consider the zip, gzip, and xz programs to have security vulnerabilities
because they don't limit the size of the data they uncompress.

There are cases where this _is_ a vulnerability, usually where some
larger system fails because an attacker can feed it data and make it run
out of resources.  If you're developing such a system, then you
definitely need to use a dedicated user with quotas or a limited-size
overlay (e.g., Docker) to guard against resource exhaustion, but you need
to do that anyway, independent of whether you're using Git.

So I don't consider this to intrinsically be a security problem in Git
any more than it is in other general-purpose tools that uncompress data.
--
brian m. carlson: Houston, Texas, US
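
To make the compression point above concrete, here is a minimal Python
sketch using the standard zlib module (the same compression format Git
uses for its objects).  It is not anything Git itself does.  It shows how
small zero-filled data becomes once compressed, and how a system that
must protect itself can cap the decompressed output instead of relying
on the decompressor to do so.  The helper name bounded_decompress, the
10 MiB payload, and the 1 MiB cap are illustrative assumptions.

```python
import zlib


def bounded_decompress(data: bytes, limit: int) -> bytes:
    """Decompress a zlib stream, refusing to produce more than `limit` bytes.

    Hypothetical helper for illustration; raises ValueError when the
    stream would expand past the limit or is incomplete.
    """
    d = zlib.decompressobj()
    # Ask for at most limit + 1 bytes so an overrun is detectable
    # without ever materializing the full output.
    out = d.decompress(data, limit + 1)
    if len(out) > limit:
        raise ValueError(f"decompressed size exceeds {limit} bytes")
    if not d.eof:
        raise ValueError("truncated or incomplete zlib stream")
    return out


if __name__ == "__main__":
    payload = bytes(10 * 1024 * 1024)  # 10 MiB of zero bytes (assumed size)
    packed = zlib.compress(payload)
    print(f"original:   {len(payload)} bytes")
    print(f"compressed: {len(packed)} bytes")

    try:
        bounded_decompress(packed, 1024 * 1024)  # assumed 1 MiB cap
    except ValueError as exc:
        print(f"rejected: {exc}")
```

A cap like this only protects the one code path that applies it, which
is why quotas or a size-limited filesystem remain the more general
defense for a system that processes untrusted data.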