On Fri, Oct 13, 2017 at 12:26:46PM +0200, Christian Couder wrote:
> On Fri, Oct 13, 2017 at 12:06 PM, Mike Hommey <mh@xxxxxxxxxxxx> wrote:
> > On Fri, Oct 13, 2017 at 12:51:58PM +0300, Constantine wrote:
> >> There's a gitbomb on github. It is undoubtedly creative and funny, but since
> >> this is a bug in git, I thought it'd be nice to report. The command:
> >>
> >> $ git clone https://github.com/x0rz/ShadowBrokersFiles
> >
> > What fills memory is actually the checkout part of the command. git
> > clone -n doesn't fail.
> >
> > Credit should go where it's due: https://kate.io/blog/git-bomb/
> > (with the bonus that it comes with explanations)
>
> Yeah, there is a thread on Hacker News about this too:
>
> https://news.ycombinator.com/item?id=15457076
>
> The original repo on GitHub is:
>
> https://github.com/Katee/git-bomb.git
>
> After cloning it with -n, there is the following "funny" situation:
>
> $ time git rev-list HEAD
> 7af99c9e7d4768fa681f4fe4ff61259794cf719b
> 18ed56cbc5012117e24a603e7c072cf65d36d469
> 45546f17e5801791d4bc5968b91253a2f4b0db72
>
> real 0m0.004s
> user 0m0.000s
> sys 0m0.004s
> $ time git rev-list HEAD -- d0/d0/d0/d0/d0/d0/d0/d0/d0/d0/f0
>
> real 0m0.004s
> user 0m0.000s
> sys 0m0.000s
> $ time git rev-list HEAD -- d0/d0/d0/d0/d0/d0/d0/d0/d0/d0
>
> real 0m0.004s
> user 0m0.000s
> sys 0m0.000s
> $ time git rev-list HEAD -- d0/d0/d0/d0/d0/d0/d0/d0/
> 45546f17e5801791d4bc5968b91253a2f4b0db72
>
> real 0m0.005s
> user 0m0.008s
> sys 0m0.000s
> $ time git rev-list HEAD -- d0/d0/d0/d0/d0/
> 45546f17e5801791d4bc5968b91253a2f4b0db72
>
> real 0m0.203s
> user 0m0.112s
> sys 0m0.088s
> $ time git rev-list HEAD -- d0/d0/d0/d0/
> 45546f17e5801791d4bc5968b91253a2f4b0db72
>
> real 0m1.305s
> user 0m0.720s
> sys 0m0.580s
> $ time git rev-list HEAD -- d0/d0/d0/
> 45546f17e5801791d4bc5968b91253a2f4b0db72
>
> real 0m12.135s
> user 0m6.700s
> sys 0m5.412s
>
> So `git rev-list` becomes exponentially more expensive when you run it
> on a shorter directory path, though it is fast if you run it without a
> path.

That's because there are 10^7 files under d0/d0/d0, 10^6 under
d0/d0/d0/d0/, 10^5 under d0/d0/d0/d0/d0/ etc.

So really, this is all about things being slower when there's a crazy
number of files. Picture me surprised.

What makes it kind of special is that the repository contains a lot of
paths/files, but very few objects, because it's duplicating everything.

All the 10^10 blobs have the same content, all the 10^9 trees that point
to them have the same content, all the 10^8 trees that point to those
trees have the same content, etc.

If git wasn't effectively deduplicating identical content, the
repository would be multiple gigabytes large.

Mike
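
To make the paths-versus-objects point concrete, here is a rough Python sketch of a content-addressed store with the same shape. It is not the code that built the repository, only an illustration under assumptions: TREE_LEVELS = 10 is inferred from the 10^10 figure above, the blob content is a placeholder, and the names build, files_below and store are made up for this example. Object ids are computed the way git hashes loose objects (SHA-1 over a "<type> <size>" header, a NUL byte, then the payload), but commits and zlib compression are left out.

```python
import hashlib

FANOUT = 10       # entries per tree: d0..d9, or f0..f9 at the bottom
TREE_LEVELS = 10  # assumption, inferred from the 10^10 total quoted above


def object_id(kind: str, payload: bytes) -> bytes:
    # git hashes "<kind> <size>" + NUL + payload with SHA-1
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).digest()


def tree_payload(entries) -> bytes:
    # each tree entry is: mode, space, name, NUL, 20-byte raw object id
    return b"".join(mode + b" " + name + b"\0" + oid
                    for mode, name, oid in entries)


def build(levels: int = TREE_LEVELS, fanout: int = FANOUT):
    """Return (root tree id, store) for a tree of identical subtrees."""
    store = {}                     # object id -> payload, so duplicates collapse
    blob = b"placeholder\n"        # hypothetical content, not the real blob
    oid = object_id("blob", blob)
    store[oid] = blob
    mode, prefix = b"100644", "f"  # the bottom tree points at files
    for _ in range(levels):
        entries = [(mode, f"{prefix}{i}".encode(), oid) for i in range(fanout)]
        payload = tree_payload(entries)
        oid = object_id("tree", payload)
        store[oid] = payload       # all ten entries share one child object
        mode, prefix = b"40000", "d"  # every level above points at trees
    return oid, store


def files_below(components: int, levels: int = TREE_LEVELS,
                fanout: int = FANOUT) -> int:
    """Number of file paths under a prefix with this many dN components."""
    return fanout ** (levels - components)


if __name__ == "__main__":
    root, store = build()
    print("unique objects:", len(store))
    print("bytes stored:  ", sum(len(p) for p in store.values()))
    print("file paths:    ", files_below(0))
    for n in (5, 4, 3):
        print(f"paths under {n} components:", files_below(n))
```

The store ends up with one blob plus one tree per level, while the number of paths grows by a factor of ten per level. A path-limited walk has to visit paths rather than objects, which is where the time goes in the rev-list runs quoted above.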