On Fri, Oct 13, 2017 at 12:37 PM, Mike Hommey <mh@xxxxxxxxxxxx> wrote:
> On Fri, Oct 13, 2017 at 12:26:46PM +0200, Christian Couder wrote:
>>
>> After cloning it with -n, there is the following "funny" situation:
>>
>> $ time git rev-list HEAD
>> 7af99c9e7d4768fa681f4fe4ff61259794cf719b
>> 18ed56cbc5012117e24a603e7c072cf65d36d469
>> 45546f17e5801791d4bc5968b91253a2f4b0db72
>>
>> real 0m0.004s
>> user 0m0.000s
>> sys 0m0.004s
>> $ time git rev-list HEAD -- d0/d0/d0/d0/d0/d0/d0/d0/d0/d0/f0
>>
>> real 0m0.004s
>> user 0m0.000s
>> sys 0m0.000s
>> $ time git rev-list HEAD -- d0/d0/d0/d0/d0/d0/d0/d0/d0/d0
>>
>> real 0m0.004s
>> user 0m0.000s
>> sys 0m0.000s
>> $ time git rev-list HEAD -- d0/d0/d0/d0/d0/d0/d0/d0/
>> 45546f17e5801791d4bc5968b91253a2f4b0db72
>>
>> real 0m0.005s
>> user 0m0.008s
>> sys 0m0.000s
>> $ time git rev-list HEAD -- d0/d0/d0/d0/d0/
>> 45546f17e5801791d4bc5968b91253a2f4b0db72
>>
>> real 0m0.203s
>> user 0m0.112s
>> sys 0m0.088s
>> $ time git rev-list HEAD -- d0/d0/d0/d0/
>> 45546f17e5801791d4bc5968b91253a2f4b0db72
>>
>> real 0m1.305s
>> user 0m0.720s
>> sys 0m0.580s
>> $ time git rev-list HEAD -- d0/d0/d0/
>> 45546f17e5801791d4bc5968b91253a2f4b0db72
>>
>> real 0m12.135s
>> user 0m6.700s
>> sys 0m5.412s
>>
>> So `git rev-list` becomes exponentially more expensive when you run it
>> on a shorter directory path, though it is fast if you run it without a
>> path.
>
> That's because there are 10^7 files under d0/d0/d0, 10^6 under
> d0/d0/d0/d0/, 10^5 under d0/d0/d0/d0/d0/ etc.
>
> So really, this is all about things being slower when there's a crazy
> number of files. Picture me surprised.
>
> What makes it kind of special is that the repository contains a lot of
> paths/files, but very few objects, because it's duplicating everything.
>
> All the 10^10 blobs have the same content, all the 10^9 trees that point
> to them have the same content, all the 10^8 trees that point to those
> trees have the same content, etc.
>
> If git wasn't effectively deduplicating identical content, the repository
> would be multiple gigabytes large.

Yeah, but perhaps Git could be smarter when rev-listing too and avoid
processing files or directories it has already seen?
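To make the "already seen" idea concrete, here is a toy sketch in Python. It is an illustration under stated assumptions, not Git's code: object IDs come from a simplified stand-in for Git's object format, every directory is assumed to have exactly 10 entries that all point at the same subtree, and OBJECTS, make_tree(), walk_naive() and walk_memo() are hypothetical names. It does not model commits or pathspec limiting; it only compares expanding every path against expanding each distinct object ID once.

```python
import hashlib

# Toy model of content-addressed storage (hypothetical, NOT Git's real
# object format): an object's ID is the SHA-1 of its content, so identical
# subtrees collapse into a single stored object.
FANOUT = 10   # assumed entries per directory
OBJECTS = {}  # oid -> list of (name, child_oid) for trees, None for blobs


def store(content, children=None):
    """Record an object under the SHA-1 of its content and return the ID."""
    oid = hashlib.sha1(content).hexdigest()
    OBJECTS[oid] = children
    return oid


def make_tree(depth):
    """Build a uniform tree `depth` levels deep; every leaf blob is identical."""
    if depth == 0:
        return store(b"blob:same content")
    child = make_tree(depth - 1)  # all entries share this one subtree
    prefix = "f" if depth == 1 else "d"
    entries = [(f"{prefix}{i}", child) for i in range(FANOUT)]
    content = ("tree:" + ",".join(f"{n}={c}" for n, c in entries)).encode()
    return store(content, entries)


def walk_naive(oid, stats):
    """Count files by expanding every path; cost grows with paths, not objects."""
    stats["expanded"] += 1
    children = OBJECTS[oid]
    if children is None:
        return 1
    return sum(walk_naive(child, stats) for _, child in children)


def walk_memo(oid, stats, seen):
    """Same count, but each distinct object ID is expanded only once."""
    if oid in seen:
        return seen[oid]
    stats["expanded"] += 1
    children = OBJECTS[oid]
    if children is None:
        result = 1
    else:
        result = sum(walk_memo(child, stats, seen) for _, child in children)
    seen[oid] = result
    return result


if __name__ == "__main__":
    small = make_tree(5)
    naive, memo = {"expanded": 0}, {"expanded": 0}
    print(walk_naive(small, naive), naive["expanded"])  # 100000 111111
    print(walk_memo(small, memo, {}), memo["expanded"])  # 100000 6

    # At depth 10 a naive walk would expand ~1.1e10 objects; this expands 11.
    big, memo = make_tree(10), {"expanded": 0}
    print(walk_memo(big, memo, {}), memo["expanded"])  # 10000000000 11
```

With fanout 10 and depth 10 the toy tree has 10^10 files, 10^9 trees pointing at the blobs and 10^7 files under any three-level prefix, roughly the counts Mike gives. The per-path walk cost scales with those path counts, while the memoized walk scales with the number of distinct objects. Whether Git's tree-diff code could cache per-tree results like this while still honoring pathspecs is not something the sketch shows.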