On Tue, 3 Apr 2007, Linus Torvalds wrote:

> On Tue, 3 Apr 2007, Nicolas Pitre wrote:
> > 
> > > Yeah. What happens is that inside the repo, because we do all the 
> > > duplicate object checks (verifying that there are no evil hash collisions) 
> > > even after fixing the memory leak, we end up keeping *track* of all those 
> > > objects.
> > 
> > What do you mean?
> 
> Look at what we have to do to look up a SHA1 object.. We create all the 
> lookup infrastructure, we don't *just* read the object. The delta base 
> cache is the most obvious one.

It is capped to 16MB, so we're far from the 200+ MB count.

> > I'm of the opinion that this patch is unnecessary. It only helps in 
> > bogus workflows to start with, and it makes the default behavior unsafe 
> > (unsafe from a paranoid pov, but still). And in the _normal_ workflow 
> > it should never trigger.
> 
> Actually, even in the normal workflow it will do all the extra unnecessary 
> work, if only because of the lookup costs of *not* finding the entry.
> 
> Lookie here:
> 
>  - git index-pack of the *git* pack-file in the v2.6/linux directory (zero 
>    overlap of objects)
> 
>    With --paranoid:
> 
> 	2.75user 0.37system 0:03.13elapsed 99%CPU
> 	0major+5583minor pagefaults
> 
>    Without --paranoid:
> 
> 	2.55user 0.12system 0:02.68elapsed 99%CPU
> 	0major+2957minor pagefaults
> 
> See? That's the *normal* workflow. Zero objects found. 7% CPU overhead 
> from just the unnecessary work, and almost twice as much memory used. Just 
> from the index file lookup etc for a decent-sized project.

That is a 7% overhead on two and a half seconds of CPU time which, 
_normally_, is paid while cloning the whole thing over a network 
connection; even if you're lucky enough to have a 6 Mbps cable 
connection, that work will still be spread over 5 minutes of real time. 
And that assumes you're cloning a big project into itself, which 
wouldn't work anyway. Otherwise, a big clone would run index-pack in an 
empty repository, where the cost of looking up existing objects is 
zero. That leaves git-fetch, which should concern itself with much 
smaller packs, pushing this overhead into the noise.

> Now, in the KDE situation, the *unnecessary* lookups will be about ten 
> times more expensive, both on memory and CPU, just because the repository 
> is about 20x the size. Even with no actual hits.

So? When would you really perform such an operation in a meaningful 
way?

The memory usage worries me: I still cannot explain nor justify it. But 
the CPU overhead is certainly not of any concern in _normal_ usage 
scenarios, is it?

If anything, this might be a good test case for the Newton-Raphson pack 
lookup idea.


Nicolas
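
P.S. Since the delta base cache came up: for anyone wondering what that 
16MB cap means in practice, here is a rough sketch of the idea (my own 
illustration, *not* the actual sha1_file.c code -- all names are made 
up). Recently inflated delta bases are kept on an LRU list, and the 
oldest ones are dropped whenever the total would go over the cap:

	#include <stdlib.h>
	#include <sys/types.h>

	#define MAX_DELTA_CACHE (16 * 1024 * 1024)

	struct cache_ent {
		struct cache_ent *next;	/* LRU list, oldest first */
		off_t offset;		/* pack offset of the cached base */
		void *data;		/* inflated object data */
		unsigned long size;
	};

	static struct cache_ent *lru_head, *lru_tail;
	static unsigned long cached_bytes;

	/* Remember an inflated delta base, evicting the oldest
	 * entries so the cache total stays under the cap. */
	static void cache_delta_base(off_t offset, void *data,
				     unsigned long size)
	{
		struct cache_ent *ent;

		while (lru_head && cached_bytes + size > MAX_DELTA_CACHE) {
			struct cache_ent *old = lru_head;
			lru_head = old->next;
			if (!lru_head)
				lru_tail = NULL;
			cached_bytes -= old->size;
			free(old->data);
			free(old);
		}
		if (size > MAX_DELTA_CACHE)
			return;	/* too big to cache; caller keeps data */

		ent = malloc(sizeof(*ent));
		if (!ent)
			return;
		ent->next = NULL;
		ent->offset = offset;
		ent->data = data;
		ent->size = size;
		if (lru_tail)
			lru_tail->next = ent;
		else
			lru_head = ent;
		lru_tail = ent;
		cached_bytes += size;
	}

The point being that no matter how many delta bases get unpacked, the 
cache itself can never account for more than 16MB of the process size, 
which is why it cannot explain a 200+ MB footprint.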
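
And to make the "lookup costs of *not* finding the entry" concrete: 
this is roughly the search an object existence test does against each 
pack .idx (again a simplified sketch, not the real code) -- a binary 
search over a sorted table of 20-byte SHA1s:

	#include <string.h>

	/* table: nr sorted entries of 20 bytes each; returns the
	 * entry index, or -1 on a miss. */
	static int find_sha1(const unsigned char *table, unsigned nr,
			     const unsigned char *sha1)
	{
		unsigned lo = 0, hi = nr;

		while (lo < hi) {
			unsigned mi = lo + (hi - lo) / 2;
			int cmp = memcmp(table + 20 * mi, sha1, 20);

			if (!cmp)
				return mi;	/* found */
			if (cmp < 0)
				lo = mi + 1;
			else
				hi = mi;
		}
		return -1;	/* miss, after ~log2(nr) probes */
	}

Note that the miss is the worst case: it always takes the full 
log2(nr) probes (and the associated cache misses) before we can give 
up. That is the price --paranoid pays on every single object even when 
there is zero overlap between the pack and the repo.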
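
As for the Newton-Raphson idea: since SHA1s are uniformly distributed, 
the search could interpolate its next probe position from the leading 
bytes instead of always bisecting, bringing the expected cost down to 
O(log log nr) probes. A sketch of what I have in mind (untested, and 
certainly not tuned):

	#include <stdint.h>
	#include <string.h>

	static uint32_t be32(const unsigned char *p)
	{
		return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
		       ((uint32_t)p[2] << 8) | p[3];
	}

	static int find_sha1_interp(const unsigned char *table, unsigned nr,
				    const unsigned char *sha1)
	{
		unsigned lo = 0, hi = nr;
		uint32_t key = be32(sha1);

		while (lo < hi) {
			uint32_t klo = be32(table + 20 * lo);
			uint32_t khi = be32(table + 20 * (hi - 1));
			unsigned mi;
			int cmp;

			if (key < klo || key > khi)
				return -1;	/* early out on a miss */
			if (khi == klo)
				mi = lo;
			else
				mi = lo + (unsigned)(((uint64_t)(key - klo) *
					(hi - 1 - lo)) / (khi - klo));

			cmp = memcmp(table + 20 * mi, sha1, 20);
			if (!cmp)
				return mi;
			if (cmp < 0)
				lo = mi + 1;
			else
				hi = mi;
		}
		return -1;
	}

The nice side effect for the case discussed here is the early out: a 
SHA1 that isn't in the pack at all can fall outside the remaining 
[klo, khi] range after only a probe or two, so misses would get 
cheaper too, not just hits.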