On Fri, 6 Jul 2007, David Kastrup wrote:
>
> Well, hmph! I just rewrote my git-diff-using script to not check
> stuff into a throw-away git repository, and guess what: with real-life
> use cases (diffing trees of about 500MB size), git-diff runs out of
> memory (the machine probably has something like 1.5GB of virtual memory
> size) when operating outside of a git repository.

Ok, that's probably some huge memory leak that just doesn't show up with any normal git operations, likely simply because all the normal git operations will have thrown out the case of "identical files" without ever even looking at the file.

I'd guess that when using the diff logic on outside files, we'll read them all in, compare them, and keep them all in memory even though they are identical.

Generally, though, "git diff" has a much higher memory footprint than any normal file-by-file recursive diff, exactly because of the rename logic. An external "diff" won't ever have any reason to keep more than two files in memory at a time, but because git diff does rename and copy detection, it wants to keep the file data in memory over much longer times.

But I bet there is some stupid bug where we just make it much, much worse for the "no git tree/index" case, and keep the whole tree in memory or something.

(The same is true of "git apply", btw, for a different reason: because git-apply will refuse to write out partial results in case some later patch fails, git-apply will keep the whole result in memory until the very end, and then do the write-out in one go. Again, that obviously means that it will potentially use a lot more memory than the "one patch at a time" approach that regular "patch" does.)

		Linus