Victor Leschuk <vleschuk@xxxxxxxxxxxxxxxx> wrote:
> The thing is that git-cat-file keeps growing during work when running
> in "batch" mode.  See the figure attached: it is for cloning a rather
> small repo (about 1 hour to clone ~14000 revisions).  However, the
> clone of a large repo (~280000 revisions) took about 2 weeks, and
> git-cat-file outgrew the parent perl process several times over
> (git-cat-file: ~3-4Gb, perl: ~400Mb).

Ugh, that sucks.  Even the 400Mb size of Perl annoys me greatly and
I'd work on fixing it if I had more time.

But I'm completely against adding this parameter to git-svn.  git-svn
is not the only "cat-file --batch" user, so this option would only hide
the problem.  The best course is to figure out why cat-file is wasting
memory.

Disclaimer: I'm no expert on the parts of git written in C, but perhaps
the alloc.c interface is why memory keeps growing.

> What was done:
> * I ran it under valgrind and mtrace and haven't found any memory
>   leaks.
> * Found the source of most of the memory reallocations (the
>   batch_object_write() function, strbuf_expand -> realloc); tried to
>   make the strbuf object static to avoid reallocs - didn't help.
> * Tried preloading allocators other than glibc's default - no
>   significant difference.

A few more questions:

* What is the largest file that existed in that repo?

* Did you try "MALLOC_MMAP_THRESHOLD_" with glibc?  Perhaps setting
  that to 131072 will help: it forces chunks larger than that to be
  allocated with mmap and released back to the OS when freed.  But it
  might be moot if alloc.c is getting in the way.  (Example invocation
  below.)

If alloc.c is the culprit, I would consider transparently restarting
"cat-file --batch" once it grows to a certain size or after a certain
number of requests are made to it.  We can probably do this inside
"git cat-file" itself, without changing any callers, by calling execve.
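
For the malloc tunable, something like the following should be enough
to test (a rough sketch; it assumes the import is driven by "git svn
fetch" and that glibc malloc is the allocator in use, since the
variable is exported to every child git process, including the
long-running "cat-file --batch"):

  # Force glibc to use mmap for any allocation >= 128k, so those
  # chunks are returned to the OS as soon as they are freed:
  MALLOC_MMAP_THRESHOLD_=131072 git svn fetch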
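
And for what it's worth, here is a very rough standalone sketch of that
re-exec idea.  It is not actual cat-file code; the real thing would
hook into the batch loop in builtin/cat-file.c, and MAX_REQUESTS is a
made-up threshold (checking RSS would work too):

/*
 * Sketch: serve requests from stdin one per line, and transparently
 * re-exec ourselves after a fixed number of requests to drop any
 * memory that was never returned to the OS.
 *
 * Caveat: stdio may have read ahead past the current request, so a
 * real version would need to avoid buffered reads (or hand any
 * leftover buffered input to the new process) to keep from dropping
 * queued requests across the exec.
 */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_REQUESTS 10000

int main(int argc, char **argv)
{
	char buf[8192];
	unsigned long served = 0;

	while (fgets(buf, sizeof(buf), stdin)) {
		/* stand-in for writing out one batch object */
		fputs(buf, stdout);
		fflush(stdout);

		if (++served >= MAX_REQUESTS) {
			/*
			 * Re-exec with the same arguments; stdin,
			 * stdout and stderr stay open across exec,
			 * so the caller (e.g. git-svn) never notices.
			 */
			execvp(argv[0], argv);
			fprintf(stderr, "re-exec failed: %s\n",
				strerror(errno));
			exit(1);
		}
	}
	return 0;
}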