On Sun, Feb 14, 2010 at 9:10 PM, Johannes Schindelin <Johannes.Schindelin@xxxxxx> wrote: > > On Sun, 14 Feb 2010, Dmitry Potapov wrote: > >> On Sun, Feb 14, 2010 at 02:53:58AM +0100, Johannes Schindelin wrote: >> > On Sun, 14 Feb 2010, Dmitry Potapov wrote: >> > >> > > + if (strbuf_read(&sbuf, fd, 4096) >= 0) >> > >> > How certain are you at this point that all of fd's contents fit into >> > your memory? >> >> You can't be sure... In fact, we know mmap() also may fail for huge >> files, so can strbuf_read(). > > That's comparing oranges to apples. In one case, the address space runs > out, in the other the available memory. The latter is much more likely. "much more likely" is not a very qualitative characteristic... I would prefer to see numbers. My gut feeling is that it is not a problem in real use cases where Git is used as VCS and not storage for huge media files. In fact, we do not use mmap() on Windows or MacOS at all, and I have not heard that users of those platforms suffered much more from inability to store huge files than those who use Linux. I do not want to say that there is no difference, but it may not as large as you try to portray. In any case, my patch was to let people to test it and to see what impact it has and come up with some numbers. BTW, probably, it is not difficult to stream a large file in chunks (and it may be even much faster, because we work on CPU cache), but I suspect it will not resolve all issues with huge files, because eventually we need to store them in a pack file. So we need to develop some strategy how to deal with them. One way to deal with them is to stream directly into a separate pack. Still, it does not resolve all problems, because each pack file should be mapped into a memory, and this may be a problem for 32-bit system (or even 64-bit systems where a sysadmin set limit on amount virtual memory available a single program). The other way to handle huge files is to split them into chunks. http://article.gmane.org/gmane.comp.version-control.git/120112 Maybe there are other approaches. I heard some people tried to do something about it, but i have never interested in big files to look at this issue closely. > >> > And even if you could be certain, a hint is missing that >> > strbuf_read(), its name notwithstanding, does not read NUL-terminated >> > strings. Oh, and the size is just a hint for the initial size, and it >> > reads until EOF. That has to be said in the commit message. >> >> I did not add _any_ new code, including the above line. It was there >> before my patch. > > But that explanation does not answer my question, does it? I believe it did, or I did not understand your question. Dmitry -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html