Re: [PATCH] don't use mmap() to hash files

On Sun, Feb 14, 2010 at 9:10 PM, Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
>
> On Sun, 14 Feb 2010, Dmitry Potapov wrote:
>
>> On Sun, Feb 14, 2010 at 02:53:58AM +0100, Johannes Schindelin wrote:
>> > On Sun, 14 Feb 2010, Dmitry Potapov wrote:
>> >
>> > > + if (strbuf_read(&sbuf, fd, 4096) >= 0)
>> >
>> > How certain are you at this point that all of fd's contents fit into
>> > your memory?
>>
>> You can't be sure... In fact, we know mmap() also may fail for huge
>> files, so can strbuf_read().
>
> That's comparing oranges to apples. In one case, the address space runs
> out, in the other the available memory. The latter is much more likely.

"much more likely" is not a very qualitative characteristic... I would
prefer to see numbers. My gut feeling is that it is not a problem in
real use cases where Git is used as VCS and not storage for huge media
files.  In fact, we do not use mmap() on Windows or MacOS at all, and
I have not heard that users of those platforms suffered much more from
inability to store huge files than those who use Linux.  I do not want
to say that there is no difference, but it may not as large as you try
to portray. In any case, my patch was to let people to test it and to
see what impact it has and come up with some numbers.

BTW, it is probably not difficult to stream a large file in chunks (and
it may even be much faster, because we would be working within the CPU
cache), but I suspect it will not resolve all issues with huge files,
because eventually we need to store them in a pack file. So we need to
develop some strategy for dealing with them.
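
For illustration, hashing in chunks could look roughly like the sketch
below. It assumes the size is known up front from fstat() (so the
"blob <size>" header can be fed to the hash first) and uses OpenSSL's
SHA1_* calls for brevity; the function name and buffer size are made up
for this example, it is not the actual Git code path:

    /* Sketch only: hash a blob in fixed-size chunks instead of reading
     * the whole file into memory at once. */
    #include <openssl/sha.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static int hash_fd_in_chunks(int fd, off_t size, unsigned char sha1[20])
    {
            SHA_CTX ctx;
            char hdr[32];
            char buf[8192];
            /* Git hashes "blob <size>\0" followed by the contents */
            int hdrlen = snprintf(hdr, sizeof(hdr), "blob %lu",
                                  (unsigned long)size) + 1;
            ssize_t n;

            SHA1_Init(&ctx);
            SHA1_Update(&ctx, hdr, hdrlen);
            while ((n = read(fd, buf, sizeof(buf))) > 0)
                    SHA1_Update(&ctx, buf, n);
            if (n < 0)
                    return -1;
            SHA1_Final(sha1, &ctx);
            return 0;
    }

Only one chunk-sized buffer is live at any time, which is also why it
can be friendlier to the CPU cache than hashing one huge mapped region.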

One way to deal with them is to stream directly into a separate pack.
Still, that does not solve all problems, because each pack file has to
be mapped into memory, and this may be a problem on 32-bit systems (or
even on 64-bit systems where a sysadmin has set a limit on the amount
of virtual memory available to a single program).
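
The core of such streaming would be a loop along these lines (only a
sketch using zlib directly, with error handling left out and a made-up
name stream_blob_to_pack; a real pack writer also has to emit the
object header first and fix up the pack checksum at the end):

    /* Sketch: deflate a large blob into a pack file chunk by chunk,
     * so the whole object never has to sit in memory at once. */
    #include <stdio.h>
    #include <unistd.h>
    #include <zlib.h>

    static int stream_blob_to_pack(int in_fd, FILE *pack)
    {
            z_stream s = { 0 };
            char in[8192], out[8192];
            ssize_t n;
            int flush = Z_NO_FLUSH;

            deflateInit(&s, Z_DEFAULT_COMPRESSION);
            do {
                    n = read(in_fd, in, sizeof(in));
                    if (n < 0)
                            return -1;
                    if (n == 0)
                            flush = Z_FINISH;  /* EOF: tell zlib to wrap up */
                    s.next_in = (unsigned char *)in;
                    s.avail_in = n;
                    do {
                            s.next_out = (unsigned char *)out;
                            s.avail_out = sizeof(out);
                            deflate(&s, flush);
                            fwrite(out, 1, sizeof(out) - s.avail_out, pack);
                    } while (s.avail_out == 0);
            } while (flush != Z_FINISH);
            deflateEnd(&s);
            return 0;
    }

Writing the result to its own pack sidesteps delta search and keeps the
memory footprint bounded, but reading the object back still means
mapping the pack, which is the problem described above.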

Another way to handle huge files is to split them into chunks:
http://article.gmane.org/gmane.comp.version-control.git/120112

Maybe there are other approaches. I have heard that some people tried
to do something about it, but I have never been interested enough in
big files to look at this issue closely.

>
>> > And even if you could be certain, a hint is missing that
>> > strbuf_read(), its name notwithstanding, does not read NUL-terminated
>> > strings. Oh, and the size is just a hint for the initial size, and it
>> > reads until EOF. That has to be said in the commit message.
>>
>> I did not add _any_ new code, including the above line. It was there
>> before my patch.
>
> But that explanation does not answer my question, does it?

I believe it did, or I did not understand your question.
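
(Just to make sure we are talking about the same thing: the pre-existing
path simply slurps the whole file and hashes the buffer; the 4096 is
only a hint for the initial allocation, and strbuf_read() keeps reading
until EOF. Simplified, it amounts to something like:)

    struct strbuf sbuf = STRBUF_INIT;
    unsigned char sha1[20];

    /* reads until EOF regardless of the 4096 hint */
    if (strbuf_read(&sbuf, fd, 4096) >= 0)
            hash_sha1_file(sbuf.buf, sbuf.len, "blob", sha1);
    strbuf_release(&sbuf);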

Dmitry
