On Sun, Feb 14, 2010 at 10:28 PM, Johannes Schindelin <Johannes.Schindelin@xxxxxx> wrote:
>>
>> Concrete example: in one of my repositories, the average file size is
>> well over 2 gigabytes.
>
> Just to make extremely sure that you understand the issue: adding these
> files on a computer with 512 megabyte RAM works at the moment. Can you
> guarantee that there is no regression in that respect _with_ your patch?

It may not work without enough swap space, and it will not be pretty
anyway due to swapping. So, I see the following options:

1. Introduce a configuration parameter that defines whether or not to
use mmap() to hash files. It is a trivial change, but the real question
is what the default value for this option should be (should we use some
heuristic based on file size vs. available memory?).

2. Stream files in chunks. This is better because it is faster,
especially on large files, as you calculate SHA-1 and deflate the data
while it is still in the CPU cache. However, it may be more difficult
to implement, because we have filters that have to be applied to files
before they are put into the repository. (A rough sketch of the idea is
below, after my signature.)

3. Improve Git to support huge files on computers with low memory.

I think #3 is a noble goal, but I do not have time for that. I can try
to take on #2, but it may take more time than I have now. As to #1, I
am ready to send the patch if we agree that it is the right way to go...

I am open to any of your suggestions. Maybe there are other options
here that I have missed.

Dmitry
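
P.S. To make #2 a bit more concrete, here is a rough, untested sketch of
hashing a file in fixed-size chunks instead of mmap()ing the whole thing.
It uses OpenSSL's SHA1 functions and a made-up CHUNK_SIZE purely for
illustration; the real implementation would go through git's own SHA-1
wrappers, feed the same chunks to deflate, and still has to cope with the
filter machinery mentioned above.

/*
 * Rough sketch only: chunked hashing so memory use stays bounded
 * regardless of file size.  OpenSSL's SHA1_* API is used here just
 * for illustration.
 */
#include <stdio.h>
#include <stdlib.h>
#include <openssl/sha.h>

#define CHUNK_SIZE (8 * 1024 * 1024)	/* 8 MB; tunable, perhaps via config */

static int hash_file_chunked(const char *path,
			     unsigned char sha1[SHA_DIGEST_LENGTH])
{
	FILE *f = fopen(path, "rb");
	unsigned char *buf;
	size_t n;
	SHA_CTX ctx;

	if (!f)
		return -1;
	buf = malloc(CHUNK_SIZE);
	if (!buf) {
		fclose(f);
		return -1;
	}

	SHA1_Init(&ctx);
	while ((n = fread(buf, 1, CHUNK_SIZE, f)) > 0) {
		/*
		 * Note: git actually hashes "blob <size>\0" followed by the
		 * contents, not the raw file; that header is omitted here to
		 * keep the sketch short.  Deflating the same chunk would go
		 * in this loop as well, so the data is still in the CPU cache.
		 */
		SHA1_Update(&ctx, buf, n);
	}
	SHA1_Final(sha1, &ctx);

	free(buf);
	if (ferror(f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;
}

The point of the sketch is only that peak memory stays at CHUNK_SIZE
instead of the full file size; how it fits with the convert/filter code
is exactly the hard part I mentioned under #2.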