Re: Excessive mmap [was Git server eats all memory]

On Mon, Aug 9, 2010 at 12:34 PM, Ivan Kanis
<expire-by-2010-08-14@xxxxxxxx> wrote:
> Dmitry Potapov <dpotapov@xxxxxxxxx> wrote:
>> On a 64-bit architecture, you have plenty of virtual address space,
>> and mapping a file to memory should not take much physical memory
>> (only space needed for system tables).
>
> What I can tell from the mmap man page is that it should map memory to a
> file. I assume it shouldn't take up physical memory. However I am seeing
> physical memory being consumed. It might be a feature of the kernel. Is
> there a way to turn it off?

'ps axu' will show two relevant columns: VSZ (virtual size) and RSS
(resident set size).  The only one that reflects physical memory use
is RSS.

When you mmap a file, VSZ immediately grows by the size of the
mapping, but this won't affect your available system memory, because
you have only consumed "virtual" memory.  The kernel never needs to
write that memory out to the swap file, because it knows this chunk of
virtual memory is already on disk, inside the mmap'd file.

When you access some of the pages of the mmap'd file, the kernel will
page them in from disk, which increases RSS.  This uses *real* memory
on the system.
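
To see the VSZ/RSS difference for yourself, here is a minimal C sketch.
It is not taken from git; the names read_statm, touch_pages and
show_mmap_rss are made up for illustration.  It assumes Linux, where
/proc/self/statm reports total and resident size in pages.  It maps a
file, prints both numbers, touches one byte per page, and prints them
again.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read this process's virtual size and resident size, both in pages,
 * from /proc/self/statm (Linux-specific assumption).
 * Returns 0 on success, -1 on failure. */
int read_statm(long *vsize, long *rss)
{
    FILE *f = fopen("/proc/self/statm", "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%ld %ld", vsize, rss) == 2);
    fclose(f);
    return ok ? 0 : -1;
}

/* Read one byte from every page of the mapping so the kernel has to
 * fault each page in.  Returns the sum of those bytes. */
unsigned long touch_pages(const unsigned char *map, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned long sum = 0;

    for (size_t off = 0; off < len; off += page)
        sum += map[off];
    return sum;
}

/* Print the current VSZ and RSS with a label. */
static void print_statm(const char *label)
{
    long vsize = -1, rss = -1;

    read_statm(&vsize, &rss);
    printf("%-12s VSZ = %ld pages, RSS = %ld pages\n", label, vsize, rss);
}

/* Map `path` read-only and print VSZ/RSS before the mmap, after the
 * mmap, and after touching every page.  Returns 0 on success, -1 on
 * error (including an empty file, which cannot be mapped). */
int show_mmap_rss(const char *path)
{
    struct stat st;
    unsigned char *map;
    size_t len;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    len = (size_t)st.st_size;

    print_statm("before mmap:");
    map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping outlives the fd */
    if (map == MAP_FAILED)
        return -1;
    print_statm("after mmap:");
    touch_pages(map, len);
    print_statm("after touch:");

    munmap(map, len);
    return 0;
}
```

Call show_mmap_rss() on a large pack from a small main().  VSZ should
jump as soon as mmap returns, while RSS should only climb once the
pages are touched.  Exact numbers will vary from system to system.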

As git generates a new pack file, it needs to access every single page
of every single pack that it's reading from, so eventually, all the
stuff you need will get sucked into RSS, so you'll see that number
grow and grow.  If your packfiles are huge, this is a lot of memory.

Now, the kernel is supposed to be smart enough to release old pages
out of RSS if you stop using them; it's no different from what the
kernel does with any cached file data.  So it shouldn't be expensive
to mmap instead of just reading the file.

> Looking some more into it today the bulk of the memory allocation
> happens in write_pack_file in the following loop.
>
> for (; i < nr_objects; i++) {
>    if (!write_one(f, objects + i, &offset))
>        break;
>    display_progress(progress_state, written);
> }
>
> This eventually calls write_object, here I am wondering if the
> unuse_pack function is doing its job. As far as I can tell it writes a
> null in memory, that I think is not enough to reclaim memory.

What do you mean by the "memory allocation" happens here?  How are you
measuring it?

unuse_pack indeed doesn't free any memory; it just zeroes a pointer
and decreases a refcount.  I don't know much about this code, but I
assume something else goes and cleans up the mmaps later.

In any case, mmap/munmap have little to do with your "real" memory
usage.  munmap() won't free any actual kernel memory; the used pages
will still be floating around in disk cache.

> I also looked at the use_pack function where the mmap is
> happening. Would it be worth refactoring this function so that it uses
> an index within a file instead of mmap?
>
> Unless I hear of a better idea I'll be trying that tomorrow...

I wouldn't expect this to help, but I would be interested to hear if it does.

If the problem is simply that you're flooding the kernel disk cache
with data you'll use only once, to the detriment of everything else on
the system, then one thing that might help could be posix_fadvise:

    posix_fadvise(fd, ofs, len, POSIX_FADV_DONTNEED);

bup uses this when backing up huge files, since it knows it's only
going to use each block once, and this seemed to decrease system load
(without affecting bup's own performance) in some test cases.
However, it uses this for filesystem files, not packs, so it's a
different use case.
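
A minimal sketch of that read-once pattern in C, assuming a plain
read() loop.  This is not bup's or git's actual code, and read_once is
a hypothetical name.  Note that posix_fadvise() is only a hint, and
that it returns an error number directly rather than setting errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Read `path` sequentially in fixed-size chunks.  After each chunk is
 * consumed, tell the kernel we won't need those pages again so they
 * don't crowd more useful data out of the disk cache.
 * Returns the number of bytes read, or -1 on error. */
long long read_once(const char *path)
{
    enum { CHUNK = 1 << 20 };       /* 1 MiB per read; arbitrary choice */
    static char buf[CHUNK];
    off_t ofs = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* ... process buf[0..n) here ... */

        /* Advisory only: the kernel is free to ignore it. */
        int err = posix_fadvise(fd, ofs, n, POSIX_FADV_DONTNEED);
        if (err != 0)
            fprintf(stderr, "posix_fadvise failed: error %d\n", err);
        ofs += n;
    }

    close(fd);
    return n < 0 ? -1 : (long long)ofs;
}
```

Whether this helps depends on the workload: it only makes sense when
each block really is read once, which may not hold for packs that are
served to many clients.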

On the other hand, perhaps a more important question is: why does git
feel like it needs to generate entirely new packs for each person
doing a clone on your system?  Shouldn't it be reusing existing ones
and just streaming them straight out to the recipient?

Have fun,

Avery