Re: Excessive mmap [was Git server eats all memory]

On Wed, Aug 11, 2010 at 11:47 AM, Ivan Kanis
<expire-by-2010-08-16@xxxxxxxx> wrote:
> Avery Pennarun <apenwarr@xxxxxxxxx> wrote:
>> Now, the kernel is supposed to be smart enough to release old pages
>> out of RSS if you stop using them; it's no different from what the
>> kernel does with any cached file data.  So it shouldn't be expensive
>> to mmap instead of just reading the file.
>
> How can the kernel release old pages? There does not seem to be any way
> to tell it that it doesn't need a given memory block.

The kernel doesn't care whether you "need" it; it swaps out "needed"
pages all the time.

With a normal dirty memory page (allocated with malloc() or whatever),
the kernel will need to write it out to the swap file before it drops
it from RSS.  Then if the process needs to read/write that page in the
future, it'll have to read it back in from the swap file and increase
RSS again before it can be used.

With mmap'd files it's slightly different.  As long as the page hasn't
been modified (and as far as I know, git never writes to pages of
packfiles) then we already know that page is safe on disk.  So if the
kernel needs to "swap it out", it just drops it immediately from RSS
and doesn't do any I/O.  When/if the process needs to read/write the
page in the future, the kernel can page it back in the same way it
loaded it in the first place: from the original file.

If I understand correctly, all this means that the kernel tends to
drop clean mmap'd file pages out of RSS before other kinds of pages,
because evicting them is cheaper than swapping out a dirty page.

If you think about it, if you do 'cat filename' in a loop, every new
'cat' process needs to load filename into memory. Of course the kernel
doesn't throw away the pages just because cat exits; it keeps a cache
of the file's pages in memory, and just feeds them to the next 'cat'
process when it starts.  So the kernel keeps stuff in memory even if
nobody is currently using it.  The surprising thing (at first) is that
the kernel is also happy to throw away pages even if you *are* using
them, as long as it can get them back.

Swapping is based on how frequently a page is used, not whether that
page is currently mapped into someone's address space.  (Disclaimer: I
haven't read the code.  Maybe it does give higher priority to pages
that are currently mapped.)

>>> Looking some more into it today the bulk of the memory allocation
>>> happens in write_pack_file in the following loop.
>>>
>>> for (; i < nr_objects; i++) {
>>>    if (!write_one(f, objects + i, &offset))
>>>        break;
>>>    display_progress(progress_state, written);
>>> }
>>>
>>> This eventually calls write_object, here I am wondering if the
>>> unuse_pack function is doing its job. As far as I can tell it writes a
>>> null in memory, which I think is not enough to reclaim memory.
>>
>> What do you mean by the "memory allocation" happens here?  How are you
>> measuring it?
>
> I run top and look at the RES column. I put a printf before and after
> the code block and watch the memory go up and up.

Yeah, that's not a very good way to do it.  The problem is that RSS is
*guaranteed* to go up in this location: you've just accessed an mmap'd
page you haven't used before.  That's not a bug.  Furthermore, if
multiple processes are mmap'ing the same pages, *all* those processes
might see their RSS go up, but it's the "same" pages, so that's not
actually taking twice the physical memory.

Unfortunately there are no really reliable ways to track this kind of
memory usage (as far as I know).  The tricks I often use are:

1)  while sleep 1; do free; done

2)  vmstat 1

Command #1 will show you what's happening to your physical RAM.  If
you run one git-repack, do you see the 'free' column decreasing by the
same amount as the RSS increases?  If you run two repacks at once,
does it increase as the sum of the two RSS columns, or just one of
them, or something else?

Command #2 will show you your blocks swapped in and out per second.
The interesting columns are si/so/bi/bo.

>> On the other hand, perhaps a more important question is: why does git
>> feel like it needs to generate entirely new packs for each person
>> doing a clone on your system?  Shouldn't it be reusing existing ones
>> and just streaming them straight out to the recipient?
>
> Ah interesting point. Two things make me suspect the mmap is not shared
> between processes. One is that mmap is done with the MAP_PRIVATE flag
> which according to the man page doesn't share between processes. The
> second is that the mmap is done on a temporary file created by
> odb_mkstemp, I don't believe the name is identical between the two
> processes.

MAP_PRIVATE is a little more complicated than that.  What it means is
that if one of the processes *writes* to one of the pages, the other
process won't see the changes.  The pages are copy-on-write: until
somebody writes to them - and I'm pretty sure nobody does - the kernel
keeps sharing a single physical copy, because copying the data for no
reason would be pointlessly inefficient.

That said, you're obviously experiencing bad behaviour, ie. it's not
working like it's supposed to, one way or another.  So you shouldn't
trust that your kernel, or git, or even my explanations are correct :)

Have fun,

Avery