On Sun, 13 Dec 2015 20:09:04 +0100
Gerhard Wiesinger <lists@xxxxxxxxxxxxx> wrote:

> On 13.12.2015 18:17, Tom Lane wrote:
> > Gerhard Wiesinger <lists@xxxxxxxxxxxxx> writes:
> >>> Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
> >>> Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse
> >> OK, but why do we then get: kernel: swap_pager_getswapspace(4): failed?
> > Just judging from the name of the function, I would bet this is a direct
> > result of having only 512M of swap configured. As Bill already pointed
> > out, that's a pretty useless choice on a system with 32G of RAM. As soon
> > as the kernel tries to push out any significant amount of idle processes,
> > it's gonna be out of swap space. The numbers you show above prove that
> > it is almost out of free swap already.
>
> The system wasn't designed by me, I wouldn't do it either that way. Does
> swapoff help?

FreeBSD and Linux (and most modern OSes) are designed to have swap, and
usually more swap than RAM. I have never heard a good reason for not using
swap, and the reasons I _have_ heard have always come from people who were
misinformed about how the OS works. If someone has a _good_ explanation for
why you wouldn't want any swap on a DB server, I'd love to hear it; but
everything I've heard up till now has been speculation based on
misinformation.

IOW: no, you should not turn swap off, you should instead allocate an
appropriate amount of swap space (there's a rough sketch of how to add more
swap further down).

> > Also, while that 20G of "inactive" pages may be candidates for reuse,
> > they probably can't actually be reused without swapping them out ...
> > and there's noplace for that data to go.
>
> There is no log entry in syslog (where postgres logs) when
> swap_pager_getswapspace is logged.
>
> But why do we have 20G of Inactive pages? They are still allocated by
> kernel or user space. As you can see below (top output) NON Postgres
> processes are around 9G in virtual size, resident even lower. The system
> is nearly idle, and the queries typically aren't active after one second
> again. Therefore where does the rest of the 11G of Inactive pages come
> from (if it isn't a Postgres/FreeBSD memory leak)?
> I read that Postgres has its own memory allocator:
> https://www.reddit.com/r/programming/comments/18zija/github_got_30_better_performance_using_tcmalloc/
> Might that be an issue with double allocation/freeing and the "cheese
> hole" topic with memory fragmentation?

If there were a memory leak in either FreeBSD or Postgres of the
seriousness you're describing, and as easy to trigger as you claim, I would
expect the mailing lists and other support forums to be exploding in panic.
Notice that they are not.

Also, I still don't see _ANY_ evidence of a leak. I see evidence that
something is trying to allocate a LOT of RAM that isn't available on your
system; but that's not the same as a leak.

> https://www.opennet.ru/base/dev/fbsdvm.txt.html
>     inactive pages    not actively used by programs which are
>                       dirty and (at some point) need to be written
>                       to their backing store (typically disk).
>                       These pages are still associated with objects and
>                       can be reclaimed if a program references them.
>                       Pages can be moved from the active to the inactive
>                       queue at any time with little adverse effect.
>                       Moving pages to the cache queue has bigger
>                       consequences (note 1)

Correct, but, when under pressure, the system _will_ recycle those pages to
make them available.
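If you want to watch those page queues yourself, FreeBSD exposes the
counters via sysctl. A rough sketch (counter names are from memory and may
differ slightly between releases; the counts are in pages, typically 4 KB
each):

    # page queue sizes, in pages (usually 4 KB each)
    sysctl vm.stats.vm.v_active_count vm.stats.vm.v_inactive_count \
           vm.stats.vm.v_wire_count vm.stats.vm.v_free_count
    # paging activity since boot
    vmstat -s
    # which processes actually hold the resident memory
    # (top's -o flag sorts by the named column; "res" = resident size)
    top -b -o res 20

Watching those numbers while the swap_pager_getswapspace errors fire will
tell you a lot more than a single top snapshot.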
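And to be concrete about "allocate an appropriate amount of swap": adding a
file-backed swap device on FreeBSD is cheap and needs no reboot. A sketch,
assuming a 32G file at /usr/swap0 and md unit 99 (both arbitrary choices on
my part; check mdconfig(8) and swapon(8) before running this on anything
you care about):

    # create and protect the backing file (32 GB here; size to taste)
    dd if=/dev/zero of=/usr/swap0 bs=1m count=32768
    chmod 0600 /usr/swap0
    # attach it as a memory disk and enable it as swap
    mdconfig -a -t vnode -f /usr/swap0 -u 99
    swapon /dev/md99
    # verify
    swapinfo

To make it survive a reboot you'd add a matching md swap line to
/etc/fstab; the FreeBSD Handbook shows the exact syntax.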
Tom might be correct that the system considers those pages inactive because
it could easily push them out to swap, but then can't _actually_ do so
because you haven't allocated enough swap; that doesn't match my
understanding of how the inactive queue is used, though. A question at that
level of detail would be better asked on a FreeBSD forum, as the
differences between VM implementations can be pretty specific and
technical.

[snip]

> Mem: 8020M Active, 19G Inact, 3537M Wired, 299M Cache, 1679M Buf, 38M Free
> Swap: 512M Total, 501M Used, 12M Free, 97% Inuse
>
>   PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> 77941 pgsql         5  20    0  7921M  7295M usem    7 404:32 10.25% postgres
> 79570 pgsql         1  20    0  7367M  6968M sbwait  6   4:24  0.59% postgres

[snip about 30 identical PG processes]

> 32387 myusername    9  20    0   980M   375M uwait   5  69:03  1.27% node

[snip similar processes]

>   622 myusername    1  20    0   261M  3388K kqread  3  41:01  0.00% nginx

[snip similar processes]

Wait ... this is a combined HTTP/Postgres server? You didn't mention that
earlier, and it's kind of important. What evidence do you have that
Postgres is actually the part of this system running out of memory? I don't
see any such evidence in any of your emails, and (based on experience) I
find it pretty likely that whatever is running under node is doing
something in a horrifically memory-inefficient manner. Since you mention
that you see nothing in the PG logs, that makes it even more likely (to me)
that you're looking entirely in the wrong place.

I'd be willing to bet a steak dinner that if you put the web server on a
different machine than the DB, the memory problems would follow the web
server and not the DB server.

--
Bill Moran

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general