On Wed, Jun 4, 2014 at 5:27 PM, vlasmarias <vlasmarias@xxxxxxxxxxx> wrote:
> For the past few days, we've been seeing unexpected, extremely high CPU spikes
> in our system. We observed the following: the 'free' memory would go down to
> lower than 300 MB; at that point, 'cached' slowly starts to go down, and
> then CPU starts to go way up.
> It's almost as if the OS was not releasing 'cached' memory fast enough for
> Postgres. Is that analysis correct? Is there a way to fix this?

This sounds like a kernel problem, probably either the zone reclaim issue or the transparent huge pages issue.
I don't remember the exact details offhand, but both have been discussed at length on this list and on pgsql-hackers.
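
As a starting point for checking which of the two applies, here is a minimal Python sketch that reports the current zone reclaim and transparent huge pages settings. It assumes a Linux system exposing the usual /proc and /sys files; some older Red Hat kernels keep the THP files under a different directory, in which case the sketch simply prints "unavailable". The helper names (read_setting, active_choice, report) are made up for this illustration, and the script only reports values, it does not judge them.

```python
from pathlib import Path

# Usual Linux locations for the two settings (assumed; may differ per distro).
ZONE_RECLAIM = Path("/proc/sys/vm/zone_reclaim_mode")
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")
THP_DEFRAG = Path("/sys/kernel/mm/transparent_hugepage/defrag")


def read_setting(path):
    """Return the file's stripped contents, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def active_choice(raw):
    """Pick the bracketed value from text like 'always madvise [never]'.

    Falls back to the whole string when nothing is bracketed.
    """
    for token in raw.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return raw.strip()


def report():
    """Build one human-readable line per setting."""
    lines = []
    zone_reclaim = read_setting(ZONE_RECLAIM)
    shown = zone_reclaim if zone_reclaim is not None else "unavailable"
    lines.append(f"zone_reclaim_mode: {shown}")
    for name, path in (("thp enabled", THP_ENABLED), ("thp defrag", THP_DEFRAG)):
        raw = read_setting(path)
        value = active_choice(raw) if raw is not None else "unavailable"
        lines.append(f"{name}: {value}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Run it on the database host and compare the values with what the earlier list threads recommend.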

> Here's the session:
> 04:58:37 load average: 2.37, free: 532, cached: 22852
> 04:58:57 load average: 1.91, free: 451, cached: 22859
> 04:59:17 load average: 1.82, free: 469, cached: 22866
> 04:59:57 load average: 1.57, free: 387, cached: 22884

What tool is that? I'm not familiar with this output format.

> max_connections | 500

While this is probably fundamentally a kernel problem, you are not doing yourself any favors by allowing 500 connections on a machine with 24 cores. A high number of connections can trigger poor kernel behavior.
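
To put the numbers side by side, a tiny Python sketch of the arithmetic follows. The function name connections_per_core is hypothetical, and the ratio is only a rough indicator of oversubscription when every slot is in use, not a tuning rule.

```python
def connections_per_core(max_connections, cores):
    """Return how many allowed connections compete for each CPU core."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return max_connections / cores


if __name__ == "__main__":
    # Values from this thread: max_connections = 500, 24 cores.
    ratio = connections_per_core(500, 24)
    print(f"{ratio:.1f} allowed connections per core")
```

If all 500 slots were busy at once, each core would have roughly 21 backends contending for it.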
Cheers,
Jeff