On Mon, Apr 13, 2020 at 9:58 AM Justin Pryzby <pryzby@xxxxxxxxxxxxx> wrote:
> On Mon, Apr 13, 2020 at 09:46:22AM -0500, Don Seiler wrote:
> > ==> /sys/kernel/mm/ksm/run <==
> > 0
> Was it off to begin with?
> If not, you can set it to "2" to "unshare" pages.
Yes, we haven't changed this. It was already set to 0.
> > ==> /sys/kernel/mm/transparent_hugepage/khugepaged/defrag <==
> > 1
> So I'd suggest trying with this disabled.
My understanding was that THP is disabled anyway. What would this defrag feature be doing now?
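One way to confirm the THP state: the multi-choice THP sysfs files (such as /sys/kernel/mm/transparent_hugepage/enabled) mark the active setting with square brackets, e.g. "always madvise [never]". Below is a minimal Python sketch for pulling out the active choice. The helper name `active_setting` is made up for illustration, and the sample strings are examples of the kernel's format, not output from this host.

```python
import re


def active_setting(sysfs_text):
    """Return the bracketed (active) choice from a sysfs multi-choice file.

    Files that hold a plain value (e.g. "1") are returned stripped.
    """
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else sysfs_text.strip()


# Sample contents in the kernel's format (assumed, not read from this host).
print(active_setting("always madvise [never]\n"))
print(active_setting("1\n"))
```

On a live system the input would come from something like `open("/sys/kernel/mm/transparent_hugepage/enabled").read()`.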
> I don't know if I ever fully understood the problem, but it sounds like at
> least in your case it's related to large shared_buffers, and hugepages, which
> cannot be swapped out.
Basically, the problem is that our DB host is getting slammed with connections (even with pgbouncer in place). We see the CPU load spiking, and when we check "top" we regularly see "kswapd" at the top of the list. For example, just now kswapd is at 72 %CPU in top. The next highest is a postgres process at 6.6 %CPU.
Our shared_buffers is set to 32GB, and HugePages is set to 36GB:
# grep Huge /proc/meminfo
AnonHugePages: 0 kB
HugePages_Total: 18000
HugePages_Free: 1897
HugePages_Rsvd: 41
HugePages_Surp: 0
Hugepagesize: 2048 kB
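To put those counters into sizes, here is a small Python sketch that converts the HugePages_* page counts into GiB using the reported Hugepagesize. The function name `hugepage_summary` is hypothetical, and the sample text is just the output pasted above.

```python
def hugepage_summary(meminfo_text):
    """Convert HugePages_* counters from /proc/meminfo text into GiB."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            # First token is the number; a trailing "kB" unit is ignored.
            fields[key.strip()] = int(rest.split()[0])

    page_kib = fields["Hugepagesize"]  # reported in kB (KiB)

    def to_gib(pages):
        return pages * page_kib / (1024 * 1024)

    total = fields["HugePages_Total"]
    free = fields["HugePages_Free"]
    return {
        "total_gib": round(to_gib(total), 2),
        "free_gib": round(to_gib(free), 2),
        "in_use_gib": round(to_gib(total - free), 2),
    }


SAMPLE = """\
AnonHugePages: 0 kB
HugePages_Total: 18000
HugePages_Free: 1897
HugePages_Rsvd: 41
HugePages_Surp: 0
Hugepagesize: 2048 kB
"""

print(hugepage_summary(SAMPLE))
```

With the numbers above, this works out to roughly 35.16 GiB in the pool, 3.71 GiB free, and 31.45 GiB in use. Note that HugePages_Rsvd pages are still counted in HugePages_Free even though they are already committed.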
Also, FWIW, this host is actually a vSphere VM. We're looking into any underlying events during these spikes as well.
Don.
-- Don Seiler
www.seiler.us