Justin,
Thanks for the extensive reading list, very educative.
After reading https://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/ I was thinking that it could be a NUMA/THP-related problem.
Turning off THP solved the "swap storm" problem. Some queries are even 40% faster with THP off.
Then also turning off KSM improved performance by another 5%
I was seriously worried about this issue as we received today another server with 144GB of RAM.
I will try to post a little summary of all the suggestion I received via this thread later this week/early next week.
Thanks!
Charles
On Tue, Jul 18, 2017 at 8:01 PM, Justin Pryzby <pryzby@xxxxxxxxxxxxx> wrote:
Did you look at disabling KSM and/or THP ?On Tue, Jul 18, 2017 at 02:13:58PM -0300, Claudio Freire wrote:
> On Tue, Jul 18, 2017 at 1:01 PM, Claudio Freire <klaussfreire@xxxxxxxxx> wrote:
> > On Tue, Jul 18, 2017 at 6:20 AM, Charles Nadeau
> > <charles.nadeau@xxxxxxxxx> wrote:
> >> Claudio,
> >>
> >> At one moment
> >> during the query, there is a write storm to the swap drive (a bit like this
> >> case:
> >> https://www.postgresql.org/message-id/AANLkTi% ).3Diw4fC2RgTxhw0aGpyXANhOT% 3DXBnjLU1_v6PdA%40mail.gmail. com
> >> I can hardly explain it as there is plenty of memory on this server.
> >
> > That sounds a lot like NUMA zone_reclaim issues:
> >
> > https://www.postgresql.org/message-id/500616CB.3070408@ 2ndQuadrant.com
>
> I realize you have zone_reclaim_mode set to 0. Still, the symptoms are
> eerily similar.
sudo sh -c 'echo 2 >/sys/kernel/mm/ksm/run'
https://www.postgresql.org/message-id/20170524155855. GH31097%40telsasoft.com
https://www.postgresql.org/message-id/ CANQNgOrD02f8mR3Y8Pi= zFsoL14RqNQA8hwz1r4rSnDLr1b2Cw @mail.gmail.com
https://www.postgresql.org/message-id/ CAHyXU0y9hviyKWvQZxX5UWfH9M2LY vwvAOPQ_DUPva2b71t12g%40mail. gmail.com
https://www.postgresql.org/message-id/20130716195834. 8fe5c79249cb2ff0d4270b3e@ yahoo.es
https://www.postgresql.org/message-id/CAE_ gQfW3dBiELcOppYN6v%3D8%2B% 2BpEeywD7iXGw-OT3doB8SXO4_A% 40mail.gmail.com
https://www.postgresql.org/message-id/flat/1436268563235- 5856914.post%40n5.nabble.com# 1436268563235-5856914.post@n5. nabble.com
https://www.postgresql.org/message-id/CAL_ 0b1tJOZCx3Lo3Eve1RqGaT%2BJJ_ Q7w4pkJ87WfWwXbTugnxw@mail. gmail.com
https://www.postgresql.org/message-id/556E2068.7070007@ vuole.me
https://www.postgresql.org/message-id/1415981309.90631. YahooMailNeo%40web133205.mail. ir2.yahoo.com
https://www.postgresql.org/message-id/CAHyXU0yXYpCXN4% 3D81ZDRQu- oGzrcq2qNAXDpyz4oiQPPAGk4ew% 40mail.gmail.com
https://www.pythian.com/blog/performance-tuning-hugepages- in-linux/
http://structureddata.org/2012/06/18/linux-6- transparent-huge-pages-and- hadoop-workloads/
Justin
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/
http://charlesnadeau.blogspot.com/