On Tue, Jul 18, 2017 at 3:20 AM, Charles Nadeau
<charles.nadeau@xxxxxxxxx> wrote:
> Claudio,
>
> Find attached the iostat measured while redoing the query above
> (iostat1.txt). sda holds my temp directory (noop i/o scheduler), sdb the
> swap partition (cfq i/o scheduler) only and sdc (5 disks RAID0, noop i/o
> scheduler) holds the data. I didn't pay attention to the load caused by 12
> parallel scans as I thought the RAID card would be smart enough to
> re-arrange the read requests optimally regardless of the load. At one moment
> during the query, there is a write storm to the swap drive (a bit like this
> case:
> https://www.postgresql.org/message-id/AANLkTi%3Diw4fC2RgTxhw0aGpyXANhOT%3DXBnjLU1_v6PdA%40mail.gmail.com).

My experience from that case (and a few more) has led me to believe
that Linux database servers with plenty of memory should have swap
turned off. The Linux kernel works hard to swap out little-used memory
to make more space for caching active data. The problem is that whatever
decides what to swap out gets stupid when presented with 512GB of RAM
and starts swapping out things like sys v shared_buffers.

Here's the thing: either your memory is big enough to buffer your whole
data set, in which case nothing should need to be swapped out to make
room for caching, or your data set is much bigger than memory, in which
case making more room for cache gains very little if it comes at the
cost of waiting for stuff you need to be read back in from swap.

Linux servers should also have zone reclaim turned off
(vm.zone_reclaim_mode = 0) and THP (transparent huge pages) disabled.

Try running "sudo swapoff -a" and see if it gets rid of your swap
storms.
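
In case it helps, here is a rough, read-only sketch for checking where a
box currently stands on the three settings mentioned above (swap, zone
reclaim, THP). It changes nothing. The helper names (swap_total_kb,
thp_mode, read_or_none, report) are made up for illustration, and the
paths are the usual Linux locations, which I am assuming exist on your
kernel; anything that cannot be read is simply skipped.

```python
#!/usr/bin/env python3
"""Read-only check of swap, zone reclaim, and THP settings on Linux."""
from pathlib import Path


def swap_total_kb(meminfo_text):
    """Return SwapTotal in kB from /proc/meminfo content, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return None


def thp_mode(enabled_text):
    """Return the active THP mode, i.e. the bracketed word in a string
    such as 'always madvise [never]'; None if no word is bracketed."""
    for word in enabled_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_or_none(path):
    """Return the file content, or None if the file cannot be read."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


def report():
    """Build one line per setting that could be read (assumed paths)."""
    lines = []
    meminfo = read_or_none("/proc/meminfo")
    if meminfo is not None:
        total = swap_total_kb(meminfo)
        lines.append(f"SwapTotal: {total} kB (0 means no swap is active)")
    zone = read_or_none("/proc/sys/vm/zone_reclaim_mode")
    if zone is not None:
        lines.append(f"vm.zone_reclaim_mode: {zone.strip()} (0 means off)")
    thp = read_or_none("/sys/kernel/mm/transparent_hugepage/enabled")
    if thp is not None:
        lines.append(f"THP: {thp_mode(thp)} ('never' means disabled)")
    return lines


if __name__ == "__main__":
    for line in report():
        print(line)
```

Running it before and after "sudo swapoff -a" should show SwapTotal
drop to 0 once no swap area is active.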