Re: Some tuning suggestions on a Red Hat 6.7 - PG 9.5.3 production environment

Dear Domenico,
I pushed a little hard on that because the virtualizer runs on a distributed system composed of 7 clusters with more than 100 cores and enterprise storage. I know that effective_io_concurrency is usually set based on the number of drives in a RAID array (minus the parity disks), so I decided to push it high given the nature of the host machine. I don't have much information about that machine, but I can investigate.
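
In case it's useful, the value can be tried per session before committing it to postgresql.conf; effective_io_concurrency only influences prefetching during bitmap heap scans, so that's the plan shape to benchmark. A minimal sketch (the query itself is a placeholder):

SET effective_io_concurrency = 30;
EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;  -- a query whose plan uses a bitmap heap scan
RESET effective_io_concurrency;          -- back to the server-wide value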

Anyway, the OOM configuration described in the official documentation prevents the OOM killer from targeting the postmaster.
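
For reference, what I'm applying is roughly what the docs describe (a sketch; on older 2.6 kernels the file is /proc/self/oom_adj with a value of -17 instead of oom_score_adj):

# /etc/sysctl.conf: fail allocations with ENOMEM instead of overcommitting
vm.overcommit_memory = 2

# in the postmaster startup script (PostgreSQL 9.4+): protect the postmaster
# itself, while child backends are reset to a normal OOM score
echo -1000 > /proc/self/oom_score_adj
export PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
export PG_OOM_ADJUST_VALUE=0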

I’m still wondering whether the other configuration parameters are reasonable or could be tweaked for more performance.

Thank you for your support
 Pietro

PS: I made a typo in the configuration below: max_connections is 20, not 30.


On 14 Nov 2016, at 18:36, domenico febbo <mimmopasticcio@xxxxxxxxx> wrote:

dear Pietro,
are you sure about

effective_io_concurrency = 30

could you please describe the type of disk storage?


On 14/Nov/2016 12:46, "Pietro Pugni" <pietro.pugni@xxxxxxxxx> wrote:
Dear list,
I’m looking for some guidelines on how to optimize the configuration of a production database dedicated to a DWH application.
I run the application on different machines and have solved several issues so far, but I'm struggling with a production environment running Red Hat 6.7 and PostgreSQL 9.5.3.
My application does a lot of reads and many writes (plain “SELECT … INTO” and “INSERT”, no “UPDATE”), though the writes are an order of magnitude fewer than the reads.
The workflow consists of two big blocks: an ETL phase and the workloads on the data imported during that phase.

The biggest schema has about 1.2 billion rows distributed over about ten tables; many of those tables are partitioned and have indexes. At the moment the database stores two schemas, but I plan to add three more of similar size.

The machine is virtualized and has 8 CPUs at about 3GHz, 64GB of RAM and 5TB of storage. It runs Red Hat 6.7 with kernel 2.6.x.

The configuration changes I made so far are:
max_connections = 30
shared_buffers = 32GB
work_mem = 256MB
maintenance_work_mem = 4GB
effective_io_concurrency = 30
checkpoint_completion_target = 0.9
random_page_cost = 2.0
effective_cache_size = 48GB
default_statistics_target = 1000

autovacuum is on and the collation is ‘C’.
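
For context, here's my back-of-the-envelope memory budget under these settings (my own arithmetic, assuming the default autovacuum_max_workers = 3):

shared_buffers                        32GB
autovacuum   (3 workers x 4GB)        12GB
sorts/hashes (30 backends x 256MB)    ~7.5GB minimum
                                      ------
                                      ~51GB of 64GB

work_mem applies per sort/hash node rather than per connection, so the last line can multiply under complex plans.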


The first issue I faced was with maintenance_work_mem: I had set it to 16GB and the server silently crashed during a VACUUM, because I hadn't considered that autovacuum can take up to autovacuum_max_workers * maintenance_work_mem (roughly 48GB with the default of 3 workers). So I lowered maintenance_work_mem to 4GB and it worked. Should I set maintenance_work_mem to a smaller value (1GB) after the ETL terminates, or can I leave it at 4GB without degrading overall performance?
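
A possible compromise (just a sketch of what I have in mind; SET affects only the current session, while autovacuum workers keep using the server-wide value):

-- conservative server-wide setting (postgresql.conf, or ALTER SYSTEM on 9.4+)
ALTER SYSTEM SET maintenance_work_mem = '1GB';
SELECT pg_reload_conf();

-- raised only inside the ETL session, before CREATE INDEX / manual VACUUM
SET maintenance_work_mem = '4GB';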

The second issue emerged during an intensive parallel workload. I implemented a splitter that parallelizes certain kinds of queries: there were 8 similar queries running, working on 8 disjoint subsets of the same table, which has roughly 4.5 million rows. These queries use SELECT DISTINCT, ORDER BY, OVER (PARTITION BY … ORDER BY) and COALESCE(). At a certain point the server crashed and I found the following error in the logs:

server process was terminated by signal 9: Killed

After some research, I found that it probably was the OOM killer, and running “dmesg” confirmed it. Reading the documentation and this answer on SO ( http://stackoverflow.com/questions/16418173/psql-seems-to-timeout-with-long-queries ), I realized the issue is probably due to a misconfiguration. The values I set for this pg instance don't seem so wrong, except maybe maintenance_work_mem. I will certainly disable memory overcommit as suggested by the official docs ( https://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT ), but I was wondering if I could tune the configuration a little better. Can someone give me some more advice?
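
For anyone searching later, the kernel log makes an OOM kill explicit; a generic check (not specific to this machine):

dmesg | grep -i -E 'out of memory|killed process'

One detail I hadn't fully appreciated: since work_mem is per sort/hash node rather than per query, 8 parallel sessions whose plans each contain, say, 3 such steps could claim 8 x 3 x 256MB ≈ 6GB on top of the 32GB of shared_buffers, and with overcommit enabled the kernel only discovers the shortfall at page-fault time, hence the SIGKILL instead of an allocation error.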

I run the same application with different data (and workloads) on other machines, but they have different configurations (Ubuntu 16.04). On one of them I previously disabled virtual memory overcommit and never experienced this issue, but that machine has 128GB of RAM.

I hope I've been clear enough.
Thank you everyone
 Pietro

