Re: Increased iowait and blk_read_time with higher shared_buffers

Samed YILDIRIM <samed@xxxxxxxxxx> · Wed, 14 Dec 2022 20:41:27 +0200

Hello Jordan,

You don't have to set %25 for the best performance. You need to test different values for your database. If I were you, I would 
try to enable huge pages. You probably will see better performance with bigger shared_buffers when you configure huge pages. -> https://www.postgresql.org/docs/14/kernel-resources.html#LINUX-HUGE-PAGES
set effective_io_concurrency to 200. But, you need to test to figure out the best value. It significantly depends on your disk's metrics/configuration
set random_page_cost to 2 and try to decrease it gradually until 1.2.
set effective_cache_size to 24GB
run pg_test_timing on the server to see the cost of asking time to the system. Because track_io_timing is enabled in your configuration file. If it is expensive, I would disable tracking io timing.

Note that I assumed that those resources/servers are reserved for PostgreSQL and there is no other service running on them.

Best regards.
Samed YILDIRIM

On Wed, 14 Dec 2022 at 20:12, Jordan Hurwich <jhurwich@xxxxxxxxxxxxxxxx> wrote:
Thanks for your thoughtful response Samed.
I'm familiar with the article you linked to, and part of my surprise is that with these 32GB RAM machines we're seeing better performance at 12.5% (4GB) than the commonly recommended 25% (8GB) of system memory for shared_buffers. Your notes about disk read stats from Postgres potentially actually representing blocks read from the OS cache make sense, I just imagined that Postgres would be better at managing the memory when it was dedicated to it via shared_buffers than the OS (obviously with some point of diminishing returns); and I'm still hoping there's some Postgres configuration change we can make that enables better performance through improved utilization of shared_buffers at the commonly recommended 25% of system memory.

You mentioned effective_cache_size, which we currently have set to 16GB (50% of system memory). Is it worth us experimenting with that value, if so would you recommend we try reducing it or increasing it? Are there other settings that we might consider to see if we can improve the utilization of shared_buffers at higher values like 8GB (25% of system memory)?  

On Wed, Dec 14, 2022 at 4:38 AM Samed YILDIRIM <samed@xxxxxxxxxx> wrote:
Hi Jordan,

Increased shared buffer size does not necessarily mean an increased performance.

Regarding the negative correlation between IOWait and shared_buffers' size; if you don't increase memory of the system, it is an expected result in my opinion. Because, PostgreSQL starts reserving a bigger portion of the system memory, and the OS cache size decreases respectively. Smaller OS cache can easily result with more disk access and higher IO demand and bigger IOWait.

As you can see in graphs, when you increase the size of shared_buffers, you see higher block hits and lower block reads. "hits" refers to the blocks that are already in shared_buffers. "reads" refers to the blocks that are not in shared_buffers and "read from disk". But, "read from disk" that you see in PostgreSQL's statistic catalogs doesn't mean all of those blocks were read from the disk. PostgreSQL requests data blocks, which are not already in shared_buffers, from the kernel. And, if the requested block is in the OS cache, the kernel provides it directly from the memory. No disk access, therefore, happens at all. And, you observe that through lower disk access (I/O) and lower IOWait on your operating system.

When you increase size of shared_buffers without increasing amount of the system memory and with or without decreasing effective_cache_size, PostgreSQL considers the possibility of the block to be requested on the memory lower than previous configuration. So, it creates execution plans with less index usages. Less index usage means more sequential scan. More sequential scan means more disk read. We already have less OS cache. And the system has to carry out more disk accesses.

As you can see, they are all connected. Setting shared_buffers higher than a threshold, which varies from database to database, actually decreases your performance.

To conclude, your results are expected results.

A useful resource to read: https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
 ..... given the way PostgreSQL also relies on the operating system cache, it's
 unlikely you'll find using more than 40% of RAM to work better than a 
smaller amount.

Best regards.
Samed YILDIRIM

On Tue, 13 Dec 2022 at 02:29, Jordan Hurwich <jhurwich@xxxxxxxxxxxxxxxx> wrote:
Hi everyone,I'm writing to ask about a correlation I was surprised to observe on our PSQL machines (particularly read-only standbys) where increasing "shared_buffers" appears to result in increased pg_stat_database.blk_read_time and CPU iowait, which in turn seems to correlate with reduced throughput for our query-heavy services - details below. 

Is this expected, or are there configuration changes we might make to improve the performance at higher "shared_buffers" values?

Thanks, let me know if I can provide any more info,
Jordan
Tests and results - public Datadog dashboard here, screenshot attached:
Our beta system ("endor") was run with three different configurations over the ~30hrs from Dec 11 17:00 to Dec 13 0:00 (UTC)
The only changes between these deployments was the "shared_buffers" parameter for all PSQL instances (machine and configuration details below). 
"shared_buffers" = "4000MB" - from Dec 10 19:00 to Dec 11 20:00 UTC
"shared_buffers" = "8000MB" - from Dec 11 21:00 to Dec 12 13:30 UTC
"shared_buffers" = "14000MB" - from Dec 12, 14:30 to Dec 13, 0:00 UTC
The datadog dashboard shows our results including cpu divided by usage and the cache hit vs disk read ratio including blk_read_time (additional metrics were enabled at about Dec 11, 10am PST)
Our most query heavy service is our "Trends worker" for which the average worker duration is shown in the top-left graph
We expect the workload to be relatively constant throughout this period, particularly focusing on the standby instances (PQSL2 and PSQL3) where all read-only queries should be sent.
We see the lowest duration, i.e. best performance, most consistently with the lowest setting for shared_buffers, "4000MB"
As we increase shared_buffers we see increased iowait on the standby instances (PSQL2 and PSQL3) and increased blk_read_time (per pg_stat_database), in the bottom-most graphs as "blks_read_time".
Even though we also see a higher ratio of cache hits on those instances. Our graphs show the per second change in pg_stat_database.blks_read abd blks_hit (as "all_hit/s" and "all_read/s") and pg_statio_user_tables.heap_blks_read, heap_blks_hit, idx_blks_read, and idx_blks_hit
Cluster contains 3 PSQL nodes, all on AWS EC2 instances, postgresql.conf attached
Version: psql (PostgreSQL) 14.1
Machine: 
AWS "c6gd.4xlarge" (32GB RAM, 16 core 2.5 GHz, local storage 950 GB ssd)
uname -a: Linux ip-172-30-64-110 5.4.0-1038-aws #40-Ubuntu SMP Fri Feb 5 23:53:34 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
Replication via WAL:
Line configuration: PSQL1 (master), PSQL1 followed by PSQL2, PSQL2 followed by PSQL3
Managed by repmgr (version: repmgr 5.3.0), no failovers observed during timeframe of interest
Load balancing:
Managed by PGPool-II (version: 4.3.2 (tamahomeboshi)) on 3 AWS instances
All write queries go to master. All read-only queries go to standbys unless WAL on standby > 10MB, falling back to read from master as last resort