Hi,

On 2019-07-18 16:01:46 -0400, Hugh Ranalli wrote:
> I've been going by a couple of articles I found about interpreting
> pg_buffercache (
> https://www.keithf4.com/a-large-database-does-not-mean-large-shared_buffers),
> and so far shared buffers look okay. Our database is 486 GB, with shared
> buffers set to 32 GB. The article suggests a query that can provide a
> guideline for what shared buffers should be:
>
> SELECT
>     pg_size_pretty(count(*) * 8192) AS ideal_shared_buffers
> FROM
>     pg_class c
> INNER JOIN
>     pg_buffercache b ON b.relfilenode = c.relfilenode
> INNER JOIN
>     pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
> WHERE
>     usagecount >= 3;

IMO that's not a meaningful way to determine the ideal size of shared
buffers. Except for the case where shared buffers is bigger than the
entire working set (not just the hot working set), it's going to give
you completely bogus results.

Pretty much by definition it cannot suggest a shared buffers size bigger
than the current setting, given that it only counts buffers that are
currently in shared buffers.

And there are plenty of scenarios where you'll commonly see many
frequently (but not most frequently) used buffers with a usagecount < 3,
even 0. If you e.g. have a shared_buffers size that's just a few
megabytes too small, you'll need to throw some buffers out of shared
buffers. That means the buffer replacement search will go through shared
buffers, decrementing each buffer's usagecount by one, until it finds a
buffer whose count was already 0 before the decrement. Which makes it
extremely likely that there are moments where a substantial number of
frequently used buffers have a lowered usagecount (perhaps even 0).

Therefore the above query will commonly give you a lower number than
shared buffers, if your working set size is *bigger* than shared memory.

I think you can assume that shared buffers is too big if a substantial
portion of buffers have relfilenode IS NULL (i.e. are unused); at least
if you don't continually also DROP/TRUNCATE relations. (A minimal query
for that check is sketched below, after the subdivision query.)

If there's a large fluctuation in which parts of the buffer cache have a
high usagecount, that's a good indication that new buffers are needed
very frequently (because that lowers a good portion of buffers to
usagecount 0).

I've had decent success in the past getting insights with a query like:

SELECT
    ceil(bufferid / (nr_buffers / subdivisions::float))::int AS part,
    to_char(SUM((relfilenode IS NOT NULL)::int) / count(*)::float * 100,
            '999D99') AS pct_used,
    to_char(AVG(usagecount), '9D9') AS avg_usagecount,
    to_char(SUM((usagecount = 0)::int)
            / SUM((relfilenode IS NOT NULL)::int)::float8 * 100,
            '999D99') AS pct_0
FROM
    pg_buffercache,
    (SELECT 10) AS x(subdivisions),
    (SELECT setting::int AS nr_buffers FROM pg_settings
     WHERE name = 'shared_buffers') s
GROUP BY 1
ORDER BY 1;

which basically subdivides pg_buffercache's output into 10 parts (or use
as many subdivisions as fit comfortably on one screen / terminal; the
output below uses 20).
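As promised above, here's a minimal sketch of the "too big" check, i.e.
the fraction of currently unused buffers (like the query above it
assumes the pg_buffercache extension; the aggregate FILTER clause needs
9.4+):

SELECT
    -- unused buffers have all-NULL metadata in pg_buffercache
    to_char(count(*) FILTER (WHERE relfilenode IS NULL)
            / count(*)::float * 100, '999D99') AS pct_unused
FROM pg_buffercache;

If that percentage stays substantial while the normal workload is
running, shared buffers is probably bigger than necessary.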
Here's e.g. the output of a benchmark (pgbench) running against a
database that's considerably larger than shared memory (15GB database,
1.5GB shared_buffers):

┌──────┬──────────┬────────────────┬─────────┐
│ part │ pct_used │ avg_usagecount │ pct_0   │
├──────┼──────────┼────────────────┼─────────┤
│    1 │  100.00  │  1.0           │   42.75 │
│    2 │  100.00  │   .6           │   47.85 │
│    3 │  100.00  │   .6           │   47.25 │
│    4 │  100.00  │   .6           │   47.52 │
│    5 │  100.00  │   .6           │   47.18 │
│    6 │  100.00  │   .5           │   48.47 │
│    7 │  100.00  │   .5           │   49.00 │
│    8 │  100.00  │   .5           │   48.52 │
│    9 │  100.00  │   .5           │   49.27 │
│   10 │  100.00  │   .5           │   49.58 │
│   11 │   99.98  │   .6           │   46.88 │
│   12 │  100.00  │   .6           │   45.23 │
│   13 │  100.00  │   .6           │   45.03 │
│   14 │  100.00  │   .6           │   44.90 │
│   15 │  100.00  │   .6           │   46.08 │
│   16 │  100.00  │   .6           │   44.84 │
│   17 │  100.00  │   .6           │   45.88 │
│   18 │  100.00  │   .6           │   46.46 │
│   19 │  100.00  │   .6           │   46.64 │
│   20 │  100.00  │   .6           │   47.05 │
└──────┴──────────┴────────────────┴─────────┘

As you can see, usagecounts are pretty low overall. That's because the
buffer replacement rate is so high that usagecounts are very frequently
reduced to 0 (to get new buffers). You can infer from that that unless
you add a lot of shared buffers, you're not likely to make a huge
difference (but if you set it to 16GB, it'd obviously look much better).

In contrast to that, here's pgbench running on a smaller database that
nearly fits into shared buffers (2GB database, 1.5GB shared_buffers):

┌──────┬──────────┬────────────────┬─────────┐
│ part │ pct_used │ avg_usagecount │ pct_0   │
├──────┼──────────┼────────────────┼─────────┤
│    1 │  100.00  │  3.9           │    1.45 │
│    2 │  100.00  │  3.8           │    1.34 │
│    3 │  100.00  │  3.8           │    1.69 │
│    4 │  100.00  │  3.7           │    1.96 │
│    5 │  100.00  │  3.7           │    2.01 │
│    6 │  100.00  │  3.6           │    2.23 │
│    7 │  100.00  │  3.5           │    2.60 │
│    8 │  100.00  │  3.5           │    2.27 │
│    9 │  100.00  │  3.4           │    2.82 │
│   10 │  100.00  │  3.3           │    2.92 │
│   11 │  100.00  │  3.2           │    3.43 │
│   12 │  100.00  │  3.1           │    3.41 │
│   13 │  100.00  │  3.7           │    1.91 │
│   14 │  100.00  │  4.0           │    1.09 │
│   15 │  100.00  │  3.9           │    1.39 │
│   16 │  100.00  │  4.0           │    1.22 │
│   17 │  100.00  │  4.1           │    1.16 │
│   18 │  100.00  │  4.0           │    1.19 │
│   19 │  100.00  │  4.0           │    1.29 │
│   20 │  100.00  │  4.0           │    1.42 │
└──────┴──────────┴────────────────┴─────────┘

As you can see, there are many fewer buffers with a usagecount of 0 -
that's because the buffer replacement rate is much lower (as most
buffers are in shared buffers), and thus usagecounts have time to
increase regularly. Here you can guess that even increasing shared
buffers slightly would increase the cache hit ratio substantially. E.g.
the same workload, but with shared_buffers increased to 1.6GB:

┌──────┬──────────┬────────────────┬─────────┐
│ part │ pct_used │ avg_usagecount │ pct_0   │
├──────┼──────────┼────────────────┼─────────┤
│    1 │  100.00  │  5.0           │     .00 │
│    2 │  100.00  │  5.0           │     .00 │
│    3 │  100.00  │  5.0           │     .00 │
│    4 │  100.00  │  5.0           │     .00 │
│    5 │  100.00  │  5.0           │     .00 │
│    6 │  100.00  │  5.0           │     .00 │
│    7 │  100.00  │  5.0           │     .00 │
│    8 │  100.00  │  5.0           │     .00 │
│    9 │  100.00  │  5.0           │     .00 │
│   10 │  100.00  │  5.0           │     .00 │
│   11 │  100.00  │  5.0           │     .00 │
│   12 │  100.00  │  5.0           │     .00 │
│   13 │  100.00  │  5.0           │     .00 │
│   14 │  100.00  │  5.0           │     .00 │
│   15 │  100.00  │  5.0           │     .00 │
│   16 │  100.00  │  5.0           │     .00 │
│   17 │  100.00  │  5.0           │     .00 │
│   18 │  100.00  │  5.0           │     .00 │
│   19 │   93.27  │  5.0           │     .00 │
│   20 │     .00  │ (null)         │  (null) │
└──────┴──────────┴────────────────┴─────────┘

Now, in reality things are rarely quite this neat - pgbench has a
uniform access pattern, which isn't that common in the real world.
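As an aside (not part of the analysis above): if you want to put a rough
number on the replacement rate itself, the standard pg_stat_bgwriter
view counts how many buffers have been allocated since the statistics
were last reset. Once shared buffers is full, nearly every allocation
implies an eviction, so sampling the counter twice gives an approximate
replacement rate:

SELECT buffers_alloc, stats_reset, now() AS sampled_at
FROM pg_stat_bgwriter;
-- run again after a while; the difference in buffers_alloc divided by
-- the elapsed time approximates buffer allocations per second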
I also suggest monitoring how the buffer hit ratio develops over time,
e.g. by doing a query like:

SELECT
    datname,
    (blks_hit - blks_read)::float / NULLIF(blks_hit, 0)::float
FROM pg_stat_database;

although that's not perfect, because it gives you the ratio since the
last time the stats were reset, making it hard to see more recent
changes. So you either need to reset the stats, or compute the
difference against what the values were when you wanted to start
observing. E.g.:

-- take a snapshot of the current counters
DROP TABLE IF EXISTS pg_stat_database_snap;
CREATE TEMPORARY TABLE pg_stat_database_snap AS
    SELECT * FROM pg_stat_database;

-- later, compute the hit ratio over the interval since the snapshot
SELECT
    datname,
    (blks_hit - blks_read)::float / NULLIF(blks_hit, 0)::float
FROM (
    SELECT
        datname,
        pd.blks_read - ps.blks_read AS blks_read,
        pd.blks_hit - ps.blks_hit AS blks_hit
    FROM pg_stat_database pd
    JOIN pg_stat_database_snap ps USING (datname)
) pd_diff;

Greetings,

Andres Freund