On Tue, 25 Jun 2019 at 12:23, Justin Pryzby <pryzby@xxxxxxxxxxxxx> wrote:
It's possible that the "administrative" queries are using up lots of your
shared_buffers, which are (also/more) needed by the customer-facing queries. I
would install pg_buffercache to investigate. Or, just pause the admin queries
and see if that the issue goes away during that interval ?
SELECT 1.0*COUNT(1)/sum(count(1))OVER(), COUNT(1), COUNT(nullif(isdirty,'f')), datname, COALESCE(c.relname, b.relfilenode::text), d.relname TOAST, 1.0*COUNT(nullif(isdirty,'f'))/count(1) dirtyfrac, avg(usagecount) FROM pg_buffercache b JOIN pg_database db ON b.reldatabase=db.oid LEFT JOIN pg_class c ON b.relfilenode=pg_relation_filenode(c.oid) LEFT JOIN pg_class d ON c.oid=d.reltoastrelid GROUP BY 4,5,6 ORDER BY 1 DESC LIMIT 9;
I've been going by a couple of articles I found about interpreting pg_buffercache (https://www.keithf4.com/a-large-database-does-not-mean-large-shared_buffers), and so far shared buffers look okay. Our database is 486 GB, with shared buffers set to 32 GB. The article suggests a query that can provide a guideline for what shared buffers should be:
SELECTpg_size_pretty(count(*) * 8192) as ideal_shared_buffersFROMpg_class cINNER JOINpg_buffercache b ON b.relfilenode = c.relfilenodeINNER JOINpg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())WHEREusagecount >= 3;
This comes out to 25 GB, and even dropping the usage count to 1 only raises it to 30 GB. I realise this is only a guideline, and I may bump it to 36 GB, to give a bit more space.
I did run some further queries to look at usage (based on the same article), and most of the tables that have very high usage on all the buffered data are 100% buffered, so, if I understand it correctly, there should be little churn there. The others seem to have sufficient less-accessed space to make room for data that they need to buffer:
relname | buffered | buffers_percent | percent_of_relation-------------------------+----------+-----------------+---------------------position | 8301 MB | 25.3 | 99.2stat_position_click | 7359 MB | 22.5 | 76.5url | 2309 MB | 7.0 | 100.0pg_toast_19788 | 1954 MB | 6.0 | 49.3 (harvested_job)stat_sponsored_position | 1585 MB | 4.8 | 92.3location | 927 MB | 2.8 | 98.7pg_toast_20174 | 866 MB | 2.6 | 0.3 (page)pg_toast_20257 | 678 MB | 2.1 | 92.9 (position_index)harvested_job | 656 MB | 2.0 | 100.0stat_employer_click | 605 MB | 1.8 | 100.0usagecount >= 5relname | pg_size_pretty-------------------------+----------------harvested_job | 655 MBlocation | 924 MBpg_toast_19788 | 502 MBpg_toast_20174 | 215 MBpg_toast_20257 | 677 MBposition | 8203 MBstat_employer_click | 605 MBstat_position_click | 79 MBstat_sponsored_position | 304 kBurl | 2307 MBusagecount >= 3relname | pg_size_pretty-------------------------+----------------harvested_job | 656 MBlocation | 927 MBpg_toast_19788 | 1809 MBpg_toast_20174 | 589 MBpg_toast_20257 | 679 MBposition | 8258 MBstat_employer_click | 605 MBstat_position_click | 716 MBstat_sponsored_position | 2608 kBurl | 2309 MBusagecount >= 1relname | pg_size_pretty-------------------------+----------------harvested_job | 656 MBlocation | 928 MBpg_toast_19788 | 3439 MBpg_toast_20174 | 842 MBpg_toast_20257 | 680 MBposition | 8344 MBstat_employer_click | 605 MBstat_position_click | 4557 MBstat_sponsored_position | 86 MBurl | 2309 MB
If I'm misreading this, please let me know. I know people also asked about query plans and schema, which I'm going to look at next; I've just been knocking off one thing at at time.
Thanks,
Hugh