On Tue, 25 Jun 2019 at 12:23, Justin Pryzby <pryzby@xxxxxxxxxxxxx> wrote:
What kernel? Version? OS?
Ubuntu 18.04; current kernel is 4.15.0-51-generic
If Linux, I wonder if transparent hugepages or KSM are enabled ? It seems
possible that truncating the table is clearing enough RAM to mitigate the
issue, similar to restarting the DB.
tail /sys/kernel/mm/ksm/run /sys/kernel/mm/transparent_hugepage/khugepaged/defrag /sys/kernel/mm/transparent_hugepage/enabled /sys/kernel/mm/transparent_hugepage/defrag
https://www.postgresql.org/message-id/20170718180152.GE17566%40telsasoft.com
==> /sys/kernel/mm/ksm/run <==
0
==> /sys/kernel/mm/transparent_hugepage/khugepaged/defrag <==
1
==> /sys/kernel/mm/transparent_hugepage/enabled <==
always [madvise] never
==> /sys/kernel/mm/transparent_hugepage/defrag <==
always defer defer+madvise [madvise] never
From my research while preparing for the upgrade, I understood that transparent huge pages were a good thing and should be enabled. Is that not correct?
11.2 would have parallel query enabled by default. Are there other
settings you've changed (or not changed)?
https://wiki.postgresql.org/wiki/Server_Configuration
I've just posted the parameters we are changing from the default in a previous reply, so I won't repeat them here unless you want me to.
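For reference, the same list can be pulled straight from the server with roughly the query shown on that wiki page:

-- List every setting that differs from its built-in default, and where
-- the value came from (configuration file, ALTER SYSTEM, command line, ...).
SELECT name, current_setting(name) AS value, source
FROM pg_settings
WHERE source NOT IN ('default', 'override')
ORDER BY name;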
It's possible that the "administrative" queries are using up lots of your
shared_buffers, which are (also/more) needed by the customer-facing queries. I
would install pg_buffercache to investigate. Or, just pause the admin queries
and see if the issue goes away during that interval?
Pausing the admin queries isn't an option in our environment, especially as the issue reveals itself over the course of days, not minutes or hours.
?column? | count | count | datname | coalesce | toast | dirtyfrac | avg
------------------------+---------+-------+-----------+-------------------------+----------------+----------------------------+--------------------
0.24904101286779650995 | 1044545 | 0 | mydb | position | | 0.000000000000000000000000 | 4.8035517857057379
0.16701241622795295199 | 700495 | 0 | mydb | stat_position_click | | 0.000000000000000000000000 | 1.9870234619804567
0.09935032779251879171 | 416702 | 6964 | mydb | pg_toast_19788 | harvested_job | 0.01671218280689797505 | 1.9346079452462431
0.06979762146872315533 | 292750 | 0 | mydb | url | | 0.000000000000000000000000 | 4.9627873612297182
0.03795774662998486745 | 159205 | 0 | mydb | stat_sponsored_position | | 0.000000000000000000000000 | 1.8412361420809648
0.02923155381784048663 | 122605 | 0 | mydb | pg_toast_20174 | page | 0.000000000000000000000000 | 3.0259532645487541
0.02755283459406156353 | 115564 | 0 | mydb | location | | 0.000000000000000000000000 | 4.9953532241874632
0.02015273698468076320 | 84526 | 1122 | mydb | harvested_job | | 0.01327402219435439983 | 4.9922154130090150
0.01913348905375406298 | 80251 | 0 | mydb | pg_toast_20257 | position_index | 0.000000000000000000000000 | 4.9880001495308470
harvested_job is the rapidly growing "problematic" table I am talking about. page is the 355 GB table that is referenced by the public searches. I'll google, but is there a place I should look to understand what I am seeing here? Also, should pg_buffercache perhaps be run at the beginning and end of the week, to see if there is a significant difference?
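For anyone reading along: the output above is a per-relation summary over pg_buffercache. A rough sketch of that sort of query (my column names; the toast-to-parent resolution that produces the "toast" column above is left out) would be:

-- Sketch of a per-relation shared_buffers summary, not the exact query used above.
--   fraction  = share of all shared buffers held by the relation
--   buffers   = total buffers, dirty = dirty buffers, dirtyfrac = dirty / buffers
--   avg       = average usagecount of the relation's buffers (0-5)
SELECT count(*)::float / (SELECT count(*) FROM pg_buffercache) AS fraction,
       count(*)                                                AS buffers,
       count(*) FILTER (WHERE b.isdirty)                       AS dirty,
       d.datname,
       c.relname,
       (count(*) FILTER (WHERE b.isdirty))::float / count(*)   AS dirtyfrac,
       avg(b.usagecount)                                       AS avg
FROM pg_buffercache b
JOIN pg_class c    ON b.relfilenode = pg_relation_filenode(c.oid)
JOIN pg_database d ON b.reldatabase = d.oid
                  AND d.datname = current_database()
GROUP BY d.datname, c.relname
ORDER BY fraction DESC
LIMIT 10;

Comparing that at the start and end of the week should show whether harvested_job and its toast table are gradually crowding the page buffers out of shared_buffers.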
Could you send query plan for the slow (customer-facing) queries?
https://wiki.postgresql.org/wiki/Slow_Query_Questions#EXPLAIN_.28ANALYZE.2C_BUFFERS.29.2C_not_just_EXPLAIN
I can, but can I ask why this would matter? I'm not looking to optimise the query (although I'm sure it could be optimised; this is a legacy system with lots of barnacles). The problem is that the same query performs increasingly slowly over the course of a week, seemingly in step with the growth of one particular table whose rows carry a large toast column (and which, as I mentioned, isn't referenced by the query in question). Wouldn't the plan be the same at both the start of the week (when the problematic table is essentially empty) and at the end (when it is much larger)?
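In the meantime, here is the form I'll capture; the statement below is just a placeholder standing in for the real customer-facing search:

-- Placeholder standing in for the actual customer-facing query.
-- The BUFFERS option reports, per plan node, shared buffers hit vs. read from
-- disk, which can change dramatically even when the plan text stays identical.
EXPLAIN (ANALYZE, BUFFERS)
SELECT p.*
FROM page p
WHERE p.id = 12345;   -- hypothetical predicate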
Thanks!
Hugh