Thank you very much for your prompt responses.
I have analysed more regarding this and found the long running query.
I ran "explain analyse" on this query and I got following result. (We have 2 identical DB instances and they consist of same data. Instane 1 took 20+ second to process and instance 2 took less than a second)
Instance 1: (This is used by regular User - More than 600,000 request a day) - The result is same even when there is no user in the server.
EXPLAIN ANALYZE
Nested Loop Semi Join (cost=998547.53..3319573.36 rows=1 width=8) (actual time=10568.217..22945.971 rows=22 loops=1)
-> Hash Semi Join (cost=998546.96..3319545.95 rows=41 width=16) (actual time=10568.198..22945.663 rows=22 loops=1)
Hash Cond: (node.id = prop.node_id)
-> Bitmap Heap Scan on alf_node node (cost=995009.97..3303978.85 rows=4565737 width=8) (actual time=3304.419..20465.551 rows=41109751 loops=1)
Recheck Cond: ((store_id = 6) AND (type_qname_id = 240))
Rows Removed by Index Recheck: 54239131
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 2816
Heap Blocks: exact=24301 lossy=1875383
-> Bitmap Index Scan on idx_alf_node_mdq (cost=0.00..646144.01 rows=20047144 width=0) (actual time=3232.067..3232.067 rows=44246360 loops=1)
Index Cond: ((store_id = 6) AND (type_qname_id = 240))
SubPlan 1
-> Bitmap Heap Scan on alf_node_aspects aspect_1 (cost=2503.51..347403.58 rows=128379 width=8) (actual time=25.447..65.392 rows=5635 loops=1)
Recheck Cond: (qname_id = 251)
Heap Blocks: exact=40765
-> Bitmap Index Scan on fk_alf_nasp_qn (cost=0.00..2471.41 rows=128379 width=0) (actual time=18.835..18.835 rows=239610 loops=1)
Index Cond: (qname_id = 251)
-> Hash (cost=3526.11..3526.11 rows=871 width=8) (actual time=0.045..0.045 rows=23 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Index Only Scan using idx_alf_nprop_s on alf_node_properties prop (cost=0.70..3526.11 rows=871 width=8) (actual time=0.021..0.042 rows=23 loops=1)
Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'E292432'::text))
Heap Fetches: 23
-> Index Only Scan using alf_node_aspects_pkey on alf_node_aspects aspect (cost=0.57..2.01 rows=15 width=8) (actual time=0.011..0.011 rows=1 loops=22)
Index Cond: ((node_id = node.id) AND (qname_id = 245))
Heap Fetches: 22
Planning time: 0.639 ms
Execution time: 22946.036 ms
Instance 2: (Only by testers - 250 request a day)
Nested Loop Semi Join (cost=6471.94..173560841.08 rows=2 width=8) (actual time=0.162..0.464 rows=17 loops=1)
-> Nested Loop (cost=6471.37..173560684.36 rows=45 width=16) (actual time=0.154..0.387 rows=17 loops=1)
-> HashAggregate (cost=3508.15..3516.80 rows=865 width=8) (actual time=0.041..0.047 rows=18 loops=1)
Group Key: prop.node_id
-> Index Only Scan using idx_alf_nprop_s on alf_node_properties prop (cost=0.70..3505.99 rows=866 width=8) (actual time=0.020..0.035 r
ows=18 loops=1)
Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'E292432'::text))
Heap Fetches: 18
-> Index Scan using alf_node_pkey on alf_node node (cost=2963.22..200644.11 rows=1 width=8) (actual time=0.019..0.019 rows=1 loops=18)
Index Cond: (id = prop.node_id)
Filter: ((type_qname_id <> 145) AND (store_id = 6) AND (type_qname_id = 240) AND (NOT (SubPlan 1)))
Rows Removed by Filter: 0
SubPlan 1
-> Materialize (cost=2962.65..397912.89 rows=158204 width=8) (actual time=0.001..0.009 rows=85 loops=17)
-> Bitmap Heap Scan on alf_node_aspects aspect_1 (cost=2962.65..396503.87 rows=158204 width=8) (actual time=0.021..0.082 rows=
85 loops=1)
Recheck Cond: (qname_id = 251)
Heap Blocks: exact=55
-> Bitmap Index Scan on fk_alf_nasp_qn (cost=0.00..2923.10 rows=158204 width=0) (actual time=0.015..0.015 rows=87 loops=
1)
Index Cond: (qname_id = 251)
-> Index Only Scan using alf_node_aspects_pkey on alf_node_aspects aspect (cost=0.57..34.32 rows=12 width=8) (actual time=0.004..0.004 rows=1 loop
s=17)
Index Cond: ((node_id = node.id) AND (qname_id = 245))
Heap Fetches: 17
Planning time: 0.623 ms
Execution time: 0.540 ms
Configurations are same in both servers.
Please advise me on this. Is there any configuration specifically I need to look like “work_mem”, “Shared_buffers”, “checkpoint_segment”, “effective_cache_size”, “enable_seqscan” and “checkpoint_compression_target”?
Thanks in advance.
Fahiz
On 9 Dec 2019, 19:03 +0000, Michael Lewis <mlewis@xxxxxxxxxxx>, wrote:
There is a specific search query I am running to get list of Documents and their metadata from several table in the DB.We are running Postgres 9.6.9 on Amazon RDS (db.m5.4xlarge instance)Our current DB consists of 500GB of data and indexes. Most of the rows in table are consist of 454,078,915
With the fresh DB with the restore of the DATA without any indexes Search query performs relatively quick and most of the time its less than a second.But after 3 weeks of use of the DB it sudenly started to slowdown only for this perticular query and it takes 20+ seconds to respond. If I do a restore the DB again then it continues to work fine and the symptom pops out after 3 weeks time.You haven't been quite clear on the situation and your use case, but assuming this table has 454 million rows and experiences updates/deletes then this sounds like you may be having problems with autovacuum. Have you customized parameters to ensure it is running more frequently than default? How are you doing those data restores? Perhaps that process is cleaning up the accumulated bloat and you can run fine again for a while. Check pg_stat_user_tables for the last (auto)vacuum that ran, assuming you didn't just restore again and are expecting the issue to occur again soon.