On Mon, Dec 9, 2019 at 3:39 PM Fahiz Mohamed <fahiz@xxxxxxxxxxx> wrote:
I ran "explain analyse" on this query and I got following result. (We have 2 identical DB instances and they consist of same data. Instane 1 took 20+ second to process and instance 2 took less than a second)Instance 1: (This is used by regular User - More than 600,000 request a day) - The result is same even when there is no user in the server.-> Bitmap Heap Scan on alf_node node (cost=995009.97..3303978.85 rows=4565737 width=8) (actual time=3304.419..20465.551 rows=41109751 loops=1)
Recheck Cond: ((store_id = 6) AND (type_qname_id = 240))
Rows Removed by Index Recheck: 54239131
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 2816
Heap Blocks: exact=24301 lossy=1875383
Planning time: 0.639 ms
Execution time: 22946.036 ms
That seems like a lot of lossy blocks. As I understand it, that means the system didn't have enough work_mem to fit all the references to the individual rows, which perhaps isn't surprising when it estimates it needs 4.5 million rows and ends up with 41 million.
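If you want to experiment, work_mem can be raised for a single session without touching the server config. A rough sketch, where 256MB is only a placeholder and not a recommendation for your hardware:

  SHOW work_mem;             -- current setting for this session
  SET work_mem = '256MB';    -- session-only change; pick a value your RAM can actually support
  -- then re-run the query with EXPLAIN (ANALYZE, BUFFERS) and see whether
  -- "Heap Blocks" still reports lossy blocks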
Do both DB instances have the same data? I ask because the two plans are rather different, which makes me think the statistics about the data are not very similar. Are both configured the same, particularly for shared_buffers and work_mem, as well as the various planning cost parameters like random_page_cost? Could you provide these plans again with explain (analyze, buffers) this time? Did you check when autovacuum last ran in pg_stat_user_tables?
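For comparison, something like this run on both instances would show the relevant settings and the last (auto)vacuum/analyze times for the table in the plan:

  SELECT name, setting, unit
  FROM pg_settings
  WHERE name IN ('shared_buffers', 'work_mem', 'random_page_cost');

  SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
  FROM pg_stat_user_tables
  WHERE relname = 'alf_node';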