On Mon, Dec 9, 2019 at 3:39 PM Fahiz Mohamed <fahiz@xxxxxxxxxxx> wrote:
I ran "explain analyse" on this query and I got following result. (We have 2 identical DB instances and they consist of same data. Instane 1 took 20+ second to process and instance 2 took less than a second)Instance 1: (This is used by regular User - More than 600,000 request a day) - The result is same even when there is no user in the server.-> Bitmap Heap Scan on alf_node node (cost=995009.97..3303978.85 rows=4565737 width=8) (actual time=3304.419..20465.551 rows=41109751 loops=1)
Recheck Cond: ((store_id = 6) AND (type_qname_id = 240))
Rows Removed by Index Recheck: 54239131
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 2816
Heap Blocks: exact=24301 lossy=1875383
Planning time: 0.639 ms
Execution time: 22946.036 ms
That seems like a lot of lossy blocks. As I understand it, that means the system didn't have enough work_mem to fit all the references to the individual rows, which perhaps isn't surprising when it estimates it needs 4.5 million rows and ends up with 41 million.
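If you want to experiment, work_mem can be raised for a single session without touching the server config. A rough sketch, where 256MB is only a placeholder and not a recommendation for your hardware:

  SHOW work_mem;             -- current setting for this session
  SET work_mem = '256MB';    -- session-only change; pick a value your RAM can actually support
  -- then re-run the query with EXPLAIN (ANALYZE, BUFFERS) and see whether
  -- "Heap Blocks" still reports lossy blocks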
Do both DB instances have the same data? I ask because the two plans are rather different, which makes me think the statistics about the data are not very similar. Are both configured the same, particularly for shared_buffers and work_mem, as well as the various planning cost parameters like random_page_cost? Could you provide these plans again with explain (analyze, buffers) this time? Did you check when autovacuum last ran in pg_stat_user_tables?
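For comparison, something like this run on both instances would show the relevant settings and the last (auto)vacuum/analyze times for the table in the plan:

  SELECT name, setting, unit
  FROM pg_settings
  WHERE name IN ('shared_buffers', 'work_mem', 'random_page_cost');

  SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
  FROM pg_stat_user_tables
  WHERE relname = 'alf_node';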