Re: Slow planning time for simple query

Maksim Milyutin <milyutinma@xxxxxxxxx> · Sat, 9 Jun 2018 21:49:47 +0300

On hot standby I faced with the similar problem.

The following simple query:

SELECT array_to_json(array_agg(t)) from (
    select *
        from     main.message m
            join main.message_instance mi on m.message_id = mi.message_id
            join main.channel_type ct on mi.channel_type_id = 
ct.channel_type_id
        where   m.user_id = 2152595
            and ct.name = 'android'
            and m.user_delete_ts is null
        order by
            case
                when read_ts is null then 0
                else 1
            end,
            m.message_id desc
        limit 100
    ) t;

is planned 4.940 ms on master and *254.741* ms on standby. I tried to 
investigate the reasons of so long planning on replica and determined 
that index accesses on planning stage come to multiple heap scans.

Here is the execution plan statistics of query above:

 Aggregate (actual time=0.641..0.642 rows=1 loops=1)
   Buffers: shared hit=14
   ->  Subquery Scan on t (actual time=0.337..0.360 rows=2 loops=1)
         Buffers: shared hit=14
         ->  Limit (actual time=0.292..0.301 rows=2 loops=1)
               Buffers: shared hit=14
               ->  Sort (actual time=0.287..0.291 rows=2 loops=1)
                     Sort Key: (CASE WHEN (m.read_ts IS NULL) THEN 0 
ELSE 1 END), m.message_id DESC
                     Sort Method: quicksort  Memory: 27kB
                     Buffers: shared hit=14
                     ->  Nested Loop (actual time=0.157..0.219 rows=2 
loops=1)
                           Buffers: shared hit=14
                           ->  Seq Scan on channel_type ct (actual 
time=0.043..0.048 rows=1 loops=1)
                                 Filter: (name = 'android'::text)
                                 Rows Removed by Filter: 7
                                 Buffers: shared hit=1
                           ->  Nested Loop (actual time=0.098..0.148 
rows=2 loops=1)
                                 Buffers: shared hit=13
                                 ->  Index Scan using 
message_user_id_idx1 on message m (actual time=0.055..0.063 rows=2 loops=1)
                                       Index Cond: (user_id = 2152595)
                                       Buffers: shared hit=5
                                 ->  Index Scan using 
message_instance_pkey on message_instance mi (actual time=0.026..0.029 
rows=1 loops=2)
                                       Index Cond: ((message_id = 
m.message_id) AND (channel_type_id = ct.channel_type_id))
                                       Buffers: shared hit=8

The 14 accesses to buffer cache under query execution. But 
pg_statio_user_tables and pg_statio_user_indexes views show the 
different picture:

select
    schemaname, relname, indexrelname, idx_blks_hit, idx_blks_read,
pg_size_pretty(pg_relation_size(indexrelid::regclass))
from pg_statio_user_indexes
where idx_blks_hit + idx_blks_read > 0;

 schemaname |     relname      |     indexrelname      | idx_blks_hit | 
idx_blks_read | pg_size_pretty
------------+------------------+-----------------------+--------------+---------------+----------------
 main       | channel_type     | channel_type_pkey |            2 
|             0 | 16 kB
 main       | message_instance | message_instance_pkey |          666 
|             0 | 345 MB
 main       | message          | message_pkey |           56 
|             0 | 53 MB
 main       | message          | message_user_id_idx1 |            3 
|             0 | 17 MB

select
    schemaname, relname, heap_blks_read, heap_blks_hit
from pg_statio_user_tables
where heap_blks_read + heap_blks_hit + idx_blks_hit + idx_blks_read > 0;

 schemaname |     relname      | heap_blks_read | heap_blks_hit
------------+------------------+----------------+---------------
 main       | channel_type     |              0 |             3
 main       | message          |              0 |          8682
 main       | message_instance |              0 | 114922

(This experiment is carried out on test hot standby without parallel 
activities and with cleared system statistics)

The vacuum on problem tables (main.message and main.message_instance) on 
master node resolves the problem somehow but its often execution slows 
down all queries and generally increases IO.

Is there any case to overcome the problem or it's fundamental issue and 
necessary to rewrite the query to simplify planning?

--
Regards,
Maksim Milyutin