Justin Pryzby <pryzby 'at' telsasoft.com> writes:

> On Wed, Feb 26, 2020 at 05:17:21PM +0100, Guillaume Cottenceau wrote:
>> On production:
>>
>> # EXPLAIN ANALYZE SELECT transaction_uid, (SELECT COUNT(*) FROM tickets WHERE multicard_uid = multicards.uid) from multicards;
>>                                   QUERY PLAN
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>  Seq Scan on multicards  (cost=0.00..1455177.30 rows=204548 width=12) (actual time=0.178..1694987.355 rows=204548 loops=1)
>>    SubPlan 1
>>      ->  Aggregate  (cost=7.07..7.08 rows=1 width=8) (actual time=8.283..8.283 rows=1 loops=204548)
>>            ->  Index Only Scan using tickets_multicard_uid on tickets  (cost=0.43..7.05 rows=9 width=0) (actual time=1.350..8.280 rows=6 loops=204548)
>>                  Index Cond: (multicard_uid = multicards.uid)
>>                  Heap Fetches: 1174940
>>  Planning Time: 1.220 ms
>>  Execution Time: 1695029.673 ms
>>
>> The execution time ratio is a huge 3700. I guess the Heap Fetches
>> difference is the most meaningful here;
>
> Yes, it's doing an "index only" scan, but not very effectively.
> Vacuum the tickets table to set relallvisible and see if that helps.
>
> If so, try to keep it better vacuumed with something like
> ALTER TABLE tickets SET (AUTOVACUUM_VACUUM_SCALE_FACTOR=0.005);

Thanks for your reply!
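As an aside, the relallvisible situation Justin mentions can be inspected directly: an index-only scan avoids heap fetches only for pages the visibility map marks all-visible, and pg_class records that count. A minimal sketch of the check, assuming the table is named tickets as in the plan above:

```sql
-- Compare all-visible pages to total pages for the tickets table.
-- A low percentage predicts many heap fetches during index-only scans.
SELECT relname,
       relpages,
       relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
FROM pg_class
WHERE relname = 'tickets';
```

Running this before and after the VACUUM should show pct_all_visible jumping toward 100, matching the drop in "Heap Fetches" from 1174940 to 8.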
The effect is huge:

# vacuum analyze tickets;
VACUUM
Time: 182850.756 ms (03:02.851)

# EXPLAIN ANALYZE SELECT transaction_uid, (SELECT COUNT(*) FROM tickets WHERE multicard_uid = multicards.uid) from multicards;
                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on multicards  (cost=0.00..947739.65 rows=204548 width=12) (actual time=15.579..5210.869 rows=204548 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=4.59..4.60 rows=1 width=8) (actual time=0.025..0.025 rows=1 loops=204548)
           ->  Index Only Scan using tickets_multicard_uid on tickets  (cost=0.43..4.57 rows=8 width=0) (actual time=0.022..0.024 rows=6 loops=204548)
                 Index Cond: (multicard_uid = multicards.uid)
                 Heap Fetches: 8
 Planning Time: 71.469 ms
 Execution Time: 5223.408 ms
(8 rows)

Time: 5332.361 ms (00:05.332)

(and subsequent executions are below 1 second)

This is actually consistent with the behavior of the restored backup on
the dev computer: my understanding is that a restore comes out without
any garbage, like a perfectly vacuumed database. By the way, do you
have any hint on how to perform timings with production data that are
consistent with production? Backup/restore is maybe not the way to go;
would a block-device-level copy be better?

Since PostgreSQL 8, I have to say I have relied entirely on autovacuum,
and did not realize it could run too infrequently for the workload and
create such a difference. I see in the documentation a default
autovacuum_vacuum_scale_factor = 0.2; is that something that is
typically lowered globally, e.g. on a fairly active system? I am
worried that lowering it to 0.005 for that table would fix this query
and similar ones, but that later I might face the same situation on
other tables. How would you select the tables that should get a lowered
value?

Thanks!

-- 
Guillaume Cottenceau
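One way to pick candidate tables, rather than lowering the factor globally: autovacuum triggers a vacuum roughly when dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples (50 + 0.2 * reltuples by default), so large, update-heavy tables can accumulate a lot of garbage before qualifying. The statistics collector exposes the numbers needed to spot them; a sketch (column names are from pg_stat_user_tables, the LIMIT is arbitrary):

```sql
-- List tables with the most dead tuples relative to live ones:
-- candidates for a per-table lower autovacuum_vacuum_scale_factor.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / greatest(n_live_tup, 1), 1) AS pct_dead,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
```

Tables that consistently show up here with a high pct_dead and a stale last_autovacuum are the ones where a per-table setting like Justin's ALTER TABLE ... SET (autovacuum_vacuum_scale_factor=0.005) is likely to pay off, without touching the global default.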