Re: CPU Consuming query. Sequential scan despite indexing.

aditya desai <admad123@xxxxxxxxx> · Tue, 20 Oct 2020 18:26:08 +0530

On Mon, Oct 19, 2020 at 9:50 PM Michael Lewis <mlewis@xxxxxxxxxxx> wrote:
Reply to the group, not just me please. Btw, when you do reply to the group, it is best practice on these lists to reply in-line and not just reply on top with all prior messages quoted.

Hi  Michael,
Please see below inline response. I tried all this on Dev env 2 vCPU and 8 GB RAM. Still waiting for the PST environment :( with better configuration.

On Sun, Oct 18, 2020 at 3:23 AM aditya desai <admad123@xxxxxxxxx> wrote:
I tried vacuum full and execution time came down to half.
Great to hear.

However, it still consumes CPU. Setting parallel workers per gather to 0 did not help much.
You didn't answer all of my questions, particularly about disabling sequential scan. If you still have the default random_page_cost of 4, it might be that 1.5 allows better estimates for cost on index (random) vs sequential scan of a table. 

Please see the next inline answer. 

Laurenz is a brilliant guy. I would implement the indexes he suggests if you don't have them already and report back. If the indexes don't get used, try set enable_seqscan = false; before the query and if it is way faster, then reduce random_page_cost to maybe 1-2 depending how your overall cache hit ratio is across the system.

Query plan with enable_seqscan=off , Random page cost=1. With this execution time and cost of query is almost less than half compared to original settings. Also used the suggestions given by Laurenze. 1. Made use of UINON operator and created indexes. 

lmp_delivery_jobs=> explain (analyze,buffers) with JobCount as ( (select jobstatuscode,count(1) stat_count from job j where 1=1 and j.countrycode = 'TH'   and j.facilitycode in ('THNPM1','THPRK1','THCNT1','THSPN1','THKRI1','THPKN1','THSBI1','THUTG1','THLRI1','THSRI1','THSUR1','THSKM1') and j.jobstartdatetime  between '2020-08-01 00:00:00' and '2020-09-30 00:00:00'    group by j.jobstatuscode) UNION (select jobstatuscode,count(1) stat_count from job j where 1=1 and j.countrycode = 'TH'   and j.facilitycode in ('THNPM1','THPRK1','THCNT1','THSPN1','THKRI1','THPKN1','THSBI1','THUTG1','THLRI1','THSRI1','THSUR1','THSKM1') and j.jobstartdatetime is null  group by j.jobstatuscode))
 select js.jobstatuscode,COALESCE(stat_count,0) stat_count from JobCount jc right outer join jobstatus js on jc.jobstatuscode=js.jobstatuscode;
                                                                                                            QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------
 Hash Right Join  (cost=68652.52..68652.76 rows=10 width=12) (actual time=676.477..676.495 rows=10 loops=1)
   Hash Cond: ((jc.jobstatuscode)::text = (js.jobstatuscode)::text)
   Buffers: shared hit=11897
   CTE jobcount
     ->  HashAggregate  (cost=68650.01..68650.11 rows=10 width=24) (actual time=676.451..676.454 rows=8 loops=1)
           Group Key: j.jobstatuscode, (count(1))
           Buffers: shared hit=11895
           ->  Append  (cost=68645.89..68649.96 rows=10 width=24) (actual time=676.346..676.441 rows=8 loops=1)
                 Buffers: shared hit=11895
                 ->  Finalize GroupAggregate  (cost=68645.89..68648.17 rows=9 width=12) (actual time=676.345..676.379 rows=8 loops=1)
                       Group Key: j.jobstatuscode
                       Buffers: shared hit=11889
                       ->  Gather Merge  (cost=68645.89..68647.99 rows=18 width=12) (actual time=676.330..676.403 rows=24 loops=1)
                             Workers Planned: 2
                             Workers Launched: 2
                             Buffers: shared hit=29067 read=1
                             I/O Timings: read=0.038
                             ->  Sort  (cost=67645.87..67645.89 rows=9 width=12) (actual time=669.544..669.548 rows=8 loops=3)
                                   Sort Key: j.jobstatuscode
                                   Sort Method: quicksort  Memory: 25kB
                                   Worker 0:  Sort Method: quicksort  Memory: 25kB
                                   Worker 1:  Sort Method: quicksort  Memory: 25kB
                                   Buffers: shared hit=29067 read=1
                                   I/O Timings: read=0.038
                                   ->  Partial HashAggregate  (cost=67645.63..67645.72 rows=9 width=12) (actual time=669.506..669.511 rows=8 loops=3)
                                         Group Key: j.jobstatuscode
                                         Buffers: shared hit=29051 read=1
                                         I/O Timings: read=0.038
                                         ->  Parallel Index Scan using job_list_test1 on job j  (cost=0.43..66135.88 rows=301950 width=4) (actual time=0.040..442.373 ro
ws=244800 loops=3)
                                               Index Cond: (((countrycode)::text = 'TH'::text) AND ((facilitycode)::text = ANY ('{THNPM1,THPRK1,THCNT1,THSPN1,THKRI1,THP
KN1,THSBI1,THUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[])))
                                               Filter: ((jobstartdatetime >= '2020-08-01 00:00:00'::timestamp without time zone) AND (jobstartdatetime <= '2020-09-30 00
:00:00'::timestamp without time zone))
                                               Buffers: shared hit=29051 read=1
                                               I/O Timings: read=0.038
                 ->  GroupAggregate  (cost=1.62..1.64 rows=1 width=12) (actual time=0.043..0.043 rows=0 loops=1)
                       Group Key: j_1.jobstatuscode
                       Buffers: shared hit=6
                       ->  Sort  (cost=1.62..1.62 rows=1 width=4) (actual time=0.041..0.041 rows=0 loops=1)
                             Sort Key: j_1.jobstatuscode
                             Sort Method: quicksort  Memory: 25kB
                             Buffers: shared hit=6
                             ->  Index Scan using job_list_test3 on job j_1  (cost=0.14..1.61 rows=1 width=4) (actual time=0.034..0.034 rows=0 loops=1)
                                   Index Cond: ((countrycode)::text = 'TH'::text)
                                   Filter: ((facilitycode)::text = ANY ('{THNPM1,THPRK1,THCNT1,THSPN1,THKRI1,THPKN1,THSBI1,THUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[])
)
                                   Rows Removed by Filter: 26
                                   Buffers: shared hit=6
   ->  CTE Scan on jobcount jc  (cost=0.00..0.20 rows=10 width=24) (actual time=676.454..676.461 rows=8 loops=1)
         Buffers: shared hit=11895
   ->  Hash  (cost=2.29..2.29 rows=10 width=4) (actual time=0.015..0.015 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         Buffers: shared hit=2
         ->  Index Only Scan using jobstatus_jobstatuscode_unq on jobstatus js  (cost=0.14..2.29 rows=10 width=4) (actual time=0.005..0.009 rows=10 loops=1)
               Heap Fetches: 0
               Buffers: shared hit=2
 Planning Time: 0.812 ms
 Execution Time: 676.642 ms
(55 rows)

Query with Random page cost=4 and enable_seq=on

lmp_delivery_jobs=> set random_page_cost=4;
SET
lmp_delivery_jobs=> explain (analyze,buffers) with JobCount as ( (select jobstatuscode,count(1) stat_count from job j where 1=1 and j.countrycode = 'TH'   and j.facilitycode in ('THNPM1','THPRK1','THCNT1','THSPN1','THKRI1','THPKN1','THSBI1','THUTG1','THLRI1','THSRI1','THSUR1','THSKM1') and j.jobstartdatetime  between '2020-08-01 00:00:00' and '2020-09-30 00:00:00'    group by j.jobstatuscode) UNION (select jobstatuscode,count(1) stat_count from job j where 1=1 and j.countrycode = 'TH'   and j.facilitycode in ('THNPM1','THPRK1','THCNT1','THSPN1','THKRI1','THPKN1','THSBI1','THUTG1','THLRI1','THSRI1','THSUR1','THSKM1') and j.jobstartdatetime is null  group by j.jobstatuscode))
 select js.jobstatuscode,COALESCE(stat_count,0) stat_count from JobCount jc right outer join jobstatus js on jc.jobstatuscode=js.jobstatuscode;

               QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------
 Hash Right Join  (cost=128145.44..128145.67 rows=10 width=12) (actual time=1960.823..1960.842 rows=10 loops=1)
   Hash Cond: ((jc.jobstatuscode)::text = (js.jobstatuscode)::text)
   Buffers: shared hit=20586 read=8706
   I/O Timings: read=49.250
   CTE jobcount
     ->  HashAggregate  (cost=128144.11..128144.21 rows=10 width=24) (actual time=1960.786..1960.788 rows=8 loops=1)
           Group Key: j.jobstatuscode, (count(1))
           Buffers: shared hit=20585 read=8706
           I/O Timings: read=49.250
           ->  Append  (cost=128135.68..128144.06 rows=10 width=24) (actual time=1960.634..1960.774 rows=8 loops=1)
                 Buffers: shared hit=20585 read=8706
                 I/O Timings: read=49.250
                 ->  Finalize GroupAggregate  (cost=128135.68..128137.96 rows=9 width=12) (actual time=1960.632..1960.689 rows=8 loops=1)
                       Group Key: j.jobstatuscode
                       Buffers: shared hit=20579 read=8706
                       I/O Timings: read=49.250
                       ->  Gather Merge  (cost=128135.68..128137.78 rows=18 width=12) (actual time=1960.616..1960.690 rows=24 loops=1)
                             Workers Planned: 2
                             Workers Launched: 2
                             Buffers: shared hit=58214 read=30130
                             I/O Timings: read=152.485
                             ->  Sort  (cost=127135.66..127135.68 rows=9 width=12) (actual time=1941.131..1941.134 rows=8 loops=3)
                                   Sort Key: j.jobstatuscode
                                   Sort Method: quicksort  Memory: 25kB
                                   Worker 0:  Sort Method: quicksort  Memory: 25kB
                                   Worker 1:  Sort Method: quicksort  Memory: 25kB
                                   Buffers: shared hit=58214 read=30130
                                   I/O Timings: read=152.485
                                   ->  Partial HashAggregate  (cost=127135.43..127135.52 rows=9 width=12) (actual time=1941.088..1941.092 rows=8 loops=3)
                                         Group Key: j.jobstatuscode
                                         Buffers: shared hit=58198 read=30130
                                         I/O Timings: read=152.485
                                         ->  Parallel Seq Scan on job j  (cost=0.00..125625.68 rows=301950 width=4) (actual time=0.015..1698.223 rows=244800 loops=3)
                                               Filter: ((jobstartdatetime >= '2020-08-01 00:00:00'::timestamp without time zone) AND (jobstartdatetime <= '2020-09-30 00
:00:00'::timestamp without time zone) AND ((countrycode)::text = 'TH'::text) AND ((facilitycode)::text = ANY ('{THNPM1,THPRK1,THCNT1,THSPN1,THKRI1,THPKN1,THSBI1,THUTG1,
THLRI1,THSRI1,THSUR1,THSKM1}'::text[])))
                                               Rows Removed by Filter: 673444
                                               Buffers: shared hit=58198 read=30130
                                               I/O Timings: read=152.485
                 ->  GroupAggregate  (cost=5.93..5.95 rows=1 width=12) (actual time=0.077..0.077 rows=0 loops=1)
                       Group Key: j_1.jobstatuscode
                       Buffers: shared hit=6
                       ->  Sort  (cost=5.93..5.94 rows=1 width=4) (actual time=0.075..0.075 rows=0 loops=1)
                             Sort Key: j_1.jobstatuscode
                             Sort Method: quicksort  Memory: 25kB
                             Buffers: shared hit=6
                             ->  Index Scan using job_list_test3 on job j_1  (cost=0.14..5.92 rows=1 width=4) (actual time=0.065..0.065 rows=0 loops=1)
                                   Index Cond: ((countrycode)::text = 'TH'::text)
                                   Filter: ((facilitycode)::text = ANY ('{THNPM1,THPRK1,THCNT1,THSPN1,THKRI1,THPKN1,THSBI1,THUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[])
)
                                   Rows Removed by Filter: 26
                                   Buffers: shared hit=6
   ->  CTE Scan on jobcount jc  (cost=0.00..0.20 rows=10 width=24) (actual time=1960.789..1960.797 rows=8 loops=1)
         Buffers: shared hit=20585 read=8706
         I/O Timings: read=49.250
   ->  Hash  (cost=1.10..1.10 rows=10 width=4) (actual time=0.023..0.023 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         Buffers: shared hit=1
         ->  Seq Scan on jobstatus js  (cost=0.00..1.10 rows=10 width=4) (actual time=0.007..0.013 rows=10 loops=1)
               Buffers: shared hit=1
 Planning Time: 3.019 ms
 Execution Time: 1961.024 ms

Auto vacuuming is catching up just fine. No issues in that area.
If the time came down by half after 'vacuum full', I would question that statement.

I checked the last autovacuum on underlying tables before load tests and it was very recent. Also I explicitly ran VACUUM ANALYZE FREEZ on underlying tables before load test just to make sure. It did not help much.

Temp table size is less that original tables without indexes.
Significantly less would indicate the regular table still being bloated I think. Maybe someone else will suggest otherwise.

Please see below.

 SELECT
relname AS TableName
,n_live_tup AS LiveTuples
,n_dead_tup AS DeadTuples
FROM pg_stat_user_tables where relname='job';
 tablename | livetuples | deadtuples
-----------+------------+------------
 job       |    2754980 |        168

Does this mean we need to upgrade the hardware? Also by caching data , do you mean caching at application side(microservices side) ? Or on postgres side? I tried pg_prewarm, it did not help much.
I can't say about hardware. Until you have exhausted options like configs and indexing, spending more money forever onwards seems premature. I meant pre-aggregated data, wherever it makes sense to do that. I wouldn't expect pg_prewarm to do a ton since you already show high cache hits.

Understood thanks. 

It is actually the CPU consumption which is the issue. Query is fast otherwise.
Sure, but that is a symptom of reading and processing a lot of data.

As per application team, it is business requirement to show last 60 days worth data. This particular query finds the counts of jobstatus(GROUP BY) which may be taking a lot of compute(CPU spikes) I have tried indexing suggested by Laurenze as well. Cost and execution time are still high