Re: Query running longer

On Fri, 2024-02-02 at 10:14 +0530, veem v wrote:
> On Fri, 2 Feb 2024 at 02:43, Laurenz Albe <laurenz.albe@xxxxxxxxxxx> wrote:
> > On Fri, 2024-02-02 at 02:27 +0530, veem v wrote:
> > > We have the query below, which runs for ~45 seconds on a Postgres Aurora
> > > reader instance. I have captured the EXPLAIN ANALYZE output. I want to
> > > understand where exactly the resources are being spent and whether we
> > > can optimize it further.
> > 
> > Aurora <> PostgreSQL, but here is what I can see:
> > 
> > - The index scan on SCHEMA1."TAB2" has to check 2 million extra
> >   rows because "work_mem" is too small.  Almost the complete time
> >   is spent there.
> > 
> > - You may be getting a bad plan, because the statistics on
> >   SCHEMA1.TAB4 are either out of date or not detailed enough,
> >   which makes PostgreSQL underestimate the result size.
> 
> As you mentioned below, I am wondering how you determined that this is the
> step where the majority of the DB resources are spent. The query ran for
> ~45 seconds in total; how can I find out how much of that time was spent in
> this step? And to fix it, should we increase "work_mem" for TAB2?
> 
> - The index scan on SCHEMA1."TAB2" has to check 2 million extra
>   rows because "work_mem" is too small.  Almost the complete time
>   is spent there.
> 
>   ->  Parallel Bitmap Heap Scan on SCHEMA1."TAB2" TAB2  (cost=84860.50..13040301.00 rows=1175611 width=80) (actual time=713.054..26942.082 rows=956249 loops=5)
>         Output: TAB2.TAB2_time, TAB2.PRI, TAB2.AID
>   Recheck Cond: (TAB2.MID = 'XXXXX'::numeric)
>   Rows Removed by Index Recheck: 2137395
>   Filter: ((TAB2.TAB2_time >= '2024-01-01 00:00:00+00'::timestamp with time zone) AND (TAB2.TAB2_time <= '2024-01-31 00:00:00+00'::timestamp with time zone))
>   Heap Blocks: exact=5300 lossy=782577
>   Buffers: shared hit=1651569 read=2245157
>   I/O Timings: shared/local read=29063.286  

The bitmap index scan builds a bitmap whose size is limited by "work_mem".
If "work_mem" is big enough, the bitmap contains a bit for each table row.
If not, it contains a bit per table row only for some table blocks (the
"exact" ones) and degrades to a bit per block for the rest (the "lossy" ones).

For lossy blocks, *all* rows in the block have to be rechecked, which is
overhead.  Given that only a small part of the total time across the five
workers, (26942.082 - 713.054) * 5 ≈ 131145 ms, is spent doing I/O
(29063.286 ms), I guess that the rest is spent processing table rows.
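
If you want to test whether a bigger bitmap avoids the lossy blocks, you can
raise "work_mem" for your session only and re-run the query with
EXPLAIN (ANALYZE, BUFFERS); if "lossy" disappears from the "Heap Blocks"
line, the bitmap fit in memory.  A minimal sketch (the 256MB value is just a
guess, pick something your server can afford):

  SET work_mem = '256MB';      -- affects only the current session
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT ...;                  -- your original query goes here
  RESET work_mem;              -- back to the configured default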

> For the other point you mentioned below, should we run VACUUM ANALYZE on
> the table TAB4?
> 
> - You may be getting a bad plan, because the statistics on
>   SCHEMA1.TAB4 are either out of date or not detailed enough,
>   which makes PostgreSQL underestimate the result size.
>   
>   ->  Parallel Bitmap Heap Scan on SCHEMA1.TAB4 TAB4  (cost=26.39..7042.63 rows=1049 width=37) (actual time=23.650..201.606 rows=27613 loops=5)
>        ->  Bitmap Index Scan on TAB4_idx1  (cost=0.00..25.95 rows=1784 width=0) (actual time=23.938..23.938 rows=138067 loops=1)
>            Index Cond: ((TAB4.TAB4_code)::text = 'XX'::text)
>            Buffers: shared hit=72

An ANALYZE might be enough.

If not, you can try to collect more detailed statistics for the column:

  ALTER TABLE SCHEMA1.TAB4 ALTER TAB4_code SET STATISTICS 1000;
  ANALYZE SCHEMA1.TAB4;
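
Afterwards you can check what statistics the planner has for the column;
a sketch, assuming the unquoted names are stored lowercased in the catalogs
as usual:

  SELECT n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
  WHERE schemaname = 'schema1'
    AND tablename = 'tab4'
    AND attname = 'tab4_code';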

Yours,
Laurenz Albe





