Re: Performance with high correlation in group by on PK

Jeff Janes <jeff.janes@xxxxxxxxx> · Mon, 28 Aug 2017 12:32:19 -0700

On Mon, Aug 28, 2017 at 5:22 AM, Alban Hertroys <haramrae@xxxxxxxxx> wrote:
Hi all,

It's been a while since I actually got to use PG for anything serious,

but we're finally doing some experimentation @work now to see if it is

suitable for our datawarehouse. So far it's been doing well, but there

is a particular type of query I run into that I expect we will

frequently use and that's choosing a sequential scan - and I can't

fathom why.

This is on:

The query in question is:

select "VBAK_MANDT", max("VBAK_VBELN")

  from staging.etl00001_vbak

 group by "VBAK_MANDT";

This is the header-table for another detail table, and in this case

we're already seeing a seqscan. The thing is, there are 15M rows in

the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT",

"VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact,

we only have 1 at the moment!).

You need an "index skip-scan" or "loose index scan".  PostgreSQL doesn't currently detect and implement them automatically, but you can use a recursive CTE to get it to work.  There are some examples at https://wiki.postgresql.org/wiki/Loose_indexscan

Cheers,

Jeff