On 8 September 2017 at 00:23, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
> On Tue, Aug 29, 2017 at 1:20 AM, Alban Hertroys <haramrae@xxxxxxxxx> wrote:
>>
>> On 28 August 2017 at 21:32, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
>> > On Mon, Aug 28, 2017 at 5:22 AM, Alban Hertroys <haramrae@xxxxxxxxx>
>> > wrote:
>> >>
>> >> Hi all,
>> >>
>> >> It's been a while since I actually got to use PG for anything serious,
>> >> but we're finally doing some experimentation @work now to see if it is
>> >> suitable for our data warehouse. So far it's been doing well, but there
>> >> is a particular type of query I run into that I expect we will use
>> >> frequently, and it's choosing a sequential scan - and I can't
>> >> fathom why.
>> >>
>> >> This is on:
>> >>
>> >>
>> >> The query in question is:
>> >> select "VBAK_MANDT", max("VBAK_VBELN")
>> >> from staging.etl00001_vbak
>> >> group by "VBAK_MANDT";
>> >>
>> >> This is the header table for another detail table, and in this case
>> >> we're already seeing a seqscan. The thing is, there are 15M rows in
>> >> the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT",
>> >> "VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact,
>> >> we only have 1 at the moment!).
>> >
>> > You need an "index skip-scan" or "loose index scan". PostgreSQL doesn't
>> > currently detect and implement them automatically, but you can use a
>> > recursive CTE to get it to work. There are some examples at
>> > https://wiki.postgresql.org/wiki/Loose_indexscan
>>
>> Thanks Jeff, that's an interesting approach. It looks very similar to
>> correlated subqueries.
>>
>> Unfortunately, it doesn't seem to help with my issue. The CTE is
>> indeed fast, but when querying the results from the 2nd level of the
>> PK with the CTE results, I'm back at a seqscan on pdw00002_vbak again.
>
> Something like this works:
>
> create table foo as
>   select trunc(random()*5) as col1, random() as col2
>   from generate_series(1,100000000);
> create index on foo (col1, col2);
> vacuum analyze foo;
>
> with recursive t as (
>   select * from (
>     select col1, col2 from foo order by col1 desc, col2 desc limit 1
>   ) asdfsaf
>   union all
>   select
>     (select col1 from foo where foo.col1 < t.col1
>      order by col1 desc, col2 desc limit 1) as col1,
>     (select col2 from foo where foo.col1 < t.col1
>      order by col1 desc, col2 desc limit 1) as col2
>   from t where t.col1 is not null
> )
> select * from t where t is not null;
>
> It is pretty ugly that you need one subquery in the select list for each
> column to be returned. Maybe someone can find a way to avoid that part. I
> tried using lateral joins to get around it, but couldn't make that work.
>
> Cheers,
>
> Jeff

Thanks Jeff. That does indeed look ugly. Since we're dealing with a 4GL
language (FOCUS) that translates to SQL, I don't think we'll attempt your
workaround, even though we can use SQL directly in our reports if we want to.

But, I just remembered giving someone else in a similar situation some
advice on this very list: obviously, when my first primary key field is not
very selective, I should change the order of the fields in the PK!

But let's first enjoy the weekend.

Alban.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
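For reference, a minimal sketch of what swapping the key order described above
might look like. The constraint name etl00001_vbak_pkey is an assumption (the
default name PostgreSQL generates for a primary key on that table); the actual
name can be checked with \d staging.etl00001_vbak in psql:

    -- Assumes the existing PK constraint carries the default name
    -- etl00001_vbak_pkey; adjust if the real name differs.
    alter table staging.etl00001_vbak
        drop constraint etl00001_vbak_pkey,
        add primary key ("VBAK_VBELN", "VBAK_MANDT");

Note that adding the new primary key rebuilds its underlying unique index and
holds an ACCESS EXCLUSIVE lock on the table for the duration, so on a 15 GB
table this is best scheduled outside load windows.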