Re: Loose Index Scans by Planner?

"Kevin Grittner" <Kevin.Grittner@xxxxxxxxxxxx> · Fri, 24 Aug 2012 14:40:57 -0500

Shaun Thomas <sthomas@xxxxxxxxxxxxxxxx> wrote:

> So, we know we have a way of doing a loose index scan with CTEs:
> 
> http://wiki.postgresql.org/wiki/Loose_indexscan

I tried this on a table in production with 23 million rows for a
column with 45 distinct values which is the high-order column of a
four-column index.  This ran in 445 ms first time and 2 ms on the
second and subsequent tries.  The equivalent SELECT DISTINCT ran in
30 seconds first time, and got down to 11.5 seconds after a few
runs.  So roughly two orders of magnitude faster with a cold cache
and three orders of magnitude faster with a warm cache.

That sure would be a nice optimization to have in the planner.

> But that got me wondering. The planner knows from pg_stats that
> col1 could have low cardinality. If that's the case, and a WHERE
> clause uses a two column index, and col2 is specified, why can't
> it walk each individual bucket in the two-column index, and use
> col2? So I forced such a beast with a CTE:
> 
> WITH RECURSIVE t AS (
>    SELECT min(col1) AS col1
>      FROM tablename
>    UNION ALL
>    SELECT (SELECT min(col1)
>              FROM tablename
>             WHERE col1 > t.col1)
>      FROM t
>     WHERE t.col1 IS NOT NULL
> )
> SELECT p.*
>    FROM t
>    JOIN tablename p USING (col1)
>   where p.col2 = 12345
> 
> I ask, because while the long-term fix would be to re-order the
> index to (col2, col1), this seems like a situation the planner
> could easily detect and compensate for. In our particular example,
> execution time went from 160ms to 2ms with the CTE rewrite.

Well, that'd be the icing on the cake.  I'd be overjoyed to get the
cake.  :-)

-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance