Re: Access to postgresql query optimizer output

Craig Ringer <ringerc@xxxxxxxxxxxxx> · Tue, 30 Oct 2012 10:10:09 +0800

On 10/30/2012 12:33 AM, Peter Geoghegan wrote:
> On 29 October 2012 16:00, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
>> The re-writing does not occur at the level of SQL, but rather at the
>> level of an internal representation "parse tree".
> 
> Right. The post parse-analysis tree (i.e. the Query struct) is
> potentially rewritten by an explicit rewrite stage after
> parse-analysis but immediately before planning. No rewriting of plans
> actually occurs. This is part of the rules subsystem, which is
> generally considered to be a foot-gun. It *is* mostly just as if the
> query had been silently rewritten, and had its SQL magically changed.
> 
> Craig mentioned that two distinct queries can produce the same query
> plan. That's true, if a little academic. I guess the best example of
> that is with join syntax. Technically, these could be two distinct
> queries, in that the query trees would be substantively different
> (according to pg_stat_statements own definition of that, which is
> essentially that they're not bitwise identical in respect of their
> nodes' substantive fields):
> 
> select * from foo f join bar b on f.bid = b.bid;
> 
> select * from foo f, bar b where f.bid = b.bid;
> 
> However, after planning, they could indeed have identical plans.

I'm talking about more substantial transformations too. For example,
given sample data:

CREATE TABLE a ( id integer primary key, somedata text not null );
INSERT INTO a(id, somedata) VALUES
(1,'ham'),(2,'eggs'),(3,'spam'),(4,'putty');
CREATE TABLE b ( a_id integer not null references a(id));
INSERT INTO b VALUES (1),(3);

regress=> EXPLAIN SELECT somedata FROM a
          WHERE EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id);
                              QUERY PLAN
-----------------------------------------------------------------------
 Hash Join  (cost=44.50..73.41 rows=615 width=32)
   Hash Cond: (a.id = b.a_id)
   ->  Seq Scan on a  (cost=0.00..22.30 rows=1230 width=36)
   ->  Hash  (cost=42.00..42.00 rows=200 width=4)
         ->  HashAggregate  (cost=40.00..42.00 rows=200 width=4)
               ->  Seq Scan on b  (cost=0.00..34.00 rows=2400 width=4)
(6 rows)

regress=> EXPLAIN SELECT somedata FROM a
          WHERE id IN (SELECT a_id FROM b);
                              QUERY PLAN
-----------------------------------------------------------------------
 Hash Join  (cost=44.50..73.41 rows=615 width=32)
   Hash Cond: (a.id = b.a_id)
   ->  Seq Scan on a  (cost=0.00..22.30 rows=1230 width=36)
   ->  Hash  (cost=42.00..42.00 rows=200 width=4)
         ->  HashAggregate  (cost=40.00..42.00 rows=200 width=4)
               ->  Seq Scan on b  (cost=0.00..34.00 rows=2400 width=4)
(6 rows)

or:

ALTER TABLE b ADD UNIQUE(a_id);

regress=> EXPLAIN SELECT somedata FROM a INNER JOIN b ON (a.id = b.a_id);
                              QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop  (cost=0.00..17.58 rows=2 width=32)
   ->  Seq Scan on b  (cost=0.00..1.02 rows=2 width=4)
   ->  Index Scan using a_pkey on a  (cost=0.00..8.27 rows=1 width=36)
         Index Cond: (id = b.a_id)
(4 rows)

regress=> EXPLAIN SELECT somedata FROM a WHERE id IN (SELECT a_id FROM b);
                              QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop  (cost=0.00..17.58 rows=2 width=32)
   ->  Seq Scan on b  (cost=0.00..1.02 rows=2 width=4)
   ->  Index Scan using a_pkey on a  (cost=0.00..8.27 rows=1 width=36)
         Index Cond: (id = b.a_id)
(4 rows)

regress=> EXPLAIN SELECT somedata FROM a WHERE EXISTS (SELECT 1 FROM b
WHERE b.a_id = a.id);
                              QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop  (cost=0.00..17.58 rows=2 width=32)
   ->  Seq Scan on b  (cost=0.00..1.02 rows=2 width=4)
   ->  Index Scan using a_pkey on a  (cost=0.00..8.27 rows=1 width=36)
         Index Cond: (id = b.a_id)
(4 rows)

These are three very different ways to write the query, and they all
result in the same plan.

--
Craig Ringer

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general