Search Postgresql Archives

Re: help understanding explain output

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





Luca Ferrari wrote:
Hello,
I've got a doubt about partial indexes and the path chosen by the optimizer.
Consider this simple scenario:

CREATE TABLE p( pk serial NOT NULL , val2 text, val1 text, b boolean, PRIMARY KEY (pk) ); INSERT INTO p(pk, val1, val2, b) VALUES( generate_series(1,1000000), 'val1b', 'val2b', true ); INSERT INTO p(pk, val1, val2, b) VALUES( generate_series(1000001,2000000), 'val1Notb', 'val2Notb', false );
CREATE INDEX i_p_b ON p (b) WHERE b = true;
ANALYZE p;

So I create a table with 2-million rows, the first million with b = true and the second one with b = false.
Now doing an explain for a query that selects only on the b attribute I got:

EXPLAIN SELECT * FROM p WHERE b = false;
QUERY PLAN ------------------------------------------------------------
 Seq Scan on p  (cost=0.00..34706.00 rows=1000133 width=28)
   Filter: (NOT b)


So a sequential scan. I know that the optimizer will not consider an index if it is not filtering, but I don't understand exactly why in this case. In fact, considering that the above query could remove the first half data pages (where b = true), and considering that:

SELECT reltype, relval1, relpages, reltuples
FROM pg_class WHERE relval1 IN ('p', 'i_p_b');
reltype | relval1 | relpages | reltuples ---------+----------+----------+-----------
  615079 | p        |    14706 |     2e+06
       0 | i_p_b |     2745 |    999867

The sequential access requires 14706 pages, while using the index for filtering almost the half of those, we've got 2745 + 7353 = around 10000 pages. I've tried to change the index type to an hash, but the situation did not change. Even with enable_seqscan = off the above query is executed sequentially, but with a different initial cost:


EXPLAIN SELECT * FROM p WHERE b = false;
QUERY PLAN ----------------------------------------------------------------------------
 Seq Scan on p  (cost=10000000000.00..10000034706.00 rows=1000133 width=28)
   Filter: (NOT b)


And here comes the second doubt: since in both cases the planner is doing a sequential access, why the first case has an initial cost = 0 and this one has a cost of 1 million?
I'm getting lost here, I need some hint to understand what is happening.

I'm running
PostgreSQL 9.0.2 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit


Thanks,
Luca

I don't see where you've told the engine anything about the "b" column? Naturally a boolean can only have two values, but you haven't indexed on column b, nor partitioned the table on the values in the "b" column. I don't think it can just _know_ that the first n records have b=true. What behaviour would you expect if you had alternated the inserts (or just alternated the primary keys by assigning them even and odd values in the inserts)?


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux