Le 15/02/2011 15:49, Luca Ferrari a écrit : > Hello, > I've got a doubt about partial indexes and the path chosen by the optimizer. > Consider this simple scenario: > > CREATE TABLE p( pk serial NOT NULL , val2 text, val1 text, b boolean, PRIMARY > KEY (pk) ); > INSERT INTO p(pk, val1, val2, b) VALUES( generate_series(1,1000000), 'val1b', > 'val2b', true ); > INSERT INTO p(pk, val1, val2, b) VALUES( generate_series(1000001,2000000), > 'val1Notb', 'val2Notb', false ); > CREATE INDEX i_p_b ON p (b) WHERE b = true; > ANALYZE p; > > So I create a table with 2-million rows, the first million with b = true and > the second one with b = false. > Now doing an explain for a query that selects only on the b attribute I got: > > EXPLAIN SELECT * FROM p WHERE b = false; > QUERY PLAN > ------------------------------------------------------------ > Seq Scan on p (cost=0.00..34706.00 rows=1000133 width=28) > Filter: (NOT b) > > > So a sequential scan. I know that the optimizer will not consider an index if > it is not filtering, but I don't understand exactly why in this case. In fact, > considering that the above query could remove the first half data pages (where > b = true), and considering that: > > SELECT reltype, relval1, relpages, reltuples > FROM pg_class WHERE relval1 IN ('p', 'i_p_b'); > reltype | relval1 | relpages | reltuples > ---------+----------+----------+----------- > 615079 | p | 14706 | 2e+06 > 0 | i_p_b | 2745 | 999867 > > The sequential access requires 14706 pages, while using the index for filtering > almost the half of those, we've got 2745 + 7353 = around 10000 pages. Accessing a page in an index is way costier then accessing a page in an table with a sequential scan. By default, random_page_cost is 4 times seq_page_cost. So it's not really surprising that when you want to get half the table, PostgreSQL won't use the index. You would need to have a really selective query to make an index scan interesting to use. > I've tried to change the index type to an hash, but the situation did not > change. Even with enable_seqscan = off the above query is executed > sequentially, but with a different initial cost: > > > EXPLAIN SELECT * FROM p WHERE b = false; > QUERY PLAN > ---------------------------------------------------------------------------- > Seq Scan on p (cost=10000000000.00..10000034706.00 rows=1000133 width=28) > Filter: (NOT b) > > And here comes the second doubt: since in both cases the planner is doing a > sequential access, why the first case has an initial cost = 0 and this one has > a cost of 1 million? When you disable enable_seqscan, you actually say to the planner to add a really big number to the estimated cost of a seq scan. Which usually disables the use of the seqscan. Not in your case. -- Guillaume http://www.postgresql.fr http://dalibo.com -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general