On Tue, Feb 19, 2019 at 05:10:43PM -0700, Abi Noda wrote: > I have a table as defined below. The table contains 1,027,616 rows, 50,349 > of which have state='open' and closed IS NULL. Since closed IS NULL for all > rows where state='open', I want to remove the unnecessary state column. > > CREATE TABLE tickets ( > id bigserial primary key, > state character varying, > closed timestamp, ... > ); > > CREATE INDEX "state_index" ON "tickets" ("state") WHERE ((state)::text = > 'open'::text)); > > As part of the process of removing the state column, I am trying to index > the closed column so I can achieve equal query performance (index scan) as > when I query on the state column as shown below: > > EXPLAIN ANALYZE select title, created, closed, updated from tickets where state = 'open'; > Index Scan using state_index on tickets (cost=0.29..23430.20 rows=50349 width=64) (actual time=17.221..52.110 rows=51533 loops=1) > > However, when I index the closed column, a bitmap scan is used instead of > an index scan, with slightly slower performance. Why isn't an index scan > being used, given that the exact same number of rows are at play as in my > query on the state column? How do I index closed in a way where an index > scan is used? > > CREATE INDEX closed_index ON tickets (id) WHERE closed IS NULL; > EXPLAIN ANALYZE select title, created, closed, updated from tickets where closed IS NULL; > Bitmap Heap Scan on tickets (cost=824.62..33955.85 rows=50349 width=64) (actual time=10.420..56.095 rows=51537 loops=1) > -> Bitmap Index Scan on closed_index (cost=0.00..812.03 rows=50349 width=0) (actual time=6.005..6.005 rows=51537 loops=1) Are you really concerned about 4ms ? If this is a toy-sized test system, please try on something resembling production, perhaps by loading production or fake data, or perhaps on a production system within a transactions (begin; CREATE INDEX CONCURRENTLY; explain ...; rollback). You can see that most of the estimated cost is from the table (the index scan accounts for only 812 of total 33955 cost units). So I'm guessing the planner thinks that an index scan will either 1) access the table randomly; and/or, 2) access a large fraction of the table. If it was just built, the first (partial/conditional/predicate/where) index will scan table in its "physical" order (if not sequentially). The 2nd index is going to scan table in order of ID, which I'm guessing is not "correlated" with its physical order, so an index scan cost is computed as accessing a larger fraction of the table (but by using an "bitmap" scan it's at least in physical order). In fact: 50349/17478 = ~3 tuples/page is low, so you're accessing a large fraction of the table to return a small fraction of its tuples. You can check what it thinks here: https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram You could try CLUSTERing the table on ID (which requires a non-partial index) and ANALYZEing (which might cause this and other queries to be planned and/or perform differently). That causes the table to be locked exclusively. Then, the planner knows that scanning index and returning results ordered by IDs (which doesn't matter) will also access table in physical order (which matters), and maybe fewer pages need to be read, too. Justin