>A single VACUUM may not report how bloated your table is, if it's >been VACUUM'ed some before, but not frequently enough. If you >have time for it, and you can afford a full lock on the table, >only a VACUUM FULL VERBOSE will tell you the previous bloat (the >"table .. truncated to .." line IIRC). Here's the output of vacuum full verbose (after running a plain vacuum verbose) INFO: vacuuming "public.jbpm_taskinstance" INFO: "jbpm_taskinstance": found 0 removable, 7555748 nonremovable row versions in 166156 pages DETAIL: 0 dead row versions cannot be removed yet. Nonremovable row versions range from 88 to 209 bytes long. There were 8470471 unused item pointers. Total free space (including removable row versions) is 208149116 bytes. 9445 pages are or will become empty, including 0 at the end of the table. 119104 pages containing 206008504 free bytes are potential move destinations. CPU 2.44s/1.60u sec elapsed 127.89 sec. INFO: index "jbpm_taskinstance_pkey" now contains 7555748 row versions in 62090 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.87s/2.16u sec elapsed 120.81 sec. INFO: index "idx_tskinst_tminst" now contains 7555748 row versions in 65767 pages DETAIL: 0 index row versions were removed. 26024 index pages have been deleted, 26024 are currently reusable. CPU 0.79s/1.95u sec elapsed 103.52 sec. INFO: index "idx_tskinst_slinst" now contains 7555748 row versions in 56031 pages DETAIL: 0 index row versions were removed. 28343 index pages have been deleted, 28343 are currently reusable. CPU 0.62s/1.93u sec elapsed 99.21 sec. INFO: index "idx_taskinst_tokn" now contains 7555748 row versions in 65758 pages DETAIL: 0 index row versions were removed. 26012 index pages have been deleted, 26012 are currently reusable. CPU 1.10s/2.18u sec elapsed 108.29 sec. INFO: index "idx_taskinst_tsk" now contains 7555748 row versions in 64516 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 1.01s/1.73u sec elapsed 64.73 sec. INFO: index "idx_pooled_actor" now contains 7555748 row versions in 20896 pages DETAIL: 0 index row versions were removed. 136 index pages have been deleted, 136 are currently reusable. CPU 0.26s/1.57u sec elapsed 3.01 sec. INFO: index "idx_task_actorid" now contains 7555748 row versions in 20885 pages DETAIL: 0 index row versions were removed. 121 index pages have been deleted, 121 are currently reusable. CPU 0.23s/1.52u sec elapsed 2.77 sec. INFO: "jbpm_taskinstance": moved 1374243 row versions, truncated 166156 to 140279 pages DETAIL: CPU 26.50s/138.35u sec elapsed 735.02 sec. INFO: index "jbpm_taskinstance_pkey" now contains 7555748 row versions in 62090 pages DETAIL: 1374243 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 1.04s/1.38u sec elapsed 117.72 sec. INFO: index "idx_tskinst_tminst" now contains 7555748 row versions in 65767 pages DETAIL: 1374243 index row versions were removed. 26024 index pages have been deleted, 26024 are currently reusable. CPU 1.37s/1.01u sec elapsed 123.56 sec. INFO: index "idx_tskinst_slinst" now contains 7555748 row versions in 56031 pages DETAIL: 1374243 index row versions were removed. 28560 index pages have been deleted, 28560 are currently reusable. CPU 1.20s/1.27u sec elapsed 105.67 sec. INFO: index "idx_taskinst_tokn" now contains 7555748 row versions in 65758 pages DETAIL: 1374243 index row versions were removed. 26012 index pages have been deleted, 26012 are currently reusable. CPU 1.29s/0.96u sec elapsed 112.62 sec. INFO: index "idx_taskinst_tsk" now contains 7555748 row versions in 64516 pages DETAIL: 1374243 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 1.48s/1.12u sec elapsed 70.56 sec. INFO: index "idx_pooled_actor" now contains 7555748 row versions in 25534 pages DETAIL: 1374243 index row versions were removed. 3769 index pages have been deleted, 3769 are currently reusable. CPU 0.48s/0.82u sec elapsed 6.89 sec. INFO: index "idx_task_actorid" now contains 7555748 row versions in 25545 pages DETAIL: 1374243 index row versions were removed. 3790 index pages have been deleted, 3790 are currently reusable. CPU 0.37s/1.24u sec elapsed 7.93 sec. INFO: vacuuming "pg_toast.pg_toast_560501" INFO: "pg_toast_560501": found 0 removable, 0 nonremovable row versions in 0 pages DETAIL: 0 dead row versions cannot be removed yet. Nonremovable row versions range from 0 to 0 bytes long. There were 0 unused item pointers. Total free space (including removable row versions) is 0 bytes. 0 pages are or will become empty, including 0 at the end of the table. 0 pages containing 0 free bytes are potential move destinations. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "pg_toast_560501_index" now contains 0 row versions in 1 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. >Can you show us the table definition? I am too ignorant in PG to >know if that would make a difference, but it might ring a bell >for others.. AFAIK, more column data may mean larger resultsets >and may change the plan (though "width=128" in the log of your >explanation wouldn't mean a lot of data are stored per row). Yep, the table is from the jboss jbpm (business process management) schema. It has one modification, I've added the pooledactor_ column (column B as I've been referring to it until now) to remove a many-to-many relationship to simplify my query. Column A as I've been referring to is the actorid_ column. Here's the ddl: CREATE TABLE jbpm_taskinstance ( id_ bigint NOT NULL, class_ character(1) NOT NULL, version_ integer NOT NULL, name_ character varying(255), description_ character varying(4000), actorid_ character varying(255), create_ timestamp without time zone, start_ timestamp without time zone, end_ timestamp without time zone, duedate_ timestamp without time zone, priority_ integer, iscancelled_ boolean, issuspended_ boolean, isopen_ boolean, issignalling_ boolean, isblocking_ boolean, task_ bigint, token_ bigint, procinst_ bigint, swimlaninstance_ bigint, taskmgmtinstance_ bigint, pooledactor_ character varying(255), processname_ character varying(255), CONSTRAINT jbpm_taskinstance_pkey PRIMARY KEY (id_), CONSTRAINT fk_taskinst_slinst FOREIGN KEY (swimlaninstance_) REFERENCES jbpm_swimlaneinstance (id_) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT fk_taskinst_task FOREIGN KEY (task_) REFERENCES jbpm_task (id_) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT fk_taskinst_tminst FOREIGN KEY (taskmgmtinstance_) REFERENCES jbpm_moduleinstance (id_) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT fk_taskinst_token FOREIGN KEY (token_) REFERENCES jbpm_token (id_) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT fk_tskins_prcins FOREIGN KEY (procinst_) REFERENCES jbpm_processinstance (id_) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION ) WITH (OIDS=FALSE); -- Index: idx_pooled_actor -- DROP INDEX idx_pooled_actor; CREATE INDEX idx_pooled_actor ON jbpm_taskinstance USING btree (pooledactor_); -- Index: idx_taskinst_tokn -- DROP INDEX idx_taskinst_tokn; CREATE INDEX idx_taskinst_tokn ON jbpm_taskinstance USING btree (token_); -- Index: idx_taskinst_tsk -- DROP INDEX idx_taskinst_tsk; CREATE INDEX idx_taskinst_tsk ON jbpm_taskinstance USING btree (task_, procinst_); -- Index: idx_tskinst_slinst -- DROP INDEX idx_tskinst_slinst; CREATE INDEX idx_tskinst_slinst ON jbpm_taskinstance USING btree (swimlaninstance_); -- Index: idx_tskinst_tminst -- DROP INDEX idx_tskinst_tminst; CREATE INDEX idx_tskinst_tminst ON jbpm_taskinstance USING btree (taskmgmtinstance_); >Btw, it would help if you could reproduce my test scenario and >see if PG uses "correctly" the indexscan. It is better to try on >your installation, to take care of any configuration/whatever >variation which may create your problem. I have tried your example and I get the same results as you. db=# explain select * from foo where baz is null and bar = '8' limit 15; QUERY PLAN ---------------------------------------------------------------------------- ---- --- Limit (cost=0.00..0.53 rows=15 width=154) -> Index Scan using foobar on foo (cost=0.00..33159.59 rows=934389 width=15 4) Index Cond: (bar = 8) Filter: (baz IS NULL) (4 rows) db=# drop index foobar; DROP INDEX db=# explain select * from foo where baz is null and bar = '8' limit 15; QUERY PLAN --------------------------------------------------------------------- Limit (cost=0.00..2.87 rows=15 width=154) -> Seq Scan on foo (cost=0.00..178593.35 rows=934389 width=154) Filter: ((baz IS NULL) AND (bar = 8)) (3 rows) It's choosing the index because of a cost of 0.53 vs a cost of 2.87 for sequential scan. I wonder why in my real tables the index scan cost is higher than the sequential scan cost. Perhaps because of the extra width of my rows? >> From looking at the plans, it seems to be postgres is assuming it will >> only >> have to sequentially scan 15 rows, which is not true in my case >> because column B is not distributed randomly (nor will it be in >> production). Would > >Why do you say that? The explanation seems to rather tell that it >(correctly) assumes that the seqscan would bring up about 1M rows for the selected values of A and B, and then it will limit to 15 rows. I say that because the plan gives a really really low number (3.21) for the estimated cost after the limit on sequential scan: Select * from JBPM_TASKINSTANCE this_ where actorid_ is null and this_.POOLEDACTOR_ in ('21') limit 15 "Limit (cost=0.00..3.21 rows=15 width=128) (actual time=84133.211..84187.247 rows=15 loops=1)" " -> Seq Scan on jbpm_taskinstance this_ (cost=0.00..234725.85 rows=1095365 width=128) (actual time=84133.205..84187.186 rows=15 loops=1)" " Filter: ((actorid_ IS NULL) AND ((pooledactor_)::text = '21'::text))" "Total runtime: 84187.335 ms" It just seems to me it is not taking into account at all that it might have to scan thousands or millions of rows before it gets the 15 rows it needs. If I disable sequence scan, it equally uselessly chooses the wrong index, based on a far too low estimated cost of 5.27: Set enable_seqscan=false "Limit (cost=0.00..5.27 rows=15 width=128) (actual time=120183.610..120183.707 rows=15 loops=1)" " -> Index Scan using idx_task_actorid on jbpm_taskinstance this_ (cost=0.00..384657.95 rows=1095365 width=128) (actual time=120183.604..120183.653 rows=15 loops=1)" " Index Cond: (actorid_ IS NULL)" " Filter: ((pooledactor_)::text = '21'::text)" "Total runtime: 120183.788 ms" Many thanks David