Tom, Thanks for the advice. I realize that I have little understanding of index usage in PostgreSQL - I'm doing my best to improve this. Below is another comparison of the 'distinct' and 'group by' queries from the same table with seqscan set to on and off. I does look like the group by works better (with seqscan off), as you suggested. I'll try some more tinkering to see what I can make happen. However, I guess what I'm really trying to do in the context that I'm currently working is summarize the unique values stored in the index, rather than querying the table itself. Is this possible, or reasonable to do? Thanks again for your help, Mike (P.S. - Sorry if you get this twice Tom) ======================================================================= dbname=# set session enable_seqscan to on; SET dbname=# explain analyze select distinct year from tbl_ind_schools_edu; QUERY PLAN ----------------------------------------------------------------------- Unique (cost=32302.16..32579.31 rows=2 width=2) (actual time=2871.115..3705.652 rows=2 loops=1) -> Sort (cost=32302.16..32440.74 rows=55431 width=2) (actual time=2871.105..3268.114 rows=55431 loops=1) Sort Key: "year" -> Seq Scan on tbl_ind_schools_edu (cost=0.00..27485.31 rows=55431 width=2) (actual time=0.091..1903.820 rows=55431 loops=1) Total runtime: 3707.879 ms (5 rows) dbname=# set session enable_seqscan to off; SET dbname=# explain analyze select distinct year from tbl_ind_schools_edu; QUERY PLAN ----------------------------------------------------------------------- Unique (cost=0.00..161575.24 rows=2 width=2) (actual time=0.312..2143.846 rows=2 loops=1) -> Index Scan using schoolse_school_year on tbl_ind_schools_edu (cost=0.00..161436.67 rows=55431 width=2) (actual time=0.286..1717.445 rows=55431 loops=1) Total runtime: 2144.100 ms (3 rows) dbname=# set session enable_seqscan to on; SET dbname=# explain analyze select year from tbl_ind_schools_edu group by year; QUERY PLAN ----------------------------------------------------------------------- HashAggregate (cost=27623.89..27623.91 rows=2 width=2) (actual time=2176.003..2176.010 rows=2 loops=1) -> Seq Scan on tbl_ind_schools_edu (cost=0.00..27485.31 rows=55431 width=2) (actual time=0.072..1697.776 rows=55431 loops=1) Total runtime: 2254.643 ms (3 rows) dbname=# set session enable_seqscan to off; SET dbname=# explain analyze select year from tbl_ind_schools_edu group by year; QUERY PLAN ----------------------------------------------------------------------- Group (cost=0.00..161575.24 rows=2 width=2) (actual time=0.350..2128.425 rows=2 loops=1) -> Index Scan using schoolse_school_year on tbl_ind_schools_edu (cost=0.00..161436.67 rows=55431 width=2) (actual time=0.296..1689.331 rows=55431 loops=1) Total runtime: 2129.799 ms (3 rows) Tom Lane wrote: > Mike Leahy <mgleahy@xxxxxxxxxxxxxxxxxxx> writes: >> ... When I try to get the distinct number of >> years from these tables, it does a sequential scan to get two unique >> values from the "year" column in the *_edu table, but it uses an index >> scan to get a single unique value from the "year" column from the *_con >> table. In both cases, I would have expected the index scan to be used. > > You have a fundamental misunderstanding of what's going on here. Both > plans fetch the entire table contents. The difference is how the data > is brought into sorted order for the UNIQUE step --- either by an > explicit sort, or by scanning the table in index order. > > A full-table index scan is usually pretty darn inefficient (too much > random access) and so it's often the case that the sort approach is > actually faster. > > The two EXPLAINs you provided aren't compelling evidence of anything > wrong because they are for two different-sized tables ... but to the > extent that the results are comparable it appears that the planner is > actually biased in favor of the indexscan plan (cost divided by actual > time is way lower for the indexscan). > > What you should look at is performance of the two approaches on the > *same* table (fool with enable_sort and/or enable_indexscan to force > the alternative choices) and then see whether it makes sense to tweak > the planner cost parameters for your installation. > > Also I'd suggest trying > > select year from [table] group by year > > which is capable of using a hash aggregation approach; that will likely > beat either of these plans. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your > message can get through to the mailing list cleanly >