2011/10/3 Nowak Michał <michal.nowak@xxxxxx>: >> How many rows do you have in that table? > > a9-dev=> select count(*) from records; > count > --------- > 3620311 > (1 row) > > a9-dev=> select source_id, count(*) from records where source_id = 'http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml' or source_id = 'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml' group by source_id; > source_id | count > --------------------------------------------------------+-------- > http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml | 427254 > http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml | 989184 > (2 rows) So the second one is roughly 27% of the table. I don't know the actual condition under which planner changes over the seqscan, but that value seems quite common it seems. The other thing planner is going to look at is the correlation, most common values, most common frequencies. In other words, if the value is 27% of all values, but is evenly spread across - I think planner will go for seq scan regardless. At the end of the day (afaik), index scan only pics pages for narrowed down seqscan really. So imagine if your index scan returned all the pages, you would still have to do a seqscan on all of them. Planner is trying to avoid that by weighting the costs of both operations. If it is too slow to run the current queries, you could try normalizing the table by splitting source_id into separate one and referencing it by an id. Very often what you'd find is that doing so lowers I/O required, hence saves a lot of time in queries. Downside is, that it is bit harder to add/update the tables. But that's where triggers and procedures come handy. > >> ALTER TABLE records ALTER id SET source_id 1000; vacuum analyze verbose records; > Did you mean ALTER TABLE records ALTER id SET STATISTICS 1000;? Yup, that's what I meant. Sorry. -- GJ -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance