Can't it just scan the index to get that? I assumed the index had
links to every fileid in the table. In my over-simplified imagination,
the table looks like this: ctid|fileid|column|column|column|column ctid|fileid|column|column|column|column ctid|fileid|column|column|column|column ctid|fileid|column|column|column|column etc. While the index looks like fileid|ctid fileid|ctid fileid|ctid fileid|ctid ... So I expected scanning the index was faster, and still had everything it needed to do the count. Or perhaps it was because I said COUNT(*) so it needs to look at the other columns in the table? I really just wanted the number of "hits" not the number of records with distinct values or anything like that. My understanding was that COUNT(*) did that, and didn't really look at the columns themselves. Adrian Klaver wrote: -------------- Original message ---------------------- From: William Garrison <postgres@xxxxxxxxxxxx>I am looking for records with duplicate keys, so I am running this query: SELECT fileid, COUNT(*) FROM file GROUP BY fileid HAVING COUNT(*)>1 The table has an index on fileid (non-unique index) so I am surprised that postgres is doing a table scan. This database is >15GB, and there are a number of fairly large string columns in the table. I am very surprised that scanning the index is not faster than scanning the table. Any thoughts on that? Is scanning the table faster than scanning the index? Is there a reason that it needs anything other than the index?I may be missing something, but it would have to scan the entire table to get all the occurrences of each fileid in order to do the count(*). -- Adrian Klaver aklaver@xxxxxxxxxxx |