On Tue, Nov 18, 2008 at 04:49:35PM +0000, Scara Maccai wrote:
> if I got it right the reason some aggregates (such as COUNT) using
> only index columns are "slow" on postgresql is that it uses MVCC, so
> it has to read the data as well as the index.

Every aggregate (COUNT is just one example) has to read data from both the index and the table. The reason is that each row in a table carries two important identifiers: the transaction that created it and the transaction that deleted it (exposed as the system columns xmin and xmax). Every time a query scans the table it checks that the transaction that created the row committed and that the transaction that deleted it (if any) did not. The index doesn't contain these two identifiers, so when scanning the index the code has to visit the table row to look them up. There are various optimizations in PG (hint bits, for instance) so that it doesn't have to check the transaction status every single time, which speeds things up a bit, but the semantics are the same. There's a small illustration at the end of this mail.

> It makes sense to me,
> but I don't understand is how other databases (such as Oracle) do it.

I believe Oracle maintains a separate undo log (I'm not sure exactly how it's structured) that contains this information, so all the data in both the main table and the index can be considered committed. There are tradeoffs in both directions: PG's implementation allows greater concurrency, while Oracle's way is better optimized for read access. Which implementation is better depends a lot on your workload.

There has been talk of adding the transaction identifiers to the indexes in PG, which would mean that index scans wouldn't need to go to the table. The problem is that the indexes would be larger, and modifying data would incur more overhead, since both the table and the index would have to be updated.

I hope someone will point out any mistakes I've made!

Sam
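
P.S. Here's a minimal sketch of the two identifiers I mentioned above; the table "foo" is made up for the example:

    -- xmin is the id of the transaction that created the row;
    -- xmax stays zero until some transaction deletes (or updates) it
    CREATE TABLE foo (id int);
    INSERT INTO foo VALUES (1);
    SELECT xmin, xmax, id FROM foo;

    BEGIN;
    DELETE FROM foo;   -- don't commit yet
    -- from a second session: SELECT xmin, xmax, id FROM foo;
    -- shows a non-zero xmax, yet the row is still visible, because
    -- the deleting transaction hasn't committed
    ROLLBACK;

That in-between state (row deleted but deleter not yet committed) is exactly what the visibility check on each table row has to resolve, and it's the information the index doesn't have.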
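
P.P.S. To see that an aggregate really does read the table, try:

    EXPLAIN SELECT count(*) FROM foo;

You should see an Aggregate sitting on top of a Seq Scan. Even if you add an index on id, the table still has to be read to do the visibility checks described above, which is why the index alone can't answer the COUNT.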