Because there is no MVCC information in the index. cug 2006/9/12, Piotr Kołaczkowski <P.Kolaczkowski@xxxxxxxxxxxxxx>:
On Tuesday 12 September 2006 12:47, Heikki Linnakangas wrote: > Laszlo Nagy wrote: > > I made another test. I create a file with the identifiers and names of > > the products: > > > > psql#\o products.txt > > psql#select id,name from product; > > > > Then I can search using grep: > > > > grep "Mug" products.txt | cut -f1 -d\| > > > > There is a huge difference. This command runs within 0.5 seconds. That > > is, at least 76 times faster than the seq scan. It is the same if I > > vacuum, backup and restore the database. I thought that the table is > > stored in one file, and the seq scan will be actually faster than > > grepping the file. Can you please tell me what am I doing wrong? I'm > > not sure if I can increase the performance of a seq scan by adjusting > > the values in postgresql.conf. I do not like the idea of exporting the > > product table periodically into a txt file, and search with grep. :-) > > Is there any other columns besides id and name in the table? How big is > products.txt compared to the heap file? > > > Another question: I have a btree index on product(name). It contains > > all product names and the identifiers of the products. Wouldn't it be > > easier to seq scan the index instead of seq scan the table? The index > > is only 66MB, the table is 1123MB. > > Probably, but PostgreSQL doesn't know how to do that. Even if it did, it > depends on how many matches there is. If you scan the index and then > fetch the matching rows from the heap, you're doing random I/O to the > heap. That becomes slower than scanning the heap sequentially if you're > going to get more than a few hits. Why match rows from the heap if ALL required data are in the index itself? Why look at the heap at all? This is the same performance problem in PostgreSQL I noticed when doing some "SELECT count(*)" queries. Look at this: explain analyze select count(*) from transakcja where data > '2005-09-09' and miesiac >= (9 + 2005 * 12) and kwota < 50; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=601557.86..601557.87 rows=1 width=0) (actual time=26733.479..26733.484 rows=1 loops=1) -> Bitmap Heap Scan on transakcja (cost=154878.00..596928.23 rows=1851852 width=0) (actual time=9974.208..18796.060 rows=1654218 loops=1) Recheck Cond: ((miesiac >= 24069) AND (kwota < 50::double precision)) Filter: (data > '2005-09-09 00:00:00'::timestamp without time zone) -> Bitmap Index Scan on idx_transakcja_miesiac_kwota (cost=0.00..154878.00 rows=5555556 width=0) (actual time=9919.967..9919.967 rows=1690402 loops=1) Index Cond: ((miesiac >= 24069) AND (kwota < 50::double precision)) Total runtime: 26733.980 ms (7 rows) The actual time retrieving tuples from the index is less than 10 seconds, but the system executes needless heap scan that takes up additional 16 seconds. Best regards, Peter ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
-- PostgreSQL Bootcamp, Big Nerd Ranch Europe, Nov 2006 http://www.bignerdranch.com/news/2006-08-21.shtml