Hi, I have a problem with the postgres planner, which gives a cost to index scan which is much higher than actual cost (worst case considered, e.g. without any previous disk cache), and am posting here for advices for tweaking cost constants. Because of this problem, the planner typically chooses a seq scan when an index scan would be more efficient, and I would like to correct this if possible. Reading the documentation and postgresql list archives, I have run ANALYZE right before my tests, I have increased the statistics target to 50 for the considered table; my problem is that the index scan cost reported by EXPLAIN seems to be around 12.7 times higher that it should, a figure I suppose incompatible (too large) for just random_page_cost and effective_cache_size tweaks. Structure of the table: \d sent_messages Table "public.sent_messages" Column | Type | Modifiers ----------+--------------------------+---------------------------------------------------------------- uid | integer | not null default nextval('public.sent_messages_uid_seq'::text) sender | character varying(25) | receiver | character varying(25) | action | character varying(25) | cost | integer | date | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone status | character varying(128) | theme | character varying(25) | operator | character varying(15) | Indexes: "sent_messages_pkey" primary key, btree (uid) "idx_sent_msgs_date_theme_status" btree (date, theme, status) What I did: - SET default_statistics_target = 50 - VACUUM FULL ANALYZE VERBOSE sent_messages - copied so that you can have a look at rows and pages taken up by relations INFO: vacuuming "public.sent_messages" INFO: "sent_messages": found 0 removable, 3692284 nonremovable row versions in 55207 pages DETAIL: 0 dead row versions cannot be removed yet. Nonremovable row versions range from 103 to 177 bytes long. There were 150468 unused item pointers. Total free space (including removable row versions) is 2507320 bytes. 0 pages are or will become empty, including 0 at the end of the table. 2469 pages containing 262732 free bytes are potential move destinations. CPU 0.57s/0.20u sec elapsed 11.27 sec. INFO: index "sent_messages_pkey" now contains 3692284 row versions in 57473 pages DETAIL: 0 index row versions were removed. 318 index pages have been deleted, 318 are currently reusable. CPU 2.80s/1.27u sec elapsed 112.69 sec. INFO: index "idx_sent_msgs_date_theme_status" now contains 3692284 row versions in 88057 pages DETAIL: 0 index row versions were removed. 979 index pages have been deleted, 979 are currently reusable. CPU 4.22s/1.51u sec elapsed 246.88 sec. INFO: "sent_messages": moved 0 row versions, truncated 55207 to 55207 pages DETAIL: CPU 1.87s/3.18u sec elapsed 42.71 sec. INFO: vacuuming "pg_toast.pg_toast_77852470" INFO: "pg_toast_77852470": found 0 removable, 0 nonremovable row versions in 0 pages DETAIL: 0 dead row versions cannot be removed yet. Nonremovable row versions range from 0 to 0 bytes long. There were 0 unused item pointers. Total free space (including removable row versions) is 0 bytes. 0 pages are or will become empty, including 0 at the end of the table. 0 pages containing 0 free bytes are potential move destinations. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "pg_toast_77852470_index" now contains 0 row versions in 1 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.02 sec. INFO: analyzing "public.sent_messages" INFO: "sent_messages": 55207 pages, 15000 rows sampled, 3666236 estimated total rows - select rows of the table with a range condition on "date", find a range for which seq scan and index scan runtimes seem to be very close (I use Linux, I cat a 2G file to /dev/null between each request to flush disk cache, on a machine of 1G real RAM and 1G of swap, so that this is the worst case tested for index scan), notice that the cost used by the planner is 12.67 times higher for index scan, at a position it should be around 1 so that planner could make sensible choices: EXPLAIN ANALYZE SELECT * FROM sent_messages WHERE date > '2005-09-01' AND date < '2005-09-19'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on sent_messages (cost=0.00..110591.26 rows=392066 width=78) (actual time=7513.205..13095.147 rows=393074 loops=1) Filter: ((date > '2005-09-01 00:00:00+00'::timestamp with time zone) AND (date < '2005-09-19 00:00:00+00'::timestamp with time zone)) Total runtime: 14272.522 ms SET enable_seqscan = false EXPLAIN ANALYZE SELECT * FROM sent_messages WHERE date > '2005-09-01' AND date < '2005-09-19'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using idx_sent_msgs_date_theme_status on sent_messages (cost=0.00..1402124.26 rows=392066 width=78) (actual time=142.638..12677.378 rows=393074 loops=1) Index Cond: ((date > '2005-09-01 00:00:00+00'::timestamp with time zone) AND (date < '2005-09-19 00:00:00+00'::timestamp with time zone)) Total runtime: 13846.504 ms Please notice that an index on the "date" column only would be much more efficient for the considered request (and I have confirmed this by creating and trying it), but I don't necessarily would need this index if the existing index was used. Of course real queries use smaller date ranges. - I then tried to tweak random_page_cost and effective_cache_size following advices from documentation: SET random_page_cost = 2; SET effective_cache_size = 10000; EXPLAIN SELECT * FROM sent_messages WHERE date > '2005-09-01' AND date < '2005-09-19'; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using idx_sent_msgs_date_theme_status on sent_messages (cost=0.00..595894.94 rows=392066 width=78) Index Cond: ((date > '2005-09-01 00:00:00+00'::timestamp with time zone) AND (date < '2005-09-19 00:00:00+00'::timestamp with time zone)) We can see that estimated index scan cost goes down but by a factor of approx. 2.3 which is far from enough to "fix" it. I am reluctant in changing way more the random_page_cost and effective_cache_size values as I'm suspecting it might have other (bad) consequences if it is too far away from reality (even if Linux is known to aggressively cache), the application being multithreaded (there is a warning about concurrent queries using different indexes in documentation). But I certainly could benefit from others' experience on this matter. I apologize for this long email but I wanted to be sure I gave enough information on the data and things I have tried to fix the problem myself. If anyone can see what I am doing wrong, I would be very interested in pointers. Thanks in advance! Btw, I use postgres 7.4.5 with -B 1000 -N 500 and all postgresql.conf default values except timezone = 'UTC', on an ext3 partition with data=ordered, and run Linux 2.6.12. -- Guillaume Cottenceau