Hi, I'd be grateful for any advice we can get... we recently switched from MySQL to PostgreSQL on the basis of some trials we carried out with datasets of varying sizes and varying rates of contention. For the most part we've been pleased with performance, but one particular application runs queries that pull back a lot of results across what is, for us, a large dataset. We've noticed enormous speed improvements over MySQL with complex queries, but some of these simpler queries are causing us problems. We were hoping that the machine would be able to cache the majority of the database in memory and run these kinds of queries very quickly. The box doesn't seem to be doing much I/O during these queries, and top usually reports processor usage slowly climbing to about 75% but no higher (and then dropping once the query finishes).

We adjusted settings in postgresql.conf as recommended in various places on the web. In particular, experimentation led us to turn off enable_seqscan, because it consistently gave faster query times, but we're not sure why that is or whether it's a good idea in general.

This example has been anonymised slightly, although I've checked it for typos. Our 'fact_table' has 6 million rows, each of which joins to one of 1.7 million rows in 'record_dimension' and one of 15,000 rows in 'date_dimension'. We have other tables that also join to 'fact_table', but these three suffice for this example. The total size (as reported on the file system, presumably including indexes) is 7.5GB. The query below pulls 12 months' worth of data (accounting for 20% of the rows in 'fact_table') with restrictions that account for 15% of the rows in 'record_dimension'. It's a read-only database (we dump it fresh nightly).

The server is a dual-core 3.7GHz Xeon Dell (each core reporting 2 logical CPUs) running an amd64 build of FreeBSD 6.2, with postgres 8.3.5 built from source. It has 400GB of storage in RAID-5 (on 5 disks) and 8GB of physical RAM; I'm able to use about 6GB of that for my own purposes, as the server doesn't do much else apart from replicating a very low-usage MySQL database. While it's running, postgres only seems to use about 1.2GB of RAM.

Postgres configuration is below the query and EXPLAIN output. Any help would be much appreciated.
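(In case the methodology matters: the two plans below were captured by toggling the planner setting per session, roughly as follows; enable_seqscan is otherwise fixed in postgresql.conf. This is only a sketch with a placeholder query.)

SET enable_seqscan = on;    -- session-local override; other connections keep the postgresql.conf default
EXPLAIN ANALYZE SELECT 1;   -- substitute the real query, shown below
RESET enable_seqscan;       -- back to the configured default (off, in our case)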
=============

SELECT "record_dimension"."Id 1" AS "Id 1",
       "record_dimension"."Id 2" AS "Id 2",
       "Id 3" AS "Id 3"
FROM "fact_table"
INNER JOIN "record_dimension"
        ON "fact_table"."record_key" = "record_dimension"."record_key"
INNER JOIN "date_dimension"
        ON "fact_table"."date_key" = "date_dimension"."date_key"
WHERE "record_dimension"."Region" = 'Big Region'
  AND "date_dimension"."Month" BETWEEN '110' AND '121'
  AND "record_dimension"."A Common Property"
  AND "record_dimension"."Country" = 'USA';

ENABLE_SEQSCAN ON

Nested Loop  (cost=466.34..192962.24 rows=15329 width=12) (actual time=13653.238..31332.113 rows=131466 loops=1)
  ->  Hash Join  (cost=466.34..115767.54 rows=141718 width=8) (actual time=13649.952..19548.019 rows=1098344 loops=1)
        Hash Cond: (fact_table.date_key = date_dimension.date_key)
        ->  Seq Scan on fact_table  (cost=0.00..91589.38 rows=5945238 width=12) (actual time=0.014..8761.184 rows=5945238 loops=1)
        ->  Hash  (cost=461.99..461.99 rows=348 width=4) (actual time=4.651..4.651 rows=378 loops=1)
              ->  Seq Scan on date_dimension  (cost=0.00..461.99 rows=348 width=4) (actual time=0.044..4.007 rows=378 loops=1)
                    Filter: (("Month" >= 110::smallint) AND ("Month" <= 121::smallint))
  ->  Index Scan using record_dimension_pkey on record_dimension  (cost=0.00..0.53 rows=1 width=12) (actual time=0.007..0.007 rows=0 loops=1098344)
        Index Cond: (record_dimension.record_key = fact_table.record_key)
        Filter: (record_dimension."A Common Property" AND ((record_dimension."Region")::text = 'Big Region'::text) AND ((record_dimension."Country")::text = 'USA'::text))
Total runtime: 31522.166 ms

(131466 rows)
(Actual query time: 8606.963 ms)

I/O during the query:

+------------------+---------------------------------------+----------------------------------------+
|                  |            SEQUENTIAL I/O             |              INDEXED I/O               |
|                  | scans |  tuples | heap_blks |  cached |   scans |  tuples | idx_blks |  cached |
+------------------+-------+---------+-----------+---------+---------+---------+----------+---------+
| date_dimension   |     1 |   14599 |         0 |     243 |       0 |       0 |        0 |       0 |
| fact_table       |     1 | 5945238 |         0 |   32137 |       0 |       0 |        0 |       0 |
| record_dimension |     0 |       0 |         0 | 1098344 | 1098344 | 1098344 |        0 | 3300506 |
+------------------+-------+---------+-----------+---------+---------+---------+----------+---------+

ENABLE_SEQSCAN OFF

Nested Loop  (cost=0.00..355177.96 rows=15329 width=12) (actual time=14763.749..32483.625 rows=131466 loops=1)
  ->  Merge Join  (cost=0.00..277983.26 rows=141718 width=8) (actual time=14760.467..20623.975 rows=1098344 loops=1)
        Merge Cond: (date_dimension.date_key = fact_table.date_key)
        ->  Index Scan using date_dimension_pkey on date_dimension  (cost=0.00..706.23 rows=348 width=4) (actual time=0.074..1.635 rows=13 loops=1)
              Filter: (("Month" >= 110::smallint) AND ("Month" <= 121::smallint))
        ->  Index Scan using date_key on fact_table  (cost=0.00..261696.89 rows=5945238 width=12) (actual time=0.016..9903.593 rows=5945238 loops=1)
  ->  Index Scan using record_dimension_pkey on record_dimension  (cost=0.00..0.53 rows=1 width=12) (actual time=0.007..0.007 rows=0 loops=1098344)
        Index Cond: (record_dimension.record_key = fact_table.record_key)
        Filter: (record_dimension."A Common Property" AND ((record_dimension."Region")::text = 'Big Region'::text) AND ((record_dimension."Country")::text = 'USA'::text))
Total runtime: 32672.995 ms
(10 rows)

(131466 rows)
(Actual query time: 9049.854 ms)
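(The I/O table above was compiled from the statistics views; for anyone wanting to reproduce it, a query along these lines gathers the same counters -- a sketch assuming the standard pg_stat_user_tables / pg_statio_user_tables views, with pg_stat_reset() called before the run:)

SELECT s.relname,
       s.seq_scan        AS seq_scans,    -- sequential scans started
       s.seq_tup_read    AS seq_tuples,   -- tuples read by those scans
       io.heap_blks_read AS heap_blks,    -- heap blocks read from disk
       io.heap_blks_hit  AS heap_cached,  -- heap blocks found in shared_buffers
       s.idx_scan        AS idx_scans,
       s.idx_tup_fetch   AS idx_tuples,
       io.idx_blks_read  AS idx_blks,
       io.idx_blks_hit   AS idx_cached
FROM pg_stat_user_tables s
JOIN pg_statio_user_tables io USING (relid)
WHERE s.relname IN ('fact_table', 'record_dimension', 'date_dimension');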
postgresql.conf
=============
shared_buffers = 1200MB
work_mem = 100MB
maintenance_work_mem = 200MB
max_fsm_pages = 179200
fsync = off
synchronous_commit = off
full_page_writes = off
enable_seqscan = off
effective_cache_size = 2000MB
default_statistics_target = 100
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
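P.S. The 7.5GB figure above is from the file system; if it's more useful, the same can be measured from inside postgres (a sketch using the built-in size functions, which count indexes and TOAST as well):

-- Total on-disk size per table, including indexes and TOAST:
SELECT pg_size_pretty(pg_total_relation_size('fact_table'))       AS fact_table,
       pg_size_pretty(pg_total_relation_size('record_dimension')) AS record_dimension,
       pg_size_pretty(pg_total_relation_size('date_dimension'))   AS date_dimension;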