You could try writing a plpgsql function which would generate the data set. Or you could use your existing data set.
Creating 3.5 million rows with a stored procedure is probably slow. It would likely be better and faster to use a random() and generate_series() trick; that way others can reproduce the data set, and generating it takes less time.
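A minimal sketch of that trick. The table names (test_dok, test_rid), row counts and value ranges below are only guesses meant to mimic the shape of the real dok/rid tables, not their actual schema:

-- ~100k header rows spread over a year, ~3.5M detail rows
CREATE TABLE test_dok AS
SELECT i AS dokumnr,
       date '2008-01-01' + (random() * 365)::int AS kuupaev
FROM generate_series(1, 100000) AS s(i);

CREATE TABLE test_rid AS
SELECT i AS id,
       (1 + random() * 99999)::int AS dokumnr,
       'X' || (random() * 5000)::int::text AS toode
FROM generate_series(1, 3500000) AS s(i);

ANALYZE test_dok;
ANALYZE test_rid;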
By the way, a simple way to de-bloat your big table without blocking would be this (see the sketch below):
- stop all inserts and updates
- begin
- create table new like old table
- insert into new select * from old (order by, perhaps)
- create indexes
- rename new into old
- commit

If this is just a reporting database where you insert a batch of new data every day, for instance, that's very easy to do. If it's OLTP, then no.
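A rough SQL sketch of those steps, assuming the dok table from this thread; the new index name and the ORDER BY column are placeholders, and the primary key, foreign keys, triggers and permissions would also have to be re-created by hand, so treat this as an outline rather than a drop-in script:

-- run while inserts and updates are stopped
BEGIN;
CREATE TABLE dok_new (LIKE dok);                          -- copies column definitions only
INSERT INTO dok_new SELECT * FROM dok ORDER BY kuupaev;   -- optional ORDER BY to physically cluster the rows
CREATE INDEX dok_new_kuupaev_idx ON dok_new (kuupaev);    -- recreate the old table's indexes (new names to avoid clashes)
ALTER TABLE dok RENAME TO dok_old;
ALTER TABLE dok_new RENAME TO dok;
COMMIT;
-- once everything checks out: DROP TABLE dok_old;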
Those are the orders and order_products tables. I ran vacuum full analyze verbose last night. The database is now 4832 MB, including 1 GB of bloated pg_shdepend indexes. I added max_fsm_pages=150000 and rebooted. The query below and other queries are still too slow:

set search_path to firma2,public;
explain analyze SELECT sum(1) FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode)
WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01'

Aggregate  (cost=181795.13..181795.14 rows=1 width=0) (actual time=23678.265..23678.268 rows=1 loops=1)
  ->  Nested Loop  (cost=73999.44..181733.74 rows=24555 width=0) (actual time=18459.230..23598.956 rows=21476 loops=1)
        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01 rows=1 width=24) (actual time=0.134..0.145 rows=1 loops=1)
              Index Cond: ('X05'::bpchar = toode)
        ->  Hash Join  (cost=73999.44..181482.18 rows=24555 width=24) (actual time=18459.076..23441.098 rows=21476 loops=1)
              Hash Cond: ("outer".dokumnr = "inner".dokumnr)
              ->  Bitmap Heap Scan on rid  (cost=4082.88..101779.03 rows=270252 width=28) (actual time=9337.782..12720.365 rows=278182 loops=1)
                    Recheck Cond: (toode = 'X05'::bpchar)
                    ->  Bitmap Index Scan on rid_toode_idx  (cost=0.00..4082.88 rows=270252 width=0) (actual time=9330.634..9330.634 rows=278183 loops=1)
                          Index Cond: (toode = 'X05'::bpchar)
              ->  Hash  (cost=69195.13..69195.13 rows=112573 width=4) (actual time=8894.465..8894.465 rows=109890 loops=1)
                    ->  Bitmap Heap Scan on dok  (cost=1492.00..69195.13 rows=112573 width=4) (actual time=1618.763..8404.847 rows=109890 loops=1)
                          Recheck Cond: (kuupaev >= '2008-09-01'::date)
                          ->  Bitmap Index Scan on dok_kuupaev_idx  (cost=0.00..1492.00 rows=112573 width=0) (actual time=1612.177..1612.177 rows=110484 loops=1)
                                Index Cond: (kuupaev >= '2008-09-01'::date)
Total runtime: 23678.790 ms

Here is a list of untried recommendations from this thread:

1. CLUSTER rid ON rid_toode_pkey; CLUSTER dok ON dok_kuupaev_idx. In 8.1.4 this form of CLUSTER causes a syntax error, and I have no idea what syntax to use (a sketch of the 8.1 form follows this list). Risky to try on the production server. Requires creating a test case with randomly distributed product_id values to measure the difference.

2. Change the CHAR(20) product index to an int index by adding an update trigger. Risky to try on the production server. Requires creating a test case with randomly distributed product_id values to measure the difference.

3. Denormalize the sale date into the order_products table by adding an update trigger. Risky to try on the production server. Requires creating a test case with randomly distributed product_id values to measure the difference.

4. Check the performance of the RAID: does it leverage NCQ appropriately when running queries in parallel? No idea how.

5. Materialized views. I need date granularity so that it is possible to sum only one day's sales. http://www.pgcon.org/2008/schedule/events/69.en.html seems to require a major application rewrite; no idea how.

Approaches which probably do not change performance:

6. Upgrade to 8.4 or to 8.3.5.

7. Run the server in standalone mode and recover the 1 GB bloated pg_shdepend indexes.

8. Tune some conf file parameters; the values and the replies they received are quoted below.
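For reference on point 1: as far as I recall, releases before 8.3 put the index name first, so on 8.1.4 the commands would look roughly like this. CLUSTER rewrites the whole table under an exclusive lock and this is unverified, so rehearse it on a copy rather than in production:

-- pre-8.3 syntax is CLUSTER indexname ON tablename
CLUSTER rid_toode_pkey ON rid;
CLUSTER dok_kuupaev_idx ON dok;
-- CLUSTER does not refresh statistics, so re-analyze afterwards
ANALYZE rid;
ANALYZE dok;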
work_mem = 512
I'd consider increasing this value a little - 0.5 MB seems too low to me (but not necessarily).
effective_cache_size = 70000
Well, your server has 2 GB of RAM and it's usually recommended to set this value to about 60-70% of your RAM, so using 540 MB (25%) seems quite low.

Data size is nearly the same as RAM size. It is an unpleasant surprise that queries take so long. What should I do next?

Andrus.

Largest relations by size:

 1  40926     firma2.rid                               1737 MB
 2  40595     firma2.dok                               1632 MB
 3  1214      pg_catalog.pg_shdepend                   1235 MB
 4  1232      pg_catalog.pg_shdepend_depender_index     795 MB
 7  1233      pg_catalog.pg_shdepend_reference_index    439 MB
 8  44299     firma2.rid_toode_idx                      298 MB
 9  44286     firma2.dok_tasudok_idx                    245 MB
10  19103791  firma2.rid_toode_pattern_idx              202 MB
11  44283     firma2.dok_klient_idx                     160 MB
12  44298     firma2.rid_inpdokumnr_idx                 148 MB
13  44297     firma2.rid_dokumnr_idx                    132 MB
14  43573     firma2.rid_pkey                           130 MB
17  40556     pg_toast.pg_toast_40552                   112 MB
18  44288     firma2.dok_tasumata_idx                   103 MB
19  44289     firma2.dok_tellimus_idx                   101 MB
20  44284     firma2.dok_krdokumnr_idx                  101 MB
21  44285     firma2.dok_kuupaev_idx                     94 MB
22  19076304  firma2.rid_rtellimus_idx                   90 MB
24  44282     firma2.dok_dokumnr_idx                     74 MB
25  43479     firma2.dok_pkey                            74 MB
26  18663923  firma2.dok_yksus_pattern_idx               65 MB
27  18801591  firma2.dok_sihtyksus_pattern_idx           64 MB
32  18774881  firma2.dok_doktyyp                         47 MB

Output from vacuum full:

INFO:  free space map contains 14353 pages in 314 relations
DETAIL:  A total of 20000 page slots are in use (including overhead).
89664 page slots are required to track all free space.
Current limits are:  20000 page slots, 1000 relations, using 182 KB.
NOTICE:  number of page slots needed (89664) exceeds max_fsm_pages (20000)
HINT:  Consider increasing the configuration parameter "max_fsm_pages" to a value over 89664.

Query returned successfully with no result in 10513335 ms.
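Pulling the conf suggestions from this thread together, a hedged postgresql.conf sketch for an 8.1 server with 2 GB of RAM might look like the following. The exact numbers are illustrative only (8.1 takes work_mem in kB and effective_cache_size in 8 kB pages) and should be tested against the real workload:

# 512 kB is very small for the hash joins above; a few MB per sort/hash is more usual
work_mem = 8192                  # 8 MB
# roughly 60-70% of 2 GB RAM, expressed in 8 kB pages
effective_cache_size = 175000    # ~1.4 GB
# the free space map must cover the pages VACUUM reports; the HINT asked for > 89664
max_fsm_pages = 150000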