Huan Ruan wrote:

> Hash 1st run
> "Hash Join (cost=1681.87..6414169.04 rows=48261 width=171)
> (actual time=2182.450..88158.645 rows=48257 loops=1)"
> "  ->  Seq Scan on invtran bigtable (cost=0.00..4730787.28
> rows=168121728 width=108) (actual time=0.051..32581.052
> rows=168121657 loops=1)"

194 nanoseconds per row (32581 ms / 168 million rows) suggests 100%
cache hits.

> NL 1st run
> "Nested Loop (cost=0.00..6451637.88 rows=48261 width=171) (actual
> time=0.056..551.438 rows=48257 loops=1)"
> "  ->  Index Scan using pk_invtran on invtran bigtable
> (cost=0.00..133.65 rows=1 width=108) (actual time=0.010..0.010
> rows=1 loops=48261)"

10 microseconds per index scan (0.010 ms per loop, each index scan
requiring multiple "random" accesses) also suggests 100% cache hits.

> I originally reduced random_page_cost to 2 to achieve the nested
> loop join. Now that I've set cpu_tuple_cost to 0.05 and reset
> random_page_cost back to 4, I can also achieve a nested loop
> join.
>
> I'm still new to Postgres, but I'm worried that a
> random_page_cost of 2 is too low, so maybe increasing
> cpu_tuple_cost is a better choice.

If these are typical of what you would expect in production, then the
fact that with default cost factors the estimated costs are barely
different (by 0.6%) for actual run times which differ by two orders
of magnitude (the chosen plan is 160 times slower) means the cost
model is off by a lot.

If you expect the active portion of your database to be fully cached
like this, it makes sense to reduce random_page_cost to be equal to
seq_page_cost. But that only adjusts the costs by at most a factor of
four, and we've established that in the above query they're off by a
factor of 160. To help make up the difference, it makes sense to
de-emphasize page access relative to cpu-related costs by reducing
both page costs to 0.1. Even combined, these adjustments can't fully
compensate for how far off the estimate was.

In my experience the default cpu_tuple_cost is understated compared
to the other cpu-related costs, so I would do the above *plus* a
boost to cpu_tuple_cost. Personally, I have never seen a difference
between plans chosen with that set to 0.03 and with it set to 0.05,
so I can't say where in that range the ideal value lies; feel free to
experiment whenever a query seems to be choosing a bad plan. (There
is a sketch of one way to run such an experiment in the P.S. below.)

If the above results really do represent the cache hit levels you
expect in production, the combination of these changes should come
reasonably close to modeling costs realistically, resulting in better
plan choices. If you don't expect such high cache hit ratios in
production, you probably don't want to go so low with the page costs.

>>> - shared_buffers = 6GB
>>> - effective_cache_size = 18GB
>>> - work_mem = 10MB
>>> - maintenance_work_mem = 3GB

> Can you see any obvious issues with the other memory settings I
> changed?

I might bump work_mem up to somewhere in the 20MB to 60MB range, as
long as you're not going crazy with max_connections. I would probably
take maintenance_work_mem down to 1GB to 2GB -- you can have several
of these allocations at one time, and you don't want to blow away
your cache. (I think it might actually be adjusted down to 2GB
internally anyway, but I would need to check.) The second P.S. below
shows what the conf lines might look like.

-Kevin
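
P.S. If you want to try the cost-factor changes before touching
postgresql.conf, here is a minimal sketch of a session-level
experiment. The GUC names and values are the ones discussed above;
the query itself is hypothetical ("smalltable" and the join column
"invtran_id" are stand-ins -- substitute the actual join that was
picking the hash plan):

    -- Session-level experiment: SET LOCAL changes disappear when the
    -- transaction ends, so nothing here persists past the ROLLBACK.
    BEGIN;
    SET LOCAL seq_page_cost = 0.1;     -- de-emphasize page access (fully cached)
    SET LOCAL random_page_cost = 0.1;  -- equal to seq_page_cost at ~100% cache hits
    SET LOCAL cpu_tuple_cost = 0.03;   -- anywhere in 0.03..0.05 has looked the same to me

    -- Hypothetical query: replace with the real join from your report.
    EXPLAIN ANALYZE
    SELECT *
      FROM invtran bigtable
      JOIN smalltable USING (invtran_id);

    ROLLBACK;

If the nested loop plan shows up (and really is faster) with those
settings, you can then make them permanent in postgresql.conf.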
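
P.P.S. The memory settings, written out as postgresql.conf lines. The
exact numbers are illustrative picks from the ranges I suggested, not
measurements from your workload:

    # postgresql.conf -- illustrative values from the ranges above
    shared_buffers = 6GB            # as you have it
    effective_cache_size = 18GB     # as you have it
    work_mem = 32MB                 # somewhere in 20MB..60MB; each backend
                                    # can use several of these per query, so
                                    # size it against max_connections
    maintenance_work_mem = 1GB      # down from 3GB; several maintenance
                                    # operations can hold this much at once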