Omg! I was not expecting such a step-by-step procedure, thanks! I'll follow the guide :-) Since I was about to provide a bit of context, as asked by Tomas, here it is for those who are interested...

Best regards,
Daniel

A bit of the required context...

I am running all this on my personal PC: 64-bit Windows, i7 CPU, 16GB RAM. The PostgreSQL 9.3 cluster is spread over 3 x 3TB external drives with write caching. Most tables are static (no inserts).

My largest table looks like this...

Records composed of: 3 bigint, 2 boolean, 1 timestamp and 1 geography type
Number of records: 3,870,130,000
Table size: 369GB
Indexes size: 425GB
- btree (primary key): 125GB
- btree (another field): 86GB
- gist (geography): 241GB

Overall, 40% of my table and 30% of my indexes do not fit in cache (effective_cache_size = 10GB), but looking at the tables and indexes I use the most, more than 90% of what I actually use doesn't fit.

On one hand, according to the documentation (http://www.postgresql.org/docs/9.3/static/runtime-config-query.html), with a cache rate like mine I should probably increase random_page_cost to better reflect the true cost of random storage reads. On the other hand, however, I found this (https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server):

"This is not where you should start to search for plan problems. The random_page_cost is pretty far down this list (at the end in fact). If you are getting bad plans, this shouldn't be the first thing you look at, even though lowering this value may be effective. Instead, you should start by making sure autovacuum is working properly, that you are collecting enough statistics, and that you have correctly sized the memory parameters for your server--all the things gone over above. After you've done all those much more important things, if you're still getting bad plans then you should see if lowering random_page_cost is still useful."

Please find below some of the database configuration parameters that might be of interest...

General config parameters I have modified:

temp_buffers = 512MB
work_mem = 16MB
maintenance_work_mem = 256MB
checkpoint_segments = 64
checkpoint_completion_target = 0.8
effective_cache_size = 10GB
logging_collector = on
track_counts = on
autovacuum = on

Config parameters related to autovacuum:

autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 50
autovacuum_freeze_max_age = 200000000
autovacuum_max_workers = 3
autovacuum_multixact_freeze_max_age = 400000000
autovacuum_naptime = 1min
autovacuum_vacuum_cost_delay = 20ms
autovacuum_vacuum_cost_limit = -1
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 50
vacuum_cost_delay = 0
vacuum_cost_limit = 200
vacuum_cost_page_dirty = 20
vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_defer_cleanup_age = 0
vacuum_freeze_min_age = 50000000
vacuum_freeze_table_age = 150000000
vacuum_multixact_freeze_min_age = 5000000
vacuum_multixact_freeze_table_age = 150000000

Finally, a few command sketches for the checks discussed above and for the steps of the procedure quoted at the bottom.
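
For those who want to pull the same size numbers for their own tables, something like this should do it (my_big_table is just a placeholder for the real table name):

    -- table, index and total on-disk sizes
    SELECT pg_size_pretty(pg_relation_size('my_big_table'))       AS table_size,
           pg_size_pretty(pg_indexes_size('my_big_table'))        AS indexes_size,
           pg_size_pretty(pg_total_relation_size('my_big_table')) AS total_size;

    -- approximate row count from the planner's statistics
    -- (an exact count(*) would take ages on 3.8 billion rows)
    SELECT reltuples::bigint AS approx_rows
    FROM pg_class
    WHERE relname = 'my_big_table';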
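
Following the wiki's advice to first make sure autovacuum is working properly, I suppose a quick sanity check would look like this (most of my tables are static, so NULLs in last_autoanalyze are expected for them):

    -- when was each table last auto-vacuumed/analyzed, and how many dead tuples remain?
    SELECT relname, last_autovacuum, last_autoanalyze, n_live_tup, n_dead_tup
    FROM pg_stat_user_tables
    ORDER BY last_autoanalyze NULLS FIRST;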
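
Rather than changing random_page_cost globally right away, I understand it can be tried per session against one of the slow queries, along these lines (the value 2.0 and the query itself are placeholders):

    SET random_page_cost = 2.0;  -- session-only; mine is still at the 4.0 default
    EXPLAIN (ANALYZE, BUFFERS)   -- compare plan and timings with and without the change
        SELECT * FROM my_big_table WHERE id BETWEEN 1000000 AND 1001000;
    RESET random_page_cost;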
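
For step 2 of the procedure quoted below, the database-wide analyze is simply:

    ANALYZE;  -- no arguments: analyzes every table in the current database

If plans improve after that, default_statistics_target would be the first knob to make the routine analyzes more aggressive.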
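
For step 3, if EXPLAIN ANALYZE shows a large gap between estimated and actual row counts on a table, raising the statistics target for the column involved should look like this (table and column names are placeholders):

    ALTER TABLE my_big_table ALTER COLUMN some_field SET STATISTICS 500;  -- default target is 100
    ANALYZE my_big_table;  -- re-collect statistics with the new target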
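
And for step 4, disabling sequential scans for one problem query only, without touching the global setting, can be done inside a transaction:

    BEGIN;
    SET LOCAL enable_seqscan = off;  -- reverts automatically at COMMIT/ROLLBACK
    SELECT * FROM my_big_table WHERE some_field = 42;  -- placeholder for the problem query
    COMMIT;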
-----Original Message-----
From: PT [mailto:wmoran@xxxxxxxxxxxxxxxxx]
Sent: May-29-15 16:35
To: Daniel Begin
Cc: pgsql-general@xxxxxxxxxxxxxx
Subject: Re: Planner cost adjustments

On Fri, 29 May 2015 09:39:00 -0400
Daniel Begin <jfd553@xxxxxxxxxxx> wrote:

> Hi all,
>
> Running some queries, I found that the planner often selects
> sequential scan instead of an index scan, even if the latter is way
> faster (one order of magnitude faster if I consider some tests I made
> by setting enable_seqscan = ON/OFF). How can I figure out what
> parameter I should adjust to get the planner to select an appropriate
> plan that better suits my DB setup?
>
> I had a look at
> http://www.postgresql.org/docs/9.3/static/runtime-config-query.html
> but at this point it is too much information for me ;-) Any rules of
> thumb, recipes I could use to select which parameters I should look
> at first?

Here's the correct way to handle this:

1) As mentioned elsewhere, first take the time to ensure that your cost
estimate settings are reasonable for your hardware. See section 18.7.2:
http://www.postgresql.org/docs/9.4/static/runtime-config-query.html

2) If #1 doesn't fix it, don't change enable_seqscan. Run a bunch of
tests on the query(s) to see how well they perform. Then run a
database-wide ANALYZE and run all the tests again. If
performance/planning improves, then the analyze settings on your server
aren't aggressive enough. Make changes to the related config settings
to fix that.

3) If #2 doesn't uncover the problem, run EXPLAIN ANALYZE on all the
queries in your test. It takes a bit of understanding to do this step,
so you'll want to read up a bit, and possibly ask questions if you have
trouble interpreting the output, but you're looking for discrepancies
between the estimated and actual rows and times for any particular
table. If you find them, that tends to indicate that you'll need to
update statistics targets on any tables with the problem. See:
http://www.postgresql.org/docs/9.4/static/planner-stats.html

4) If #3 doesn't fix things, then the PostgreSQL developers want to
know about your problem so they can improve the planner. First, if
there are queries that are causing you problems, update the application
to disable sequential scans _for_those_particular_queries_ so your
application continues to trundle along, but don't disable sequential
scans globally, as that may cause other queries to perform badly. Once
that immediate problem is out of the way, put together a test case that
demonstrates the problem you're having (but doesn't contain any
proprietary data, etc.) and post it to the list so the developers can
figure out what to do to improve Postgres.

Hope this helps.

--
Bill Moran <wmoran@xxxxxxxxxxxxxxxxx>

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general