Marc Cousin wrote:
> For now I've pushed the problem out to somewhat larger batch sizes by changing
> random_page_cost to 0.02 and seq_page_cost to 0.01, but I feel like a
> sorcerer's apprentice with this, as I've told PostgreSQL that fetching rows
> from disk is much cheaper than it really is. These values are, I think,
> completely abnormal.
They certainly don't have anything to do with reality. Try putting them
back to (say) seq_page_cost=1 and random_page_cost=2.
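For example, per session while you experiment (the exact values are only
illustrative, not tuned for your hardware):

    SET seq_page_cost = 1.0;
    SET random_page_cost = 2.0;

Once you've settled on values, put them in postgresql.conf and reload.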
> So, finally, to my questions:
> - Is it normal for PostgreSQL to be this far off base on these queries? (Sorry,
> I don't have the plans; if they're needed I'll do my best to get some, but
> they really are the two obvious plans for this kind of query.) What could
> make it choose the hash join for batch tables that are too small?
No point in speculating without plans.
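If you can, capture both the good and the bad case with EXPLAIN ANALYZE and
post the output; the estimated versus actual row counts are what matter.
Table and column names below are just placeholders for your actual batch join:

    -- placeholder query, substitute your real join
    EXPLAIN ANALYZE
    SELECT *
    FROM batch b
    JOIN filename f ON f.name = b.name;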
> - Is changing the two costs the way to go?
Not the way you have.
> - Is there a way to tell PostgreSQL that sorting is more costly than it
> thinks? (Instead of telling it that fetching data from disk doesn't cost
> anything.)
That's what the configuration settings do. But if you put a couple of them way
off from reality, it'll be pure chance if it gets any estimates right.
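As a rough pointer rather than a tuning recommendation: the cost of sorting is
driven by the CPU cost settings, not the page costs, so those are the knobs to
look at if the plans prove the planner is under-costing sorts. Defaults shown:

    SET cpu_tuple_cost = 0.01;         -- default
    SET cpu_index_tuple_cost = 0.005;  -- default
    SET cpu_operator_cost = 0.0025;    -- default; feeds the per-comparison cost of a sort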
> Here are the other non-default values from my configuration:
> shared_buffers = 2GB
> work_mem = 64MB
Set this *much* higher when you are running your bulk imports. You can
do it per-connection. Try 256MB, 512MB, 1GB (but keep an eye on total
memory used).
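For example, only for the session doing the import:

    -- per-connection, lasts until RESET or disconnect
    SET work_mem = '512MB';
    -- ... run the bulk import here ...
    RESET work_mem;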
> maintenance_work_mem = 256MB
> max_fsm_pages = 15000000  # There are quite big deletes with bacula ...
> effective_cache_size = 800MB
See other emails on this one.
> default_statistics_target = 1000
Probably don't need this for all columns, but it won't cause problems
with these queries.
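If you only need the higher target on the join columns, you can set it per
column instead of globally, along these lines (column name is a placeholder):

    ALTER TABLE batch ALTER COLUMN name SET STATISTICS 1000;
    ANALYZE batch;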
--
Richard Huxton
Archonet Ltd