Marc Cousin <cousinmarc@xxxxxxxxx> wrote:

> Then we insert missing paths. This is one of the plans that fail
>
> insert into path (path)
>   select path from batch
>   where not exists
>     (select 1 from path where path.path=batch.path)
>   group by path;

I know you said you wanted to focus on a different query, but this
one can easily be optimized.  Right now it is checking for an
existing row in path for each row in batch, and it only needs to
check once for each path.  One way to write it would be:

insert into path (path)
  select path from (select distinct path from batch) b
  where not exists
    (select 1 from path p where p.path = b.path);

> So now we insert into the file table.
>
> insert into file (pathid, filename)
>   select pathid, filename from batch join path using (path);
>
> What I think is the cause of the problem is that the planner
> doesn't take into account that we are going to fetch the exact
> same values all the time in the path table, so we'll have a very
> good hit ratio.

It kinda takes that into account for the index part of things via
the effective_cache_size setting.  That should normally be set to
50% to 75% of machine RAM.

> Maybe the n_distinct from batch.path could be used to refine the
> caching effect on the index scan?

Interesting idea.

> For now, we work around this by using very low values for
> seq_page_cost and random_page_cost for these 2 queries.

If you are not already doing so, you might want to try setting
cpu_tuple_cost to something in the 0.03 to 0.05 range.  I have
found that the default is too low compared to other cpu cost
factors, and raising it makes the exact settings for page costs
less sensitive -- that is, you get good plans over a wider range of
page cost settings.  I have sometimes been unable to get a good
plan for a query without boosting this, regardless of what I do
with other settings.

Running with a development build on my 16GB development PC, I got
your fast plan with your "big data" test case by making only this
one adjustment from the postgresql.conf defaults:

set effective_cache_size = '2GB';

I also got the fast plan if I left effective_cache_size at the
default and only changed:

set cpu_tuple_cost = 0.03;

I know that there have been adjustments to cost calculations for
use of large indexes in both minor and major releases.  If a little
sensible tuning of cost factors to better match reality doesn't do
it for you, you might want to consider an upgrade.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
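
Since the workaround above is applied only to those two queries, one
way to keep any cost-factor changes scoped to them (rather than
changing postgresql.conf cluster-wide) is to set them with SET LOCAL
inside the loading transaction.  This is only a sketch: the values
are the illustrative ones discussed above, and the best settings for
your data would need to be found by testing.

begin;
-- SET LOCAL changes revert automatically at commit or rollback,
-- so the rest of the workload keeps the server-wide defaults.
set local effective_cache_size = '2GB';
set local cpu_tuple_cost = 0.03;

insert into path (path)
  select path from (select distinct path from batch) b
  where not exists
    (select 1 from path p where p.path = b.path);

insert into file (pathid, filename)
  select pathid, filename from batch join path using (path);
commit;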