On 04/06/14 22:57, Merlin Moncure wrote:
> On Wed, Jun 4, 2014 at 2:58 PM, Linos <info@xxxxxxxx> wrote:
>> On 04/06/14 21:36, Merlin Moncure wrote:
>>> On Wed, Jun 4, 2014 at 8:56 AM, Linos <info@xxxxxxxx> wrote:
>>>> Hello,
>>>>
>>>> Some days ago I upgraded from 8.4 to 9.3. After the upgrade some queries started performing a lot slower; the query I am using in this example is pasted here:
>>>>
>>>> http://pastebin.com/71DjEC21
>>>>
>>>> Considering it is a production database, users are complaining because queries are much slower than before, so I tried downgrading to 9.2 with the same result as 9.3. I finally restored the database on 8.4 and the query is as fast as before.
>>>>
>>>> All these tests were done on Debian Squeeze with the 2.6.32-5-amd64 kernel. The hardware is an Intel Xeon E5520 with 32GB ECC RAM, and the storage is software RAID 10 with 4 SEAGATE ST3146356SS SAS drives.
>>>>
>>>> postgresql.conf:
>>>> max_connections = 250
>>>> shared_buffers = 6144MB
>>>> temp_buffers = 8MB
>>>> max_prepared_transactions = 0
>>>> work_mem = 24MB
>>>> maintenance_work_mem = 384MB
>>>> max_stack_depth = 7MB
>>>> default_statistics_target = 150
>>>> effective_cache_size = 24576MB
>>>>
>>>> 9.3 explain:
>>>> http://explain.depesz.com/s/jP7o
>>>>
>>>> 9.3 explain analyze:
>>>> http://explain.depesz.com/s/6UQT
>>>>
>>>> 9.2 explain:
>>>> http://explain.depesz.com/s/EW1g
>>>>
>>>> 8.4 explain:
>>>> http://explain.depesz.com/s/iAba
>>>>
>>>> 8.4 explain analyze:
>>>> http://explain.depesz.com/s/MPt
>>>>
>>>> It seems to me that the total estimated cost went too high in 9.2 and 9.3, but I am not sure why. I tried commenting out part of the query and disabling indexonlyscan, but I still get very bad timings and estimates.
>>>>
>>>> The dump file is the same for all versions, and after the restore process ended I ran VACUUM ANALYZE on the restored database in all versions.
>>>
>>> The rowcount estimates are garbage on all versions so a good execution
>>> plan can be chalked up to chance.  That being said, it seems like
>>> we're getting an awful lot of regressions of this type with recent
>>> versions.
>>>
>>> Can you try re-running this query with enable_nestloop and/or
>>> enable_material disabled?  (You can disable them for a particular
>>> session via: set enable_material = false;)  This is a "ghetto fix"
>>> but worth trying.  If it was me, I'd be simplifying and optimizing the
>>> query.
>>>
>>> merlin
>>>
>> Much better with these options set to false, thank you Merlin. It is even better than 8.4.
>>
>> 9.3 explain analyze with enable_nestloop and enable_material set to false:
>> http://explain.depesz.com/s/94D
>>
>> The thing is that I have plenty of queries that are now a lot slower than before; this is only one example. I would like to find a fix or workaround.
>>
>> I could downgrade to 9.1. I didn't try on 9.1, but it's the first version that supports exceptions inside plpython and I would like to use them. Do you think this situation would be better on 9.1?
>>
>> Or maybe I could disable material and nestloop in postgresql.conf? I thought it was bad to trick the planner, but given this strange behavior I am not sure anymore.
>>
> I would advise against adjusting postgresql.conf.  Nestloops often
> give worse plans than other choices but can often give the best plan,
> sometimes by an order of magnitude or more.  Planner directives should
> be considered a 'last resort' fix and should generally not be changed
> in postgresql.conf.
> If I were in your shoes, I'd be breaking the
> query down and figuring out where it goes off the rails.  Best case
> scenario, you end up with a simplified, reproducible test-case reduction
> of the problem that can help direct changes to the planner.  In lieu of
> that, I'd look at this as a special-case optimization of problem
> queries.
>
> There is something else to try.  Can you (temporarily) raise
> join_collapse_limit higher (to, say, 20), and see if you get a better
> plan (with and without the other planner adjustments)?
>
> merlin
>

This is the plan with join_collapse_limit=20, enable_nestloop=false, enable_material=false:
http://explain.depesz.com/s/PpL

The plan with join_collapse_limit=20 but with nestloops and enable_material left true takes too much time; it seems to have the same problem as with join_collapse_limit=8.

I will try to create a simpler reproducible example, thank you.

Regards,
Miguel Angel.
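
P.S. For anyone who wants to reproduce the experiment, something like the sketch below keeps the overrides local to a single transaction instead of changing postgresql.conf; the SELECT is only a placeholder, not the real query from http://pastebin.com/71DjEC21:

    -- session-local experiment: SET LOCAL reverts automatically at COMMIT/ROLLBACK,
    -- so nothing is changed globally in postgresql.conf
    BEGIN;
    SET LOCAL enable_nestloop = false;
    SET LOCAL enable_material = false;
    SET LOCAL join_collapse_limit = 20;   -- default is 8
    EXPLAIN ANALYZE
    SELECT 1;                             -- placeholder for the real slow query
    ROLLBACK;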