On 04/06/14 22:57, Merlin Moncure wrote:
> On Wed, Jun 4, 2014 at 2:58 PM, Linos <info@xxxxxxxx> wrote:
>> On 04/06/14 21:36, Merlin Moncure wrote:
>>> On Wed, Jun 4, 2014 at 8:56 AM, Linos <info@xxxxxxxx> wrote:
>>>> Hello,
>>>>
>>>> Some days ago I upgraded from 8.4 to 9.3. After the upgrade some queries started performing a lot slower; the query I am using in this example is pasted here:
>>>>
>>>> http://pastebin.com/71DjEC21
>>>>
>>>> Considering it is a production database, users are complaining because queries are much slower than before, so I tried downgrading to 9.2 with the same result as 9.3. I finally restored the database on 8.4 and the query is as fast as before.
>>>>
>>>> All these tests were done on Debian Squeeze with the 2.6.32-5-amd64 kernel. The hardware is an Intel Xeon E5520 with 32GB ECC RAM, and the storage is software RAID 10 with 4 SEAGATE ST3146356SS SAS drives.
>>>>
>>>> postgresql.conf:
>>>> max_connections = 250
>>>> shared_buffers = 6144MB
>>>> temp_buffers = 8MB
>>>> max_prepared_transactions = 0
>>>> work_mem = 24MB
>>>> maintenance_work_mem = 384MB
>>>> max_stack_depth = 7MB
>>>> default_statistics_target = 150
>>>> effective_cache_size = 24576MB
>>>>
>>>> 9.3 explain:
>>>> http://explain.depesz.com/s/jP7o
>>>>
>>>> 9.3 explain analyze:
>>>> http://explain.depesz.com/s/6UQT
>>>>
>>>> 9.2 explain:
>>>> http://explain.depesz.com/s/EW1g
>>>>
>>>> 8.4 explain:
>>>> http://explain.depesz.com/s/iAba
>>>>
>>>> 8.4 explain analyze:
>>>> http://explain.depesz.com/s/MPt
>>>>
>>>> It seems to me that the total estimated cost went too high in 9.2 and 9.3, but I am not sure why. I tried commenting out part of the query and disabling indexonlyscan, but I still get very bad timings and estimates.
>>>>
>>>> The dump file is the same for all versions, and after the restore process ended I ran VACUUM ANALYZE on the restored database in all versions.
>>>
>>> The rowcount estimates are garbage on all versions so a good execution
>>> plan can be chalked up to chance.  That being said, it seems like
>>> we're getting an awful lot of regressions of this type with recent
>>> versions.
>>>
>>> Can you try re-running this query with enable_nestloop and/or
>>> enable_material disabled?  (You can disable them for a particular
>>> session via: set enable_material = false;)  This is a "ghetto fix"
>>> but worth trying.  If it was me, I'd be simplifying and optimizing the
>>> query.
>>>
>>> merlin
>>>
>> Much better with these options set to false, thank you Merlin. It is even better than 8.4.
>>
>> 9.3 explain analyze with enable_nestloop and enable_material set to false:
>> http://explain.depesz.com/s/94D
>>
>> The thing is that I have plenty of queries that are now a lot slower than before; this is only one example. I would like to find a fix or workaround.
>>
>> I could downgrade to 9.1. I didn't try on 9.1, but it's the first version that supports exceptions inside plpython and I would like to use them. Do you think this situation would be better on 9.1?
>>
>> Or maybe I could disable material and nestloop in postgresql.conf? I thought it was bad to trick the planner, but given this strange behavior I am not sure anymore.
>>
> I would advise against adjusting postgresql.conf.  Nestloops often
> give worse plans than other choices but can often give the best plan,
> sometimes by an order of magnitude or more.  Planner directives should
> be considered a 'last resort' fix and should generally not be changed
> in postgresql.conf.
> If I were in your shoes, I'd be breaking the
> query down and figuring out where it goes off the rails.  Best case
> scenario, you end up with a simplified, reproducible test-case reduction
> of the problem that can help direct changes to the planner.  In lieu of
> that, I'd look at this as a special-case optimization of problem
> queries.
>
> There is something else to try.  Can you (temporarily) raise
> join_collapse_limit higher (to, say, 20), and see if you get a better
> plan (with and without the other planner adjustments)?
>
> merlin
>

This is the plan with join_collapse_limit=20, enable_nestloop=false, enable_material=false:
http://explain.depesz.com/s/PpL

The plan with join_collapse_limit=20 but with nestloops and enable_material left true takes too much time; it seems to have the same problem as with join_collapse_limit=8.

I will try to create a simpler reproducible example, thank you.

Regards,
Miguel Angel.
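
P.S. For anyone who wants to reproduce the experiment, something like the sketch below keeps the overrides local to a single transaction instead of changing postgresql.conf; the SELECT is only a placeholder, not the real query from http://pastebin.com/71DjEC21:

    -- session-local experiment: SET LOCAL reverts automatically at COMMIT/ROLLBACK,
    -- so nothing is changed globally in postgresql.conf
    BEGIN;
    SET LOCAL enable_nestloop = false;
    SET LOCAL enable_material = false;
    SET LOCAL join_collapse_limit = 20;   -- default is 8
    EXPLAIN ANALYZE
    SELECT 1;                             -- placeholder for the real slow query
    ROLLBACK;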