
Re: Postgres 9.6.1 big slowdown by upgrading 8.4.22


 



> On 7 Jan 2017, at 15:44, Job <Job@xxxxxxxxxxxxxxxxxxxx> wrote:
> 
> Hi guys,
> 
> First of all, excuse me, but I really did not explain the problem well, sorry...
> 
>>> Are you being serious? You're complaining about a "big slowdown" for a query that goes from 1.5ms to 4ms?
>>> What is the actual problem you're trying to solve? Because I don't see one in the above.
> 
> A single query is fast in both 8.4.22 and 9.6.1, no problem.
> 
> But the problem is not here!
> 
> The big problem shows up in the benchmark before putting the system into production.
> We launch about 100-200 queries per second and monitor the two machines with "top".
> They are VMs with 4 vCPUs and 10 GB of RAM, running CentOS 7.2 64-bit.
> 
> This is what it happens:
> 
> Postgres 8.4.22
> Average load around 1.5/2.0
> Further queries respond very quickly
> 
> Postgres 9.6.1
> Average load around 18.0/20.0 !!
> Further queries are really very slow
> There is a bottleneck

I see.

> By removing *only* this condition in the query function:
> "exists ( select 1 from gruorari where gruorari.idgrucate=grucategorie.id and ( (('{'||gg_sett||'}')::int[] && array[EXTRACT(DOW FROM NOW())::int])='t' and  now()::time between gruorari.dalle::time and gruorari.alle::time) )"

Then most likely the slow-down you're experiencing is indeed in the above subquery. It could also be the addition of the EXISTS itself, though - let's not rule that out!

Note that I'm not on either of the versions involved (9.3.15 here), so I can't easily observe what you're seeing.

A general observation: now(), CURRENT_TIMESTAMP, CURRENT_TIME and friends are all evaluated once, at the start of the transaction, so they cost only a single call to gettimeofday(). It is clock_timestamp() that calls gettimeofday() on every invocation, and the cost of that call can differ significantly depending on which hardware clock source your OS is using (there are often multiple options).
Judging from your queries, transaction-start accuracy is all you need, so the choice of timestamp function is unlikely to be a per-row cost here.
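You can see the difference in any psql session: within one transaction now() stays fixed while clock_timestamp() keeps moving.

BEGIN;
SELECT now() AS txn_start, clock_timestamp() AS right_now;
SELECT pg_sleep(1);
-- now() returns the same value as before; clock_timestamp() is about a second later
SELECT now() AS txn_start, clock_timestamp() AS right_now;
COMMIT;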

The one-liner is a bit hard to read, btw - and so requires more effort from anyone trying to help.

A useful next step would be to benchmark the separate parts of that query:
1). where gruorari.idgrucate = grucategorie.id
2). and (('{'||gg_sett||'}')::int[] && array[EXTRACT(DOW FROM NOW())::int])='t'
3). and now()::time between gruorari.dalle::time and gruorari.alle::time
4). exists(select 1)
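
Something along these lines would do - a rough sketch, reusing the table and column names from your query and dropping the ='t' comparison (the overlap operator already yields a boolean), so adjust as needed:

-- part 1: the correlated join condition on its own
EXPLAIN ANALYZE
SELECT count(*) FROM grucategorie g
WHERE EXISTS (SELECT 1 FROM gruorari o WHERE o.idgrucate = g.id);

-- part 2: the day-of-week array condition on its own
EXPLAIN ANALYZE
SELECT count(*) FROM gruorari
WHERE ('{' || gg_sett || '}')::int[] && ARRAY[EXTRACT(DOW FROM NOW())::int];

-- part 3: the time-window condition on its own
EXPLAIN ANALYZE
SELECT count(*) FROM gruorari
WHERE now()::time BETWEEN dalle::time AND alle::time;

Run each of those on both the 8.4 and the 9.6 machine; whichever part shows a very different plan or timing between the two versions is the one to dig into.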


Ad 1). If there's any difference in performance between the 2 PG versions here, most likely it's a different plan for this condition. It might be as simple as a difference in statistics or number of rows.
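For example (a sketch, using the object names from your query): check with \d gruorari that idgrucate is indexed on the 9.6 box, then refresh and inspect the statistics before comparing plans:

-- refresh planner statistics on both servers before comparing plans
ANALYZE gruorari;
ANALYZE grucategorie;

-- row counts and when statistics were last gathered
SELECT relname, n_live_tup, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('gruorari', 'grucategorie');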

Ad 2). This part seems a bit convoluted, which may be responsible for some of the overhead. Frankly, I'm not 100% certain of the purpose of that condition, but it appears that the gg_sett field contains a comma-separated list of days of the week that need to be matched to today's day of the week.
I rewrote it to:
        extract(DOW from NOW()) = any(('{' || gg_sett || '}')::int[])

Performance of either query on my 9.3 installation is pretty much the same, but I have only done some synthetic benchmarks:

=> explain analyze select '{6,7,0}'::int[] && array[extract(DOW from NOW())::int] from generate_series(1, 10000);
                                                      QUERY PLAN                 
-----------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..20.00 rows=1000 width=0) (actual time=4.548..58.072 rows=10000 loops=1)
 Total runtime: 77.116 ms
(2 rows)


=> explain analyze select extract(DOW from NOW()) = any('{6,7,0}'::int[]) from generate_series(1, 10000);                                           
                                                      QUERY PLAN                 
-----------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..18.75 rows=1000 width=0) (actual time=4.341..48.902 rows=10000 loops=1)
 Total runtime: 67.477 ms
(2 rows)
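
And as a quick sanity check that the rewrite matches your original expression - the sample gg_sett value '1,3,5' below is just made up:

SELECT ('{' || '1,3,5' || '}')::int[] && ARRAY[EXTRACT(DOW FROM NOW())::int] AS original_form,
       EXTRACT(DOW FROM NOW()) = ANY (('{' || '1,3,5' || '}')::int[])        AS rewritten_form;

Both columns should always come out the same.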


Ad 3). Casting fields in the where-clause is usually a bad idea. Unless you have expression indexes on those casts to time AND the query planner chooses to use them, the type-cast gets applied to every candidate record. If you have a million candidate records, that's two million casts taking place (one per field).
To say more about this we would need more details about what types those fields are and why you're casting them to time.
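For instance - purely a guess, since we don't know the real types - if dalle and alle only ever hold a time of day and are currently stored as text, changing the column type removes the per-row casts from the query entirely:

ALTER TABLE gruorari
    ALTER COLUMN dalle TYPE time USING dalle::time,
    ALTER COLUMN alle  TYPE time USING alle::time;

-- the condition then becomes simply:
--   and now()::time between gruorari.dalle and gruorari.alle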


> The Postgres 9.6.1 machine's average workload returns to about 2.0/3.0!
> 
> The problem is not the single query, but the massive number of queries!
> 
> Thank you again and excuse me for my bad explanation!
> 
> /F

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.






