Re: Slow query with planner row strange estimation

phb07 <phb07@xxxxxxxxxxxx> · Mon, 12 Jul 2010 22:33:17 +0200

Dimitri a écrit :
It's probably one of the cases when having HINTS in PostgreSQL may be
very helpful..

SELECT /*+ enable_nestloop=off */ ... FROM ...

will just fix this query without impacting other queries and without
adding any additional instructions into the application code..

So, why there is a such resistance to implement hints withing SQL
queries in PG?..

Rgds,
-Dimitri

+1.
Another typical case when it would be helpful is with setting the 
cursor_tuple_fraction GUC variable for a specific statement, without 
being obliged to issue 2 SET statements, one before the SELECT and the 
other after.

On 7/9/10, Robert Haas <robertmhaas@xxxxxxxxx> wrote:

On Fri, Jul 9, 2010 at 6:13 AM, damien hostin <damien.hostin@xxxxxxxxx>
wrote:

Have you tried running ANALYZE on the production server?

You might also want to try ALTER TABLE ... SET STATISTICS to a large
value on some of the join columns involved in the query.

Hello,

Before comparing the test case on the two machines, I run analyse on the
whole and look at pg_stats table to see if change occurs for the columns.
but on the production server the stats never became as good as on the
desktop computer. I set statistic at 10000 on column used by the join, run
analyse which take a 3000000 row sample then look at the stats. The stats
are not as good as on the desktop. Row number is nearly the same but only
1
or 2 values are found.

The data are not balanced the same way on the two computer :
- Desktop is 12000 rows with 6000 implicated in the query (50%),
- "Production" (actually a dev/test server) is 6 million rows with 6000
implicated in the query (0,1%).
Columns used in the query are nullable, and in the 5994000 other rows that
are not implicated in the query these columns are null.

I don't know if the statistic target is a % or a number of value to
obtain,

It's a number of values to obtain.

but event set at max (10000), it didn't managed to collect good stats (for
this particular query).

I think there's a cutoff where it won't collect values unless they
occur significantly more often than the average frequency.  I wonder
if that might be biting you here: without the actual values in the MCV
table, the join selectivity estimates probably aren't too good.

As I don't know what more to do, my conclusion is that the data need to be
better balanced to allow the analyse gather better stats. But if there is
a
way to improve the stats/query with this ugly balanced data, I'm open to
it
!

I hope that in real production, data will never be loaded this way. If
this
appened we will maybe set enable_nestloop to off, but I don't think it's a
good solution, other query have a chance to get slower.

Yeah, that usually works out poorly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Regards.
Philippe Beaudoin.

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance