Re: Optimization idea

Vlad Arkhipov <arhipov@xxxxxxxxxxxx> · Wed, 28 Apr 2010 18:38:20 +0900

2010/4/28 Robert Haas <robertmhaas@xxxxxxxxx>:

On Mon, Apr 26, 2010 at 5:33 AM, Cédric Villemain
<cedric.villemain.debian@xxxxxxxxx> wrote:

In the first query, the planner doesn't use the information of the 2,3,4.
It just does a : I'll bet I'll have 2 rows in t1 (I think it should
say 3, but it doesn't)
So it divide the estimated number of rows in the t2 table by 5
(different values) and multiply by 2 (rows) : 40040.

I think it's doing something more complicated.  See scalararraysel().

Thank you for driving me to the right function, Robert.
It is in fact more complicated :)

In the second query the planner use a different behavior : it did
expand the value of t1.t to t2.t for each join relation and find a
costless plan. (than the one using seqscan on t2)

I think the problem here is one we've discussed before: if the query
planner knows that something is true of x (like, say, x =
ANY('{2,3,4}')) and it also knows that x = y, it doesn't infer that
the same thing holds of y (i.e. y = ANY('{2,3,4}') unless the thing
that is known to be true of x is that x is equal to some constant.
Tom doesn't think it would be worth the additional CPU time that it
would take to make these sorts of deductions.  I'm not sure I believe
that, but I haven't tried to write the code, either.

If I understand correctly, I did have some issues with
exclusion_constraint= ON for complex queries in datamining where the
planner failled to understand it must use only one partition because
the where clause where not enough 'explicit'. But it's long time ago
and I don't have my use case.

We probably need to find some real case where the planner optimisation
make sense. But I don't want usual queries to see their CPU time
increase...
<joke>Do we need real Planner Hints ?</joke>

Even if it will be done it does not solve the original issue. If I
understood you right there is now no any decent way of speeding up the query

select *
from t2
join t1 on t1.t = t2.t
where t1.id = X;

except of the propagating the t1.id value to the table t2 and createing
and index for this column? Then the query will look like

select *
from t2
where t1_id = X;

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance