Re: PostgreSQL seems to create inefficient plans in simple conditional joins

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

/*David Rowley*/ wrote on Sun, 31 Jan 2016 04:57:04 +1300:
On 31 January 2016 at 01:30, Hedayat Vatankhah <hedayat.fwd@xxxxxxxxx> wrote:
Personally, I expect both queries below to perform exactly the same:

SELECT
     t1.id, *
FROM
     t1
INNER JOIN
     t2 ON t1.id = t2.id
     where t1.id > -9223372036513411363;

And:

SELECT
     t1.id, *
FROM
     t1
INNER JOIN
     t2 ON t1.id = t2.id
     where t1.id > -9223372036513411363 and t2.id > -9223372036513411363;

Unfortunately, they do not. PostgreSQL creates different plans for these
queries, which results in very poor performance for the first one compared
to the second (What I'm testing against is a DB with around 350 million
rows in t1, and slightly less in t2).

EXPLAIN output:
First query: http://explain.depesz.com/s/uauk
Second query: link: http://explain.depesz.com/s/uQd
Yes, unfortunately you've done about the only thing that you can do,
and that's just include both conditions in the query. Is there some
special reason why you can't just write the t2.id > ... condition in
the query too? or is the query generated dynamically by some software
that you have no control over?
I can, but it would make my application code much more complex. I was hoping to be able to hide the complexity of DB data model in DB itself using views, triggers etc. If I want to add such conditions, the query generator in my application code would be more complex, and certainly the internal structure of DB will be visible to it.

I'm working to re-design a DB which can grow large and slow, as I guess that we can find a more optimal design before trying optimizations like using materialized views and other common optimizations. I've found two completely different approaches for such problems: de-normalizing data, highly normalizing data (6NF) like Anchor Modeling approach. I decided to experiment with something similar to the latter one (not that extreme!) specially since our current design was not that normalized, and it performs poorly. I'm investigating why it should perform so bad with my queries, and this problem was one of the reasons. In such a design, views are used to present the JOIN of many tables as a single table, so that using the model is easy and transparent. But usually a single table doesn't have 10 ID columns (which can change as the model changes) for which you should repeat any conditions to get acceptable results! While it can be done, it is so annoying: the application should know how many tables are joined together, and repeat the condition for all such columns. And the problem become worse when you are going to create a relation between two different IDs of different data, e.g. relating customer info (composed of joining 5 tables) with info about items (s)he bought (composed of joining 3 tables).

Anyway, it seems that this is what I should implement in my application code. I just hope that adding explicit conditions for each joined table will not turn off any other optimizations!

Such an optimization seemed so natural to me that I didn't believe that PostgreSQL doesn't understand that a condition on ID applies to all id columns in a JOINed query, that I simplified my query step by step until I reached the minimum problematic query which is very similar to the one I posted here. It was at this point that I finally realized that maybe PostgreSQL really doesn't understand it, and I was ... shocked!


I'd personally quite like to see improvements in this area, and even
wrote a patch [1] which fixes this problem too. The problem I had when
proposing the fix for this was that I was unable to report details
about how many people are hit by this planner limitation. The patch I
proposed caused a very small impact on planning time for many queries,
and was thought by many not to apply in enough cases for it to be
worth slowing down queries which cannot possibly benefit. Of course I
agree with this, I've no interest in slowing down planning on queries,
but at the same time understand the annoying poor optimisation in this
area.

Although please remember the patch I proposed was merely a first draft
proposal. Not for production use.

[1] http://www.postgresql.org/message-id/flat/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A@xxxxxxxxxxxxxx#CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A@xxxxxxxxxxxxxx

That's great, I might consider experimenting with this too.

Regards,
Hedayat




--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux