Re: Bad Execution Plan with "OR" Clauses Across Outer-Joined Tables

Mark Hampton <mark@xxxxxxxxxxxxx> · Tue, 30 Apr 2013 16:03:21 -0400

It's an interesting idea, however when I rewrote the original query to use "WHERE EXISTS" rather than "WHERE IN", I get the same bad execution plan.  I think this really has to do with the Postgres optimizer's limitations with respect to outer joins.  

In my case it's certainly possible to rewrite the query by hand to eliminate the outer join and get the same results.

And after posting the original problem, I have also found that with some work it's possible to make Hibernate generate a query that eliminates the outer join and get the same results.

But I think improving the Postgres optimizer to handle such cases would be a nice improvement.  Then again, having lived through many years of Oracle optimizer bugs, it might be easier said than done.  

On Tue, Apr 30, 2013 at 3:24 PM, Vitalii Tymchyshyn <tivv00@xxxxxxxxx> wrote:

What I can say is that hibernate has "exists" in both HQL and criteria API (e.g. see http://www.cereslogic.com/pages/2008/09/22/hibernate-criteria-subqueries-exists/ for criteria). So, may be it's easier for you to tune your hibernate query to use exists

2013/4/30 Mark Hampton <mark@xxxxxxxxxxxxx>

I have a Hibernate-generated query (That's not going to change, so let's just focus on the Postgres side for now) like this:
SELECT *
from PERSON p

where p.PERSON_ID in (
                       select distinct p2.PERSON_ID
                       from PERSON p2
                            left outer join PERSON_ALIAS pa on

                              p2.PERSON_ID = pa.PERSON_ID
                       where (lower(p1.SURNAME) = 'duck' or
                             lower(pa.SURNAME) = 'duck') and

                             (lower(p1.FORENAME) = 'donald' or
                             lower(pa.FORENAME) = 'donald')
      )
order by p.PERSON_ID asc;

There are function-based indexes on PERSON and PERSON_ALIAS as follows:

CREATE INDEX PERSON_FORENAME_LOWER_FBIDX ON PERSON (LOWER(FORENAME) VARCHAR_PATTERN_OPS);

CREATE INDEX PERSON_SURNAME_LOWER_FBIDX ON PERSON (LOWER(SURNAME) VARCHAR_PATTERN_OPS);
CREATE INDEX PERSON_ALIAS_FORENAME_LOWER_FBIDX ON PERSON_ALIAS (LOWER(FORENAME) VARCHAR_PATTERN_OPS);

CREATE INDEX PERSON_ALIAS_SURNAME_LOWER_FBIDX ON PERSON_ALIAS (LOWER(SURNAME) VARCHAR_PATTERN_OPS);

The problem is that the above query doesn't use the indexes.  The "or" clauses across the outer-join seem to be the culprit.  If I rewrite the query as follows, Postgres will use the index:

SELECT *
from PERSON p
where (p.PERSON_ID in (
                        select p2.PERSON_ID
                        from TRAVELER.PERSON p2
                             join TRAVELER.OTHER_NAME pa on p2.PERSON_ID =
                               pa.PERSON_ID
                        where lower(p2.SURNAME) = 'duck' and

                              lower(pa.FORENAME) = 'donald'
      ) or
      p.PERSON_ID in (
                       select p2.PERSON_ID
                       from TRAVELER.PERSON p2

                            join TRAVELER.OTHER_NAME pa on p2.PERSON_ID =
                              pa.PERSON_ID
                       where lower(pa.SURNAME) = 'duck' and

                             lower(p2.FORENAME) = 'donald'
      ) or
      p.PERSON_ID in (
                       select p2.PERSON_ID
                       from TRAVELER.PERSON p2
                       where lower(p2.SURNAME) = 'duck' and

                             lower(p2.FORENAME) = 'donald'
      ) or
      p.PERSON_ID in (
                       select p2.PERSON_ID
                       from TRAVELER.OTHER_NAME pa

                       where lower(pa.SURNAME) = 'duck' and
                             lower(pa.FORENAME) = 'donald'
      ))
order by p.PERSON_ID asc;

So my question is this: Is there a way to get the Postgres optimizer "rewrite" the query execution plan to use the equivalent, but much more efficient latter form?

And before you ask; yes, there are better ways of writing this query.  But we're dealing with Java developers and Hibernate here.  It's a legacy system, and the policy is to avoid hand-written SQL, so for the moment let's not go down that rabbit hole, and focus on the issue of what the optimizer can and cannot do.

-- 
Best regards,
 Vitalii Tymchyshyn