Re: Bad Execution Plan with "OR" Clauses Across Outer-Joined Tables

Vitalii Tymchyshyn <tivv00@xxxxxxxxx> · Tue, 30 Apr 2013 22:24:43 +0300

What I can say is that hibernate has "exists" in both HQL and criteria API (e.g. see http://www.cereslogic.com/pages/2008/09/22/hibernate-criteria-subqueries-exists/ for criteria). So, may be it's easier for you to tune your hibernate query to use exists

2013/4/30 Mark Hampton <mark@xxxxxxxxxxxxx>

I have a Hibernate-generated query (That's not going to change, so let's just focus on the Postgres side for now) like this:
SELECT *
from PERSON p

where p.PERSON_ID in (
                       select distinct p2.PERSON_ID
                       from PERSON p2
                            left outer join PERSON_ALIAS pa on

                              p2.PERSON_ID = pa.PERSON_ID
                       where (lower(p1.SURNAME) = 'duck' or
                             lower(pa.SURNAME) = 'duck') and

                             (lower(p1.FORENAME) = 'donald' or
                             lower(pa.FORENAME) = 'donald')
      )
order by p.PERSON_ID asc;

There are function-based indexes on PERSON and PERSON_ALIAS as follows:

CREATE INDEX PERSON_FORENAME_LOWER_FBIDX ON PERSON (LOWER(FORENAME) VARCHAR_PATTERN_OPS);

CREATE INDEX PERSON_SURNAME_LOWER_FBIDX ON PERSON (LOWER(SURNAME) VARCHAR_PATTERN_OPS);
CREATE INDEX PERSON_ALIAS_FORENAME_LOWER_FBIDX ON PERSON_ALIAS (LOWER(FORENAME) VARCHAR_PATTERN_OPS);

CREATE INDEX PERSON_ALIAS_SURNAME_LOWER_FBIDX ON PERSON_ALIAS (LOWER(SURNAME) VARCHAR_PATTERN_OPS);

The problem is that the above query doesn't use the indexes.  The "or" clauses across the outer-join seem to be the culprit.  If I rewrite the query as follows, Postgres will use the index:

SELECT *
from PERSON p
where (p.PERSON_ID in (
                        select p2.PERSON_ID
                        from TRAVELER.PERSON p2
                             join TRAVELER.OTHER_NAME pa on p2.PERSON_ID =
                               pa.PERSON_ID
                        where lower(p2.SURNAME) = 'duck' and

                              lower(pa.FORENAME) = 'donald'
      ) or
      p.PERSON_ID in (
                       select p2.PERSON_ID
                       from TRAVELER.PERSON p2

                            join TRAVELER.OTHER_NAME pa on p2.PERSON_ID =
                              pa.PERSON_ID
                       where lower(pa.SURNAME) = 'duck' and

                             lower(p2.FORENAME) = 'donald'
      ) or
      p.PERSON_ID in (
                       select p2.PERSON_ID
                       from TRAVELER.PERSON p2
                       where lower(p2.SURNAME) = 'duck' and

                             lower(p2.FORENAME) = 'donald'
      ) or
      p.PERSON_ID in (
                       select p2.PERSON_ID
                       from TRAVELER.OTHER_NAME pa

                       where lower(pa.SURNAME) = 'duck' and
                             lower(pa.FORENAME) = 'donald'
      ))
order by p.PERSON_ID asc;

So my question is this: Is there a way to get the Postgres optimizer "rewrite" the query execution plan to use the equivalent, but much more efficient latter form?

And before you ask; yes, there are better ways of writing this query.  But we're dealing with Java developers and Hibernate here.  It's a legacy system, and the policy is to avoid hand-written SQL, so for the moment let's not go down that rabbit hole, and focus on the issue of what the optimizer can and cannot do.

-- 
Best regards,
 Vitalii Tymchyshyn