Problem with index in OR'd expression

postgresql.org@xxxxxxxxx · Fri, 22 Dec 2006 14:22:46 -0600

Hello,

I've been using PostgreSQL for a few years and mostly love it.  Aside 
from a few (perceived, anyway) annoying limitations in PL/PGSQL (which I 
almost exclusively am using for db interaction), I'm very satisfied with it.

I ran across this problem several months back and decided to blow it off 
back then, but now I'm really wanting to understand why it's happening, 
if there's a work around, if I'm doing something wrong, and if it's just 
a shortcoming / bug if the developers are aware of it and might 
implement a fix some day.

I've already done a fair amount of testing (a total of hours, I'm sure) 
and Googling around on it.  I also read this FAQ on indexes:

  http://www.postgresql.org/docs/faqs.FAQ.html#item4.6

I think my issue is not explained by any of the resources I reviewed.

To simplify my actual problem, I conducted a very simple test.  First I 
defined a table:

create table t (
  c1 int primary key,
  c2 int
)

with the single index (primary key).  Then I filled the table with 
100,000 rows using a quick pl/pgsql function (c1 = 1, 2, 3... 100,000).

Then I ran these two tests (with EXPLAIN in pgAdmin):

  select * from t where c1 = 75000;

  select * from t where ((0 is null) OR (c1 = 75000));

The first one properly uses the index on c1, the second does not. 
Obviously, a human looking at the second one would realize it's 
essentially identical to the first and properly evaluate the (0 is null) 
part once (since it's immutable) and then ignore it for the rest of the 
searching.

Now, I'm sure some of you might ask "why the hell are you doing that in 
the first place?"  I have a good reason.  I write a lot of pl/pgsql 
functions that are search functions with a list of *optional* 
parameters.  I don't know ahead of time whether a user will include on 
or not.  In MSSQL, what I'm able to do (with no obvious index problems 
that I've seen) is add those all to the WHERE clause like this:

  ((vC1 IS NULL) OR (C1 = vC1)) AND
  ((vC2 IS NULL) OR (C2 = vC2)) ...

(here vC1 and vC2 represent variables passed into the pl/pgsql function).

So my question were basically asked at the beginning of this post: is 
there another way to get the optimizer to understand what I'm trying to 
do here?  Is this a known problem?  Is it working as preferred and 
unlikely to change any time soon?  Is there some setting I can hit 
somewhere to make it work like I want?

The only solution that I've come up with so far is making all of my 
statements that otherwise wouldn't have to be dynamic and then only 
including criteria for ones that I really need (based on the tested 
nullness of the variables), but I find that highly annoying and have run 
into other problems as a result as well.

I'd appreciate any suggestions you might have to help resolve this.

Thank,

John Lawler